A customer of ours recently had a severe performance degradation of their Progress application. The root cause turned out to be the disks: copying a 1 GB file took 5-6 minutes instead of the usual 7 seconds. There were no IO waits, the CPU was 95% idle, and iostat showed very low disk activity.
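(For scale: 1 GB in about 7 seconds is roughly 145 MB/sec, while 1 GB in 5-6 minutes is roughly 3 MB/sec - a 45-50x drop. A simple way to time such a copy test - the file and target paths here are only placeholders - is:

timex cp /path/to/1GB_file /other/filesystem/copy_of_file

On HP-UX timex prints the real/user/sys times, and the real time gives the effective throughput.)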
From the Progress point of view there were the following problems:
Connecting to the db through shared memory took 20-25 seconds.
That was enough time to generate a protrace file:
(5)  0xe000000130043440 ---- Signal 16 (SIGUSR1) delivered ----
(6)  0xc00000000054cc10 _open_sys + 0x30 [/usr/lib/hpux64/libc.so.1]
(7)  0xc000000000561070 _open + 0x170 at ../../../../../core/libs/libc/shared_em_64_perf/../core/syscalls/t_open.c:28 [/usr/lib/hpux64/libc.so.1]
(8)  0x40000000008069e0 dbut_utOsOpen + 0x90 at /vobs_rkt/src/dbut/ut/utosopen.c:310 [/usr/dlc/bin/_progres]
(9)  0x4000000000716ec0 bkioOpenUNIX + 0xb0 at /vobs_rkt/src/dbmgr/bkio/bkio.c:1005 [/usr/dlc/bin/_progres]
(10) 0x400000000071bc70 bkioOpen + 0x210 at /vobs_rkt/src/dbmgr/bkio/bkiotop.c:784 [/usr/dlc/bin/_progres]
(11) 0x4000000000727c50 bkOpenOneFile + 0x290 at /vobs_rkt/src/dbmgr/bk/bkopen.c:2052 [/usr/dlc/bin/_progres]
(12) 0x4000000000727720 bkOpenAllFiles + 0x1d0 at /vobs_rkt/src/dbmgr/bk/bkopen.c:1942 [/usr/dlc/bin/_progres]
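(For reference: the protrace above was triggered by SIGUSR1, as the top frame shows. On UNIX a protrace for a Progress process is normally produced by sending it that signal - the PID below is a placeholder:

kill -USR1 <pid_of_the_hanging_process>

The process writes a protrace.<pid> file into its working directory and keeps running.)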
Remote connections were instant.
Promon showed more-or-less normal activity for 5-10 seconds, then the db hung for approximately 20 seconds. During this time one of the processes (a different one each time) with an active transaction held the MTX latch and did not release it. The same process, during the same period, also held and did not release an EXCL buffer lock. There were no signs that the process holding the MTX latch was waiting for any other db resource. The processes were creating small transactions. It looked as if the db had been "proquiet'ed", except that during proquiet it is the db broker that holds the MTX latch, and the broker also holds the BIB latch. In our case BIB was not locked. During the pauses there were neither db writes nor db reads - only record reads, so obviously the client sessions were able to read the records that were already in the buffer pool.
An example that combines a few promon screens:
Activity: Latch Counts
Status: Active Transactions
Status: Buffer Lock Queue
Time      Latch  Owner  Locks Total  Locks/Sec  Naps/Sec  Trans id    Trans State  DBKEY     Area  T  Status  Type  Usect
18:25:41  MTX    --            5593       2796         9
18:25:45  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:25:48  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:25:52  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:25:55  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:25:59  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:26:02  MTX    101              0          0         0  1516710131  Active       36678784  17    D  LOCKED  EXCL  1
18:26:06  MTX    0             2173       1086       119
18:26:09  MTX    0                0          0         0
18:26:13  MTX    0                0          0         0
18:26:16  MTX    0                0          0         0
18:26:20  MTX    0                0          0         0
18:26:23  MTX    0                0          0         0
18:26:27  MTX    0                0          0         0
18:26:31  MTX    135           5445       2722        12  1516712589  Active       36680832  17    D  LOCKED  EXCL  1
18:26:35  MTX    135           3469       1734        51  1516713316  Active
18:26:38  MTX    135              0          0         0  1516713316  Active
18:26:42  MTX    135              0          0         0  1516713316  Active
18:26:45  MTX    135              0          0         0  1516713316  Active
18:26:49  MTX    135              0          0         0  1516713316  Active
18:26:52  MTX    135              0          0         0  1516713316  Active
18:26:56  MTX    135              0          0         0  1516713316  Active
18:26:59  MTX    135           5415       2707       145  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:03  MTX    135              0          0         0  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:06  MTX    135              0          0         0  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:10  MTX    135              0          0         0  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:13  MTX    135              0          0         0  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:17  MTX    135              0          0         0  1516714016  Active       36674496  17    D  LOCKED  EXCL  1
18:27:20  MTX    0             1167        583        51
18:27:24  MTX    0                0          0         0
Usr 101 was a batch job.
Usr 135 was an interactive user.
The pauses in db activity were not tied to the checkpoints.
But during the checkpoints we saw the same kind of huge pauses:
No.  Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes  Duration  Sync Time
 71  17:54:56  144   170   9508   9507     0      0        0     22.62       0.00
 72  17:57:46  111   111  10798   7682     0      0     3115      2.50       0.00
 73  17:59:37  244   244  12242  10430    49      0      341     44.77       0.00
 74  18:03:41  196   196   6934   6427     1      0      337     19.34       0.00
 75  18:06:57  114   119  10087   9913     5      0        0      4.02       0.00
 76  18:08:56  308   308   7926   7783  1216      0        4      2.59       0.00
 77  18:14:04  311   311   7740   5113  3005      0      312      8.25       0.00
 78  18:19:15  490     0  15094   7826     8      0        0     24.13       0.00
Promon itself hangs for 1-2 seconds on two screens:
Status: Buffer Lock Queue
Activity: I/O Operations by File
All the screens above are supposed to represent data for 2-second intervals, but in fact the interval between the samples is 3-4 seconds.
My questions:
What data in promon would hint that the root cause is the disks?
How can slow disk access force a Progress session to hold the MTX latch for tens of seconds instead of a fraction of a millisecond? Latches are used for short-term resource locks. Disk operations are associated with medium-term locks, and Progress uses semaphores to wait for such resources. Disk operations should not affect latch locks, but our situation seems to say the opposite.
Any thoughts?
Thanks in advance,
George
I have witnessed similar behavior on AIX before. It turned out to be a global pause from the AIX side when errors were detected with the adapters/disks.
We could map the events to the errors in the system logs. Nothing on the Progress side knew that they were waiting on disk or memory.... something lower at the kernel level was just not giving out the time slices.
What exact errors did you see with the disks and what is your OS and disk configuration?
It's HP-UX B.11.31 U ia64
I don't know about the disk errors or the disk configuration. The customer contacts us only with questions related strictly to Progress. When they got direct evidence of the problems with the disks they simply closed the case with us and opened a case with HP.
> Nothing on the Progress side knew that they were waiting on disk or memory.... something lower at the kernel level was just not giving out the time slices.
Exactly. But it looks strange to me. Can the disks freeze time? Can the disks roll time back into the past? ;-)
Well in our case (and I suspect yours)... the disks were not really the problem.
In our example once one of the adapters failed all of the work was supposed to go to one of the other adapters (we had 4)... but it failed just enough to still be "valid" to AIX... and every time an IO went to that adapter large parts of AIX itself hung until that IO timed out and an error was generated. Not just the disk IO but memory and cpu scheduling as well.
IBM at first blamed EMC and then admitted there was a bug in the failover... the large scale pauses were supposed to be a one time thing while the adapter was being marked dead and all of the processes moved over to the other adapters.
Not much you can do from a Progress or Oracle perspective to know that the OS just went crazy. Having a process trying to sleep for 2 seconds and ending up taking 4 seconds isn't that weird to me in that situation.
Thanks for sharing the details of your case!
> Not much you can do from a Progress or Oracle perspective to know that the OS just went crazy.
The protrace file of a process that was trying to connect to a database seems to point to the OS: bkOpenOneFile and other "Open" functions.
> Not just the disk IO but memory and cpu scheduling as well.
Records were read just fine as long as they came from memory rather than from disk.
Sounds like your problem was slightly different than ours.
If you weren't seeing massive waits or queuing with iostat that would seem to point in the direction of some kind of connection failure.
If you were seeing few reads/writes with lots of waits and queuing then that would probably point towards horrible disk speed.
It will be interesting to see what they come back with on the SAN side.
Activity: Summary
Time      Commits   DbRd   DbWrt  BiWrt   RecRead   Update  Create   Del  ActiveTran
18:25:41    221.0    1.5       0   39.0  124475.5    457.5    64.5  28.5          27
18:25:45        0      0       0      0   19683.0        0       0     0          28
18:25:48        0      0       0      0   10674.0        0       0     0          28
18:25:52        0      0       0      0    7350.0        0       0     0          28
18:25:55        0      0       0      0    7351.0        0       0     0          28
18:25:59        0      0       0      0    2271.5        0       0     0          28
18:26:02        0      0   149.5      0     242.0        0       0     0          28
18:26:06     99.5      0    90.5   19.5   10208.5    242.0       0     0          30
18:26:09        0      0   182.5      0    1933.5        0       0     0          30
18:26:13        0      0       0      0   54299.5        0       0     0          30
18:26:16        0      0       0      0     319.0        0       0     0          30
18:26:20        0      0       0      0    5255.0        0       0     0          30
18:26:23        0      0       0      0     338.5        0       0     0          30
18:26:27        0      0       0      0     368.5        0       0     0          30
18:26:31    215.5      0       0   50.0   30975.0    624.5     2.5     0          38
18:26:35    136.0      9    88.5   31.5   37799.0    375.0     8.5   4.5          51
18:26:38        0      0       0      0   75217.0        0       0     0          51
18:26:42        0      0       0      0     399.0        0       0     0          51
18:26:45        0      0       0      0     328.0        0       0     0          51
18:26:49        0      0       0      0     455.5        0       0     0          51
18:26:52        0      0       0      0     280.5        0       0     0          51
18:26:56        0      0       0      0     308.5        0       0     0          51
18:26:59    209.5      0    16.0   32.5    1365.5    624.5       0     0          50
18:27:03        0      0       0      0     381.0        0       0     0          50
18:27:06        0      0       0      0     469.0        0       0     0          50
18:27:10        0      0       0      0     355.5        0       0     0          50
18:27:13        0      0       0      0   14743.0        0       0     0          50
18:27:17        0      0       0      0     484.5        0       0     0          50
18:27:20     46.0    0.5    38.0    7.0    2730.0    132.5     1.0     0          50
18:27:24        0      0   275.5      0     361.5        0       0     0          50
As you have indicated that there were issues with the disks, it may be because of the RAID configuration. Progress recommends RAID 10. What RAID level is the customer using?
George... can you post the iostat output? Does hpux have the -x option for queue depth?
Whether it was a bad block (which the SAN should have migrated from quickly) or some kind of HPUX or hardware issue you should be able to tell a lot by looking at the service times and write patterns.
@Mike
> Opening a file does not write().
promon/Activity: Summary proves that during the pauses both reads and writes were blocked. Opening a file reads the disks.
@Sauqad
> What RAID customer is using?
To my shame, I do not know. It's not the main application used by the customer.
@TheMadDBA
>can you post the iostat output? Does hpux have the -x option for queue depth?
The customer (really my dbmon script) ran iostat without any options: iostat 2 30
The DBAs were supposed to adjust the options of the OS commands before using the dbmon script, but that never happened.
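(If the script ever gets adjusted: on HP-UX the queue and service-time details that plain iostat hides are available from sar, for example:

sar -d 2 30

which reports %busy, avque, avwait and avserv per device for the same 2-second intervals - exactly what is needed to tell "slow service times" from "no IO being submitted at all".)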
Example of iostat at 18:25:55 (?) where I cut off most of the disks with zero statistics:
device     bps    sps  msps
c1t0d1       0    0.0   1.0
c3t0d2       0    0.0   1.0
disk4     1016  124.8   1.0
disk5        0    0.0   1.0
disk190      0    0.0   1.0
disk199      1    1.0   1.0
disk205      4    0.5   1.0
disk206      8    1.0   1.0
disk214     24    3.0   1.0
disk215    121   12.4   1.0
disk216     71    8.9   1.0
disk225     61    7.4   1.0
disk226     44    5.4   1.0
disk228     24    0.5   1.0
disk232     32    0.5   1.0
disk234      8    0.5   1.0
disk237      4    0.5   1.0
disk239      4    0.5   1.0
disk247     32    0.5   1.0
disk253     44    5.4   1.0
disk256     36    1.0   1.0
disk258      4    0.5   1.0
disk268      4    0.5   1.0
disk269      0    0.0   1.0
disk4 showed the highest activity all the time:
Time      device    bps    sps  msps
18:25:41  disk4    1973  128.1   1.0
18:25:45  disk4    2615  304.8   1.0
18:25:48  disk4    1778  218.8   1.0
18:25:52  disk4    1371  167.2   1.0
18:25:55  disk4    1016  124.8   1.0
18:25:59  disk4     978  122.7   1.0
18:26:02  disk4    1091  130.7   1.0
18:26:06  disk4    1457  181.6   1.0
18:26:09  disk4    1175  143.8   1.0
18:26:13  disk4    1056  131.2   1.0
18:26:16  disk4    1133  137.1   1.0
18:26:20  disk4    1081  131.0   1.0
18:26:23  disk4    1314  248.0   1.0
18:26:27  disk4    1167  140.0   1.0
18:26:31  disk4    1154  141.0   1.0
18:26:35  disk4    1057  130.5   1.0
18:26:38  disk4    1025  124.6   1.0
18:26:42  disk4     995  124.6   1.0
18:26:45  disk4    1132  140.3   1.0
18:26:49  disk4     993  122.4   1.0
18:26:52  disk4     984  123.0   1.0
18:26:56  disk4    1075  126.4   1.0
18:26:59  disk4     958  119.8   1.0
18:27:03  disk4    1094  137.4   1.0
18:27:06  disk4    1045  126.4   1.0
18:27:10  disk4    1080  131.5   1.0
18:27:13  disk4    1130  130.8   1.0
18:27:17  disk4    7992  405.0   1.0
18:27:20  disk4   12493  340.3   1.0
18:27:24  disk4   12453  298.5   1.0
Example of iostat from their main server (I cut off the disks with low bps):
device      bps     sps  msps
disk762   21828  2560.7   1.0
disk766    9987  1583.1   1.0
disk767   10150  1636.3   1.0
disk801    3590   229.9   1.0
disk802    3463   224.4   1.0
disk803    3536   226.4   1.0
disk918    1218   151.2   1.0
disk919    1179   145.8   1.0
disk920    1320   161.7   1.0
disk940    3713   194.5   1.0
disk941    3656   183.1   1.0
disk942    3797   214.9   1.0
bps Kilobytes transferred per second
sps Number of seeks per second
msps Milliseconds per average seek
@TheMadDBA
> Having a process trying to sleep for 2 seconds and ending up taking 4 seconds isn't that weird to me in that situation.
All sampling intervals of the Activity screens in promon are exactly 2 seconds: the "Total" values divided by the "Per Sec" values always give 2 sec. That is because the promon sampling takes a copy of the activity statistics, which are stored entirely in shared memory. But some promon screens (both "Status" and "Activity" ones) do read db blocks from disk.
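(A worked example from the Latch Counts data above: at 18:25:41 the MTX "Total" of 5593 divided by the "/Sec" value of 2796 gives almost exactly 2.0 seconds, even though the timestamps of the consecutive promon samples are 3-4 seconds apart.)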
07/16/15 Activity: Summary
18:25:41 07/16/15 18:25 to 07/16/15 18:25 (2 sec)
In this example 18:25:41 is NOT the end of the sampling interval; it is the time when promon displays the statistics. For example, the "Activity: Summary" screen displays activity statistics (Commits, Reads, Writes) from shared memory as well as data from database blocks (Free blocks, RM chain). Sometimes the difference between the timestamps of the promon screens is itself meaningful information that can help identify a bottleneck.
I have often seen delays when promon displays "Status: Buffer Lock Queue".
"Activity: I/O Operations by File" was not on my radar in the past.
The real reason for starting this thread is a performance issue that the same customer has in their main application. The bottleneck is the MTX latch. The db is connected to by approximately 4 thousand users creating 200-500 active transactions open at the same time, with 100+ commits per second. Rolling back just one large transaction, or massive record deletions done by a couple of sessions at the same time, causes the end-users to complain about performance. The MTX latch becomes busy 60% of the time. Bigrow tests show 4 MB/sec for unbuffered synchronous writes; I would expect a rate 2-3 times higher. Unfortunately I don't know when the customer ran the tests and how busy the disks were at that moment.
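(For completeness, a sketch of how such a bigrow test is usually timed - the db path and cluster count are placeholders, and the 16 MB BI cluster size is only an assumption for the arithmetic:

timex proutil /path/to/testdb -C bigrow 16

With a 16 MB cluster size this formats roughly 256 MB of BI file with unbuffered synchronous writes, so 4 MB/sec corresponds to about 64 seconds of elapsed time, while the rate I would expect would finish in 20-30 seconds.)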
There are 2-sec pauses during checkpoints, though that may be normal for -B 2,000,000 + -B2 1,000,000.
No.   Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes  Duration  Sync Time
406   12:27:20  248   259  27338  21387  7490    553        0      1.75       0.64
407   12:31:39  200   216  26326  22649  4156    475        0      1.82       0.69
408   12:35:15  222   237  23782  18915  5772    384        0      2.52       1.40
409   12:39:12  223   235  23843  20442  3085    462        0      1.75       0.64
410   12:43:07  226   236  25166  22218  2760    183        0      1.86       0.75
411   12:47:03  208   217  26666  24048  2721    414        0      1.99       0.88
412   12:50:40  187   196  24945  23267  3109    147        0      2.10       0.98
413   12:53:56  227   237  22586  19827  5270    657        0      2.11       0.00
414   12:57:53  161     0  27228  18055  4600    239        0      2.07       0.95
My main question: can slow disks cause contention for the MTX latch?
OE version on the main system? Unless it is 10.2B05, 11.3.1.0, 11.4.0.0 or higher you might be hitting this bug for the undo issue.
knowledgebase.progress.com/.../000042774
Those are not great sync times and I would look into the OS details to figure out why it is taking so long to flush to disk. Either the disks are slow or you have some HPUX sync issues.
MTX latches are also used for BI log space allocation and not just microtransactions.... at least according to the documentation.
I know this bug well - we worked on it together with PSTS. The bug's symptoms: bi reads are (much) higher than bi writes, and the process that undoes a transaction owns the MTX latch most of the time. That is not the current case.
The customer uses V10.2B0817 for the main application and V10.2B0723 for the application mentioned at the beginning of the topic.
> MTX latches are also used for BI log space allocation.
What is "BI log space allocation"? Adding new bi cluster? Writing bi block to the disks?
If MTX latch is locked during some very long operation then it's very likely that promon will catch the owner of the latch.
The number of MTX/BIB/AIB latch locks used to be very close each other and close to the number of recovery notes generated by transactions.
Example:
Time        MTX     BIB     AIB
18:25:41  2796.5  2580.5  2575.0
18:25:45       0       0       0
18:25:48       0       0       0
18:25:52       0       0       0
18:25:55       0       0       0
18:25:59       0       0       0
18:26:02       0       0       0
18:26:06  1086.5  1014.5  1017.0
18:26:09       0       0       0
18:26:13       0       0       0
18:26:16       0       0       0
18:26:20       0       0       0
18:26:23       0       0       0
18:26:27       0       0       0
18:26:31  2722.5  2590.0  2577.0
18:26:35  1734.5  1648.5  1641.0
18:26:38       0       0       0
18:26:42       0       0       0
18:26:45       0       0       0
18:26:49       0       0       0
18:26:52       0       0       0
18:26:56       0       0       0
18:26:59  2707.5  2562.5  2562.0
18:27:03       0       0       0
18:27:06       0       0       0
18:27:10       0       0       0
18:27:13       0       0       0
18:27:17       0       0       0
18:27:20   583.5   568.0   564.0
18:27:24       0       0       0
UPD:
I re-checked the data from Activity: Latch Counts. The number of MTX locks is usually 2-5% higher than the number of BIB locks. But when MTX latch naps are very high, the number of MTX locks is 15-20% higher than BIB locks.
"BI log space allocation"... That is all the documentation provides, no more details than that.
I honestly think you are better served finding out why the sync times are so high instead of worrying about the latches. If the latches are still there after you have a reasonable sync time then I would worry about them.
I am assuming that since this is a larger install you have quite a few disks involved in the IO. What is so special about disk4? Is it not part of the main stripe set?
Since it is HPUX you can get a lot of information from glance about what is happening to specific processes and disks/adapters/etc.
Thanks, TheMadDBA. I will ask the customer to check that. The customer is located 2800 miles from me - about 0.7 of the Earth's radius. It's hard for me to control their environment. ;-)
> There are 2 sec pauses during checkpoints. Though it's maybe normal for -B 2,000,000 + -B2 1,000,000.
I don't see such pauses on a customer system (10.2B07, RHEL) with -B of about 3,155,000 and -B2 41,000. Given the other information you provided, my guess is that your checkpoint duration is related more to BI and AI buffer flushes than to buffer scans.
The customer runs on an HP Superdome. Its architecture makes access to shared memory slower.
There are no flushes of DB blocks. I'm not sure whether Progress flushes BI and AI buffers. Anyway, there is not much to flush: -bibufs/-aibufs are 20 and the number of full BI buffers is about 2 with a maximum of 5. At checkpoint Progress creates and writes the RL_INMEM note with a compressed list of the open transactions. In this case the list is rather large - 200-500 transactions - but I think it still takes just a small fraction of a second to write the RL_INMEM note.
OK. My point was that I think the buffer pool scan portion of the checkpoint should be very fast, so some other portion of checkpoint processing is likely the culprit for your durations.
Given the client's other apparent I/O problems, isn't it possible that even if there is only one BI or AI buffer to flush during a checkpoint, it could be taking longer than it should?
Unfortunately I'm still waiting for a reply from the customer regarding the disk performance. My point: there are no obvious mistakes in the Progress configuration of the customer's main application. The topic was started for its own subject, and it may or may not be related to the performance issues of their main application. It could be just a coincidence that their old application showed similar symptoms while we know the root cause of its issue.
NUMA.... yuck ;-)
That opens an entire new set of potential performance issues. Setting the affinity for the core database processes usually helps somewhat, but as you know there are limits.
The odds are really high that a sync or IO issue exists and Progress is just the victim. Tracking down the root cause will take some time and effort, depending on how skilled and motivated their infrastructure admins are.