Following on from a colleague's recent question on our recurring database crashes due to Semaphores:
I was wondering if people could assist with what a typical range of semaphore-wait should be, in "normal" behaviour.
Our DB has been up for about a week now, and looking in PROMON R&D section, I can see the following values.
12/22/15 Activity: Other 11:16:21 12/15/15 12:37 to 12/22/15 11:12 (166 hrs 34 min) Total Per Min Per Sec Per Tx Commit 44492519 4452 74.20 1.00 Undo 247015 25 0.41 0.01 Wait on semaphore 376256 38 0.63 0.01 Non-blocking waits 25 0 0.00 0.00 Semaphore latch waits 43765 4 0.07 0.00 Flush master block 2097 0 0.00 0.00
Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?
The above is taken from our Live database, but for the last two days, both of our test databases (copies of live) have suffered similar semaphore problems.
Any pointers would be a great help
> Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?
Very low even for the small number of users:
semaphore-latch-waits-per-sec * a-few-tens-milliseconds * 100% is the percent of time that was lost by all active processes while waiting for some db resources.
Nevertheless you can check the details of wait-on-semaphore:
promon/R&D/debghb/6/8. Activity: Resource Queues
Wait-on-semaphore = sum of resource waits.
And IIRC, semaphore-latch-waits = the waits during logins/logouts.
I missed that the interval is 166 hrs 34 min (or almost 7 days). You should get statistics for a shorter interval. Otherwise the numbers are meaningless.
Thank you for the advice; I was clutching at straws really.
Over the last few weeks we have suffered several abnormal shutdowns, but with no real understanding of the problem.
Our environment is LIVE Linux server running 11.4 and a DEV/TEST Linux server (same config) running 11.5.1. We are not aware of any recent changes that could have influenced it.
A typical crash looks like:
[2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (739) Logout usernum 70, userid , on LPTP-126659. [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (-----) semLockLog_2: semValue val = -1 [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.937+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 33, errno = 22. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (2520) Stopped. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (-----) semLockLog_2: semValue val = -1 [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 33, errno = 22. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (453) Logout by dba on batch. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (1132) Invalid semaphore id [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. [2015/12/21@17:22:52.939+0000] P-29320 T-140527096551232 I SRV 3: (-----) semLockLog_2: semValue val = -1 Mon Dec 21 17:22:56 2015 [2015/12/21@17:22:56.000+0000] P-108135 T-139822365607744 I ABL 36: (562) HANGUP signal received. Mon Dec 21 17:22:56 2015 [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL 36: (1132) Invalid semaphore id [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL 36: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL 36: (2252) Begin transaction backout. [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL 36: (2253) Transaction backout completed. [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL 36: (453) Logout by trevorba on /dev/pts/26. [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL (1132) Invalid semaphore id [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. [2015/12/21@17:22:56.062+0000] P-108135 T-139822365607744 I ABL (-----) semLockLog_2: semValue val = -1 [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (2527) Disconnecting dead user 36. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (1132) Invalid semaphore id [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (1132) Invalid semaphore id [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (-----) semLockLog_2: semValue val = -1 [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (5292) SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (-----) drexit: Initiating Abnormal Shutdown [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (439) ** Save file named core for analysis by Progress Software Corporation. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (15192) The database will complete shutdown within approximately 60 seconds. [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (2249) Begin ABNORMAL shutdown code 2 [2015/12/21@17:22:58.864+0000] P-9113 T-139706817320768 I BROKER 0: (2525) Disconnecting dead server -1. [2015/12/21@17:22:58.886+0000] P-9131 T-140137808369472 I BIW 22: (2520) Stopped. [2015/12/21@17:22:58.886+0000] P-9131 T-140137808369472 I : (453) Logout by dba on batch. [2015/12/21@17:22:58.903+0000] P-9134 T-140473486939968 I APW 23: (453) Logout by dba on batch. [2015/12/21@17:22:59.510+0000] P-140038 T-139847294314304 I SRV 5: (2520) Stopped. [2015/12/21@17:22:59.525+0000] P-10638 T-140129577215808 I SRV 2: (2520) Stopped. [2015/12/21@17:22:59.788+0000] P-140013 T-139726395565888 I SRV 4: (2520) Stopped. [2015/12/21@17:22:59.864+0000] P-9113 T-139706817320768 I BROKER 0: (-----) Sending signal 12 to user 1
I've read quite a few of the knowledgebase articles without much success.
Does anyone see something that I don't, or is this the point to log a support-call?
For the most part, there is not much you can do about semaphore waits /except/ reduce the record lock conflicts in the application. semaphore activity isn't a terribly useful metric anyway (it only tells you that somebody was blocked and had to wait for /something/). pay no attention. look at the resource waits instead.
--
regards,
gus (gus@progress.com)
"If I set here and stare at nothing long enough, people might think
I'm an engineer working on something."
-- S.R. McElroy
> On Dec 22, 2015, at 6:25 AM, dianne.evans@danwood.co.uk wrote:
>
> Update from Progress Community [https://community.progress.com/]
>
> dianne.evans@danwood.co.uk [https://community.progress.com/members/dianne.evans_4000_danwood.co.uk]
>
> Following on from a colleague's recent question on our recurring database crashes due to Semaphores:
>
> I was wondering if people could assist with what a typical range of semaphore-wait should be, in "normal" behaviour.
>
> Our DB has been up for about a week now, and looking in PROMON R&D section, I can see the following values.
>
> 12/22/15 Activity: Other
> 11:16:21 12/15/15 12:37 to 12/22/15 11:12 (166 hrs 34 min)
>
> Total Per Min Per Sec Per Tx
>
> Commit 44492519 4452 74.20 1.00
> Undo 247015 25 0.41 0.01
> Wait on semaphore 376256 38 0.63 0.01
> Non-blocking waits 25 0 0.00 0.00
> Semaphore latch waits 43765 4 0.07 0.00
> Flush master block 2097 0 0.00 0.00
>
> Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?
>
> The above is taken from our Live database, but for the last two days, both of our test databases (copies of live) have suffered similar semaphore problems.
>
> Any pointers would be a great help
>
> View online [https://community.progress.com/community_groups/openedge_general/f/26/t/22128]
>
> You received this notification because you subscribed to the forum. To stop receiving updates from only this thread, go here [https://community.progress.com/community_groups/openedge_general/f/26/t/22128/mute].
>
> Flag [https://community.progress.com/community_groups/openedge_general/f/26/t/22128?AbuseContentId=45ea083a-f4f9-4dec-a1a2-f855c1f77fda&AbuseContentTypeId=46448885-d0e6-4133-bbfb-f0cd7b0fd6f7&AbuseFlag=true] this post as spam/abuse.
IMHO you should open a support call.
They behavior that you are showing is not normal.
Should you decide to pursue it here in the forums it would be helpful to specify the flavor and version of Linux.
The output of "ipcs -a" just prior to a crash and immediately after would be interesting.
The output of "sysctl kernel.sem" would be interesting.
> semid = -32736
The value equates 32 minus 32K. Your database uses a few semaphore sets. The errors report semid = 31 and semid = 33 but not semid = 32.
Did you have the memory violation errors before first "Unable to set semaphore set" with negative semid?
> The output of "ipcs -a" just prior to a crash and immediately after would be interesting.
+1
Also I'd save the output of promon/R&D/debghb/6/16. Semaphores
It will give you the informtaion about semaphore IDs used by each session including the ones that will issue the errors.
BTW, is it the same case that was described in other topic?:
community.progress.com/.../22024
Infinite loop of 1443's in non-Progress database seemed to cause a memory leak, memory violation and the corruption of semaphore IDs:
Duplicate unique key in database table. (1443)