Semaphore Wait frequency

Posted by dianne.evans@danwood.co.uk on 22-Dec-2015 05:24

Following on from a colleague's recent question on our recurring database crashes due to Semaphores:

I was wondering if people could assist with what a typical range of semaphore-wait should be, in "normal" behaviour.

Our DB has been up for about a week now, and looking in PROMON R&D section, I can see the following values.

12/22/15        Activity: Other
11:16:21        12/15/15 12:37 to 12/22/15 11:12 (166 hrs 34 min)

                                    Total         Per Min          Per Sec          Per Tx

Commit                           44492519            4452            74.20            1.00
Undo                               247015              25             0.41            0.01
Wait on semaphore                  376256              38             0.63            0.01
Non-blocking waits                     25               0             0.00            0.00
Semaphore latch waits               43765               4             0.07            0.00
Flush master block                   2097               0             0.00            0.00

Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?

The above is taken from our Live database,  but for the last two days, both of our test databases (copies of live) have suffered similar semaphore problems.

 

Any pointers would be a great help

All Replies

Posted by George Potemkin on 22-Dec-2015 05:52

> Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?

Very low even for the small number of users:
semaphore-latch-waits-per-sec * a-few-tens-milliseconds * 100% is the percent of time that was lost by all active processes while waiting for some db resources.

Nevertheless you can check the details of wait-on-semaphore:
promon/R&D/debghb/6/8. Activity: Resource Queues

Wait-on-semaphore = sum of resource waits.
And IIRC, semaphore-latch-waits = the waits during logins/logouts.

Posted by George Potemkin on 22-Dec-2015 06:19

I missed that the interval is 166 hrs 34 min (or almost 7 days). You should get statistics for a shorter interval. Otherwise the numbers are meaningless.

Posted by dianne.evans@danwood.co.uk on 22-Dec-2015 07:28

Thank you for the advice; I was clutching at straws really.

Over the last few weeks we have suffered several abnormal shutdowns, but with no real understanding of the problem.

Our environment is LIVE Linux server running 11.4 and a DEV/TEST Linux server (same config) running 11.5.1.   We are not aware of any recent changes that could have influenced it.

A typical crash looks like:

[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (739)   Logout usernum 70, userid , on LPTP-126659. 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (-----) semLockLog_2: semValue val = -1
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.937+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 33, errno = 22. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (2520)  Stopped. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (-----) semLockLog_2: semValue val = -1
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 33, errno = 22. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (453)   Logout by dba on batch. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (1132)  Invalid semaphore id 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. 
[2015/12/21@17:22:52.939+0000] P-29320      T-140527096551232 I SRV     3: (-----) semLockLog_2: semValue val = -1

                Mon Dec 21 17:22:56 2015
[2015/12/21@17:22:56.000+0000] P-108135     T-139822365607744 I ABL    36: (562)   HANGUP signal received. 

                Mon Dec 21 17:22:56 2015
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL    36: (1132)  Invalid semaphore id 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL    36: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL    36: (2252)  Begin transaction backout. 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL    36: (2253)  Transaction backout completed. 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL    36: (453)   Logout by trevorba on /dev/pts/26. 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL        (1132)  Invalid semaphore id 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL        (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. 
[2015/12/21@17:22:56.062+0000] P-108135     T-139822365607744 I ABL        (-----) semLockLog_2: semValue val = -1
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (2527)  Disconnecting dead user 36. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (1132)  Invalid semaphore id 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32736, errno = 22. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (1132)  Invalid semaphore id 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 31, errno = 22. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (-----) semLockLog_2: semValue val = -1
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (5292)  SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (-----) drexit: Initiating Abnormal Shutdown
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (439)   ** Save file named core for analysis by Progress Software Corporation. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (15192) The database will complete shutdown within approximately 60 seconds. 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (2249)  Begin ABNORMAL shutdown code 2 
[2015/12/21@17:22:58.864+0000] P-9113       T-139706817320768 I BROKER  0: (2525)  Disconnecting dead server -1. 
[2015/12/21@17:22:58.886+0000] P-9131       T-140137808369472 I BIW    22: (2520)  Stopped. 
[2015/12/21@17:22:58.886+0000] P-9131       T-140137808369472 I          : (453)   Logout by dba on batch. 
[2015/12/21@17:22:58.903+0000] P-9134       T-140473486939968 I APW    23: (453)   Logout by dba on batch. 
[2015/12/21@17:22:59.510+0000] P-140038     T-139847294314304 I SRV     5: (2520)  Stopped. 
[2015/12/21@17:22:59.525+0000] P-10638      T-140129577215808 I SRV     2: (2520)  Stopped. 
[2015/12/21@17:22:59.788+0000] P-140013     T-139726395565888 I SRV     4: (2520)  Stopped. 
[2015/12/21@17:22:59.864+0000] P-9113       T-139706817320768 I BROKER  0: (-----) Sending signal 12 to user 1

I've read quite a few of the knowledgebase articles without much success.    

Does anyone see something that I don't,  or is this the point to log a support-call?

Posted by gus on 22-Dec-2015 07:30

For the most part, there is not much you can do about semaphore waits /except/ reduce the record lock conflicts in the application. semaphore activity isn't a terribly useful metric anyway (it only tells you that somebody was blocked and had to wait for /something/). pay no attention. look at the resource waits instead.

--

regards,

gus (gus@progress.com)

"If I set here and stare at nothing long enough, people might think

I'm an engineer working on something."

-- S.R. McElroy

> On Dec 22, 2015, at 6:25 AM, dianne.evans@danwood.co.uk wrote:

>

> Update from Progress Community [https://community.progress.com/]

>

> dianne.evans@danwood.co.uk [https://community.progress.com/members/dianne.evans_4000_danwood.co.uk]

>

> Following on from a colleague's recent question on our recurring database crashes due to Semaphores:

>

> I was wondering if people could assist with what a typical range of semaphore-wait should be, in "normal" behaviour.

>

> Our DB has been up for about a week now, and looking in PROMON R&D section, I can see the following values.

>

> 12/22/15 Activity: Other

> 11:16:21 12/15/15 12:37 to 12/22/15 11:12 (166 hrs 34 min)

>

> Total Per Min Per Sec Per Tx

>

> Commit 44492519 4452 74.20 1.00

> Undo 247015 25 0.41 0.01

> Wait on semaphore 376256 38 0.63 0.01

> Non-blocking waits 25 0 0.00 0.00

> Semaphore latch waits 43765 4 0.07 0.00

> Flush master block 2097 0 0.00 0.00

>

> Can anyone say whether the values of wait-on-semaphore = 38 per minute, or the semaphore-latch-waits of 4 per minute are OK, or extreme?

>

> The above is taken from our Live database, but for the last two days, both of our test databases (copies of live) have suffered similar semaphore problems.

>

> Any pointers would be a great help

>

> View online [https://community.progress.com/community_groups/openedge_general/f/26/t/22128]

>

> You received this notification because you subscribed to the forum. To stop receiving updates from only this thread, go here [https://community.progress.com/community_groups/openedge_general/f/26/t/22128/mute].

>

> Flag [https://community.progress.com/community_groups/openedge_general/f/26/t/22128?AbuseContentId=45ea083a-f4f9-4dec-a1a2-f855c1f77fda&AbuseContentTypeId=46448885-d0e6-4133-bbfb-f0cd7b0fd6f7&AbuseFlag=true] this post as spam/abuse.

Posted by ChUIMonster on 22-Dec-2015 08:42

IMHO you should open a support call.

They behavior that you are showing is not normal.

Should you decide to pursue it here in the forums it would be helpful to specify the flavor and version of Linux.

The output of "ipcs -a" just prior to a crash and immediately after would be interesting.

The output of "sysctl kernel.sem" would be interesting.

Posted by George Potemkin on 22-Dec-2015 09:27

> semid = -32736

The value equates 32 minus 32K. Your database uses a few semaphore sets. The errors report semid = 31 and semid = 33 but not semid = 32.

Did you have the memory violation errors before first "Unable to set semaphore set" with negative semid?

Posted by George Potemkin on 22-Dec-2015 09:37

> The output of "ipcs -a" just prior to a crash and immediately after would be interesting.

+1

Also I'd save the output of promon/R&D/debghb/6/16. Semaphores

It will give you the informtaion about semaphore IDs used by each session including the ones that will issue the errors.

Posted by George Potemkin on 24-Dec-2015 02:25

BTW, is it the same case that was described in other topic?:

community.progress.com/.../22024

Infinite loop of 1443's in non-Progress database seemed to cause a memory leak, memory violation and the corruption of semaphore IDs:

Duplicate unique key in database table. (1443)

This thread is closed