Semaphore errors

Posted by SteveSaltDanwood on 16-Dec-2015 06:19

All,

We migrated our systems onto a Linux server running OE 11.4 back in July. In late November we had an outage on our Live server, and the DB log file reported a semaphore error. The server was rebooted, and everything went back to normal.

This past Saturday we had another outage, followed by two on Monday and two yesterday.

We are seeing entries like:

[2015/12/14@15:27:47.359+0000] P-65624      T-140275601626944 I SRV    21: (1075)  Semaphore id 30 was removed
[2015/12/14@15:27:47.359+0000] P-143185     T-139735250765632 I SRV    23: (723)   ** The server has disconnected.
[2015/12/14@15:27:47.359+0000] P-65624      T-140275601626944 I SRV    21: (723)   ** The server has disconnected.
[2015/12/14@15:27:47.359+0000] P-143185     T-139735250765632 F SRV      : (6517)  SYSTEM ERROR: Unexpected error return from semAdd -1
[2015/12/14@15:27:51.722+0000] P-9582       T-140607981442880 I ABL   104: (1075)  Semaphore id -32739 was removed
[2015/12/14@15:27:51.722+0000] P-9549       T-140463995369280 I ABL   103: (1075)  Semaphore id -32739 was removed
[2015/12/14@15:27:51.723+0000] P-9549       T-140463995369280 F ABL      : (6517)  SYSTEM ERROR: Unexpected error return from semAdd -1
[2015/12/14@15:27:54.167+0000] P-11878      T-139624773539648 I SRV    18: (1132)  Invalid semaphore id
[2015/12/14@15:27:54.167+0000] P-11878      T-139624773539648 I SRV    18: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 28, errno = 22.
[2015/12/14@15:27:54.167+0000] P-11878      T-139624773539648 I SRV    18: (-----) semLockLog_2: semValue val = -1
[2015/12/14@15:27:54.167+0000] P-11878      T-139624773539648 I SRV    18: (1132)  Invalid semaphore id
[2015/12/14@15:27:54.167+0000] P-11878      T-139624773539648 I SRV    18: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32739, errno = 22.
[2015/12/14@15:27:54.169+0000] P-11878      T-139624773539648 I SRV    18: (2520)  Stopped.

Our sysadmin has increased SEMMSL, SEMMNS and SEMOPM slightly, but based on our limited reading about what these values need to be (article P61278), we should already have been fine.

Is there anything we can do to monitor our situation, and what should we be looking at? The Production server has 128 GB of memory, and we've only allocated a small percentage of that to the database at the moment, so we should not be running out of memory, as one article I read suggested.
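As a starting point for monitoring, the kernel limits mentioned above can be read back on Linux from /proc/sys/kernel/sem, which holds four values in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The following is a minimal sketch in Python; the function names are illustrative only and are not part of any OpenEdge tooling.

```python
from typing import Dict

# Order of the four values in /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_kernel_sem(text: str) -> Dict[str, int]:
    """Parse the whitespace-separated contents of /proc/sys/kernel/sem."""
    values = text.split()
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} values, got {len(values)}")
    return dict(zip(SEM_FIELDS, (int(v) for v in values)))


def read_kernel_sem(path: str = "/proc/sys/kernel/sem") -> Dict[str, int]:
    """Read the current semaphore limits from the running kernel."""
    with open(path) as fh:
        return parse_kernel_sem(fh.read())
```

Calling read_kernel_sem() on the server and recording the result next to the output of `ipcs -s` (which lists the semaphore sets that currently exist) would show whether the configured limits are actually in effect and how many sets are in use when an outage occurs.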

Regards,

Steve Salt

Application Developer (accidental DBA)

All Replies

Posted by George Potemkin on 16-Dec-2015 06:51

knowledgebase.progress.com/.../21142

<quote>

Working with multiple databases, the semaphore id's get locked by the process.

[snip]

At the beginning it seems there is a lack of resources.  But if the databases were working with the same kernel parameters and there were no changes, it looks like there is a conflict in the use of the semaphores between these two databases.

</quote>
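One way to look for the kind of cross-database overlap the article describes would be to collect the semaphore ids mentioned in each database's log and flag any id reported by more than one database. The sketch below is in Python and assumes only the two message shapes seen in this thread ("Semaphore id N was removed" and "semid = N"); the labels used for the logs are placeholders. A shared id is only a hint to investigate further, not proof of a conflict.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# The two message shapes that carry a semaphore id in the log excerpts.
SEMID_PATTERNS = (
    re.compile(r"Semaphore id (-?\d+) was removed"),
    re.compile(r"semid = (-?\d+)"),
)


def semids_in_log(lines: Iterable[str]) -> Set[int]:
    """Return every semaphore id mentioned in the given log lines."""
    ids: Set[int] = set()
    for line in lines:
        for pattern in SEMID_PATTERNS:
            match = pattern.search(line)
            if match:
                ids.add(int(match.group(1)))
    return ids


def shared_semids(logs: Dict[str, Iterable[str]]) -> Dict[int, Set[str]]:
    """Map each semaphore id seen in more than one log to the logs naming it."""
    seen = defaultdict(set)
    for name, lines in logs.items():
        for semid in semids_in_log(lines):
            seen[semid].add(name)
    return {semid: names for semid, names in seen.items() if len(names) > 1}
```

In practice each value in the dictionary passed to shared_semids would be an open .lg file, since a file object iterates line by line.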

Posted by SteveSaltDanwood on 16-Dec-2015 07:55

Interesting ...

We have two other DBs running. One is a schema placeholder for our application; because we don't have the source code for this element, we can't do away with it. There were no errors in its log. The other (a CRM DB) is used by WebSpeed, and the Broker connects to both the CRM DB and our main DB.

The CRM log has the following, around the same timeframe (approx. 30 seconds later):

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (49)    SYSTEM ERROR: Memory violation.

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (14658) Previous message sent on behalf of user 56, server pid 14722, broker pid 9734. (5512)

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (439)   ** Save file named core for analysis by Progress Software Corporation.

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (14658) Previous message sent on behalf of user 56, server pid 14722, broker pid 9734. (5512)

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (1132)  Invalid semaphore id

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 32, errno = 22.

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (739)   Logout usernum 56, userid dba, on WSH-MXP-P-01.danwood.ad batch.

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (1132)  Invalid semaphore id

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = -32737, errno = 22.

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (-----) semLockLog_2: semValue val = -1

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (1132)  Invalid semaphore id

[2015/12/14@15:28:29.153+0000] P-14722      T-139716379809600 I SRV     1: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32735, errno = 22.

[2015/12/14@15:28:29.169+0000] P-14722      T-139716379809600 I SRV     1: (2520)  Stopped.

This one has a memory violation, followed by a core file being generated.
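To line up events across the two logs more precisely, the bracketed timestamps can be parsed and subtracted. This is a small Python sketch with illustrative function names; it assumes every line starts with a stamp in the [YYYY/MM/DD@HH:MM:SS.mmm+ZZZZ] form shown in the excerpts.

```python
from datetime import datetime

# Format of the bracketed stamp at the start of each log line.
LOG_TS_FORMAT = "%Y/%m/%d@%H:%M:%S.%f%z"


def parse_log_timestamp(line: str) -> datetime:
    """Parse the leading [YYYY/MM/DD@HH:MM:SS.mmm+ZZZZ] stamp of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, LOG_TS_FORMAT)


def seconds_between(first_line: str, second_line: str) -> float:
    """Seconds elapsed from the first log line to the second."""
    delta = parse_log_timestamp(second_line) - parse_log_timestamp(first_line)
    return delta.total_seconds()
```

Applied to the first "Semaphore id 30 was removed" entry in the main log (15:27:47.359) and the memory violation in the CRM log (15:28:29.153), this gives 41.794 seconds.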

I then found a procore file with the following:

14/12/15 15:27:51  [9549]

Progress Recent Message(s):

  (723) (1075) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443)

** The server has disconnected. (723)

Semaphore id -32739 was removed (1075)

Duplicate unique key in database table. (1443)

Duplicate unique key in database table. (1443)

I don't know if this is related or not. Unfortunately, the error doesn't tell me which table, or even which DB, it relates to.

Steve

This thread is closed