All,
We migrated our systems onto a Linux server running OE 11.4 back in July. In late November we had an outage on our live server, and the DB log file reported a semaphore error. The server was rebooted, and everything went back to normal.
Last Saturday we had another outage, followed by two on Monday and two yesterday.
We are seeing entries like:
[2015/12/14@15:27:47.359+0000] P-65624 T-140275601626944 I SRV 21: (1075) Semaphore id 30 was removed
[2015/12/14@15:27:47.359+0000] P-143185 T-139735250765632 I SRV 23: (723) ** The server has disconnected.
[2015/12/14@15:27:47.359+0000] P-65624 T-140275601626944 I SRV 21: (723) ** The server has disconnected.
[2015/12/14@15:27:47.359+0000] P-143185 T-139735250765632 F SRV : (6517) SYSTEM ERROR: Unexpected error return from semAdd -1
[2015/12/14@15:27:51.722+0000] P-9582 T-140607981442880 I ABL 104: (1075) Semaphore id -32739 was removed
[2015/12/14@15:27:51.722+0000] P-9549 T-140463995369280 I ABL 103: (1075) Semaphore id -32739 was removed
[2015/12/14@15:27:51.723+0000] P-9549 T-140463995369280 F ABL : (6517) SYSTEM ERROR: Unexpected error return from semAdd -1
[2015/12/14@15:27:54.167+0000] P-11878 T-139624773539648 I SRV 18: (1132) Invalid semaphore id
[2015/12/14@15:27:54.167+0000] P-11878 T-139624773539648 I SRV 18: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = 28, errno = 22.
[2015/12/14@15:27:54.167+0000] P-11878 T-139624773539648 I SRV 18: (-----) semLockLog_2: semValue val = -1
[2015/12/14@15:27:54.167+0000] P-11878 T-139624773539648 I SRV 18: (1132) Invalid semaphore id
[2015/12/14@15:27:54.167+0000] P-11878 T-139624773539648 I SRV 18: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32739, errno = 22.
[2015/12/14@15:27:54.169+0000] P-11878 T-139624773539648 I SRV 18: (2520) Stopped.
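To pull the semaphore-related lines out of a busy .lg file, something like the Python sketch below may help. It assumes every line follows the layout in the excerpt above (`[timestamp] P-pid T-tid severity process usernum: (msgnum) text`). The set of message numbers (1075, 1132, 6517, 10839) is simply the ones visible in this excerpt, not an official list, and the names `parse_line`, `semaphore_events` and `report` are hypothetical helpers, not part of OpenEdge.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the excerpt above, e.g.
# [2015/12/14@15:27:47.359+0000] P-65624 T-140275601626944 I SRV 21: (1075) Semaphore id 30 was removed
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2}@\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\]\s+"
    r"P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+"
    r"(?P<sev>\w)\s+(?P<proc>\w+)\s*(?P<usr>\d*):\s+"
    r"\((?P<msg>-----|\d+)\)\s+(?P<text>.*)$"
)

# Message numbers seen in the excerpt; an assumption, not an exhaustive list.
SEMAPHORE_MSGS = {"1075", "1132", "6517", "10839"}


class LogEntry(NamedTuple):
    when: datetime
    pid: int
    severity: str           # single letter, e.g. I or F in the excerpt
    process: str            # e.g. SRV, ABL
    usernum: Optional[int]  # None when the line has no user number ("SRV :")
    msgnum: str             # "-----" for unnumbered messages
    text: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        when=datetime.strptime(m["ts"], "%Y/%m/%d@%H:%M:%S.%f%z"),
        pid=int(m["pid"]),
        severity=m["sev"],
        process=m["proc"],
        usernum=int(m["usr"]) if m["usr"] else None,
        msgnum=m["msg"],
        text=m["text"],
    )


def semaphore_events(lines: Iterable[str]) -> List[LogEntry]:
    """Keep entries whose message number is in SEMAPHORE_MSGS or whose text mentions a semaphore."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry and (entry.msgnum in SEMAPHORE_MSGS or "semaphore" in entry.text.lower()):
            events.append(entry)
    return events


def report(paths: Iterable[str]) -> None:
    """Print the semaphore-related entries of each log file, in file order."""
    for path in paths:
        with open(path, errors="replace") as f:
            for e in semaphore_events(f):
                print(f"{path}: {e.when.isoformat()} P-{e.pid} {e.severity} ({e.msgnum}) {e.text}")
```

Calling `report(["mydb.lg", "crm.lg"])` (placeholder file names) prints both databases' semaphore errors with parsed timestamps, which makes it easier to line up outages across logs.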
Our sysadmin has slightly increased SEMMSL, SEMMNS and SEMOPM, but based on my limited reading about what these values need to be (article P61278), we should have been fine.
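On Linux the current values can be read from /proc/sys/kernel/sem, which holds four numbers in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The Python sketch below parses them. The "shortfall" check (SEMMNS compared with SEMMSL × SEMMNI) is a general rule of thumb for whether every set could be filled to SEMMSL; it is an assumption here, not a figure from P61278, and the helper names are hypothetical.

```python
import os
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per set
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore sets


def parse_sem_limits(text: str) -> SemLimits:
    """Parse the four whitespace-separated values from /proc/sys/kernel/sem."""
    fields = [int(x) for x in text.split()]
    if len(fields) != 4:
        raise ValueError(f"expected 4 values, got {len(fields)}")
    return SemLimits(*fields)


def semmns_shortfall(limits: SemLimits) -> int:
    """Rule of thumb: semaphores missing for every set to reach SEMMSL (0 means none)."""
    return max(0, limits.semmsl * limits.semmni - limits.semmns)


if __name__ == "__main__" and os.path.exists("/proc/sys/kernel/sem"):
    with open("/proc/sys/kernel/sem") as f:
        lim = parse_sem_limits(f.read())
    print(lim, "SEMMNS shortfall:", semmns_shortfall(lim))
```

Running this before and after the sysadmin's change confirms which values the kernel is actually using, rather than what was written to sysctl.conf.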
Is there anything we can do to monitor the situation, and what should we be looking at? The production server has 128 GB of memory, and we've allocated only a small percentage of that to the database so far, so we should not be running out of memory, as one article I read suggested.
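One way to monitor usage, sketched in Python under the assumption of a Linux host: /proc/sysvipc/sem lists every System V semaphore set (a header row, then one row per set with columns such as semid, nsems and uid). Comparing the number of sets, the total semaphores and the largest set against SEMMNI, SEMMNS and SEMMSL shows how close the box is to its limits; grouping by owner uid hints at which databases hold them. The function names are hypothetical, and running it from cron every few minutes is just one option.

```python
import os
from collections import Counter
from typing import Dict, List


def parse_sysvipc_sem(text: str) -> List[Dict[str, str]]:
    """Parse /proc/sysvipc/sem: a header row of column names, then one row per semaphore set."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, cols)) for cols in rows[1:]]


def usage_summary(sets: List[Dict[str, str]], semmsl: int, semmns: int, semmni: int) -> Dict[str, object]:
    """Compare current semaphore usage with the kernel limits."""
    sizes = [int(s["nsems"]) for s in sets]
    by_uid: Counter = Counter()
    for s in sets:
        by_uid[int(s["uid"])] += int(s["nsems"])
    largest = max(sizes, default=0)
    return {
        "sets": len(sets),
        "sets_pct_of_semmni": 100.0 * len(sets) / semmni,
        "semaphores": sum(sizes),
        "semaphores_pct_of_semmns": 100.0 * sum(sizes) / semmns,
        "largest_set": largest,
        "largest_pct_of_semmsl": 100.0 * largest / semmsl,
        "semaphores_by_uid": dict(by_uid),
    }


def snapshot(sem_path: str = "/proc/sysvipc/sem", limits_path: str = "/proc/sys/kernel/sem") -> Dict[str, object]:
    """Read both /proc files (Linux only) and return a usage summary."""
    with open(limits_path) as f:
        semmsl, semmns, _semopm, semmni = (int(x) for x in f.read().split())
    with open(sem_path) as f:
        return usage_summary(parse_sysvipc_sem(f.read()), semmsl, semmns, semmni)


if __name__ == "__main__" and os.path.exists("/proc/sysvipc/sem") and os.path.exists("/proc/sys/kernel/sem"):
    print(snapshot())
```

If none of the percentages is close to 100 when an outage happens, running out of semaphores looks less likely than something removing sets that are still in use.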
Regards,
Steve Salt
Application Developer (accidental DBA)
knowledgebase.progress.com/.../21142
<quote>
Working with multiple databases, the semaphore id's get locked by the process.
[snip]
At the beginning it seems there is a lack of resources. But if the databases were working with the same kernel parameters and there were no changes, it looks like there is a conflict in the use of the semaphores between these two databases.
</quote>
Interesting ...
We have two other databases running. One is a schema placeholder for our application; because we don't have the source code for this element, we can't do away with it. There were no errors in its log. The other (a CRM DB) is used by WebSpeed, and the broker connects to both the CRM DB and our main DB.
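With several databases on one host, it may help to see exactly when a semaphore set disappears and who owned it, then match that time against the .lg files. The Python sketch below, assuming a Linux host, polls /proc/sysvipc/sem and prints any set that vanished between polls together with its key, owner uid and size. Two caveats: sets also disappear normally when a database is shut down, and the kernel semid is not necessarily the same number OpenEdge prints in its messages. The names `take_snapshot`, `removed_sets` and `watch` are hypothetical.

```python
import time
from typing import Dict, Tuple

# semid -> (key, owner uid, nsems)
Snapshot = Dict[int, Tuple[int, int, int]]


def take_snapshot(text: str) -> Snapshot:
    """Index the rows of /proc/sysvipc/sem by semid, using the header row for column positions."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not rows:
        return {}
    col = {name: i for i, name in enumerate(rows[0])}
    return {
        int(r[col["semid"]]): (int(r[col["key"]]), int(r[col["uid"]]), int(r[col["nsems"]]))
        for r in rows[1:]
    }


def removed_sets(before: Snapshot, after: Snapshot) -> Snapshot:
    """Sets present in `before` but gone from `after`."""
    return {semid: info for semid, info in before.items() if semid not in after}


def watch(path: str = "/proc/sysvipc/sem", interval: float = 5.0) -> None:
    """Poll forever, printing a timestamped line whenever a semaphore set disappears."""
    with open(path) as f:
        prev = take_snapshot(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = take_snapshot(f.read())
        for semid, (key, uid, nsems) in sorted(removed_sets(prev, cur).items()):
            stamp = time.strftime("%Y/%m/%d@%H:%M:%S")
            print(f"{stamp} semid {semid} removed (key {key}, uid {uid}, nsems {nsems})", flush=True)
        prev = cur
```

Left running in the background, a removal that does not coincide with a planned shutdown would point at whatever process or user owned (or deleted) that set.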
The CRM log has the following, around the same timeframe (approx. 30 seconds later):
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (49) SYSTEM ERROR: Memory violation.
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (14658) Previous message sent on behalf of user 56, server pid 14722, broker pid 9734. (5512)
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (439) ** Save file named core for analysis by Progress Software Corporation.
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (14658) Previous message sent on behalf of user 56, server pid 14722, broker pid 9734. (5512)
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (1132) Invalid semaphore id
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = 32, errno = 22.
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (739) Logout usernum 56, userid dba, on WSH-MXP-P-01.danwood.ad batch.
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (1132) Invalid semaphore id
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (10839) SYSTEM ERROR: Unable to get value of semaphore set : semid = -32737, errno = 22.
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (-----) semLockLog_2: semValue val = -1
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (1132) Invalid semaphore id
[2015/12/14@15:28:29.153+0000] P-14722 T-139716379809600 I SRV 1: (10839) SYSTEM ERROR: Unable to set semaphore set : semid = -32735, errno = 22.
[2015/12/14@15:28:29.169+0000] P-14722 T-139716379809600 I SRV 1: (2520) Stopped.
This one shows a memory violation, after which a core file was generated.
I then found a procore file with the following:
14/12/15 15:27:51 [9549]
Progress Recent Message(s):
(723) (1075) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443) (1443)
** The server has disconnected. (723)
Semaphore id -32739 was removed (1075)
Duplicate unique key in database table. (1443)
Duplicate unique key in database table. (1443)
I don't know whether this is related. Unfortunately, the error doesn't tell me which table, or even which DB, it relates to.
Steve