Hello,
HP-UX 11.31
10.2B0817
Anyone know what the meaning of this message in the database log?
(1077) SYSTEM ERROR: Attempt to free buffer type 2.
I do not find anything about this error in ProKB.
Regards,
Valeriy
The messages in the .lg file do look very strange or out of order. In the normal case, we should see (562) first, then (2252), then (2253). The mechanism to back out a transaction depends on the transaction entry (trid) in the transaction table in shared memory, but it looks the backout didn't do anything while online. And the real backout happened in the crash recovery phase, where the transaction table will be reestablished from the BI files, and this transaction table seems more reliable than the previous one.
Message (1077) cannot be explained either, it may be picked up from a wrong location.
Once we see (2256), the watch dog will set a FATAL flag in shared memory and the database will start an emergency shutdown process. In most cases, the messages following that are not reliable since we are in an emergency. So although the ABL process was already dead, the broker may still try to clean up something for it.
Put all these together, it looks like a memory stomp to me. And I won't rule out any OS issues either.
Dapeng
Its description says: "This message can occur when running multi-user PROGRESS on BTOS. It indicates that the client/server communications buffers have been corrupted".
I guess that only the second sentence is correct and the message exists since Progress V7.
It was a self-service client:
[2016/03/02@17:42:23.123+0600] P-18310 T-1 I ABL 1786: (2252) Begin transaction backout. [2016/03/02@17:42:23.123+0600] P-18310 T-2 I ABL 1786: (562) HANGUP signal received. [2016/03/02@17:42:23.123+0600] P-18310 T-1 I ABL 1786: (2253) Transaction backout completed. [2016/03/02@17:42:23.123+0600] P-18310 T-2 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.123+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.123+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.123+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.123+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.123+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.124+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.124+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:23.124+0600] P-18310 T-1 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2016/03/02@17:42:56.367+0600] P-25680 T-1 I WDOG 1317: (2256) SYSTEM ERROR: User 1786 died during microtransaction. [2016/03/02@17:42:56.491+0600] P-22516 T-1 I BROKER 0: (-----) Sending signal 7 to user 1786 [2016/03/02@17:42:57.625+0600] P-22516 T-1 I BROKER 0: (2527) Disconnecting dead user 1786.
There are other strange things in the log:
1) Transaction backout begun /before/ HANGUP signal;
2) /After/ transaction backout completed the client died during microtransaction.
3) Watchdog thought that user 1786 was dead but /later/ broker sent a signal to the user.
Thank you!
>>the network message buffer free chain
But it is a self-service client.
The problem is compounded by the fact that the database has been abnormally shutdown by broker after this error and the customer wants to understand why it happened and how to avoid it in the future (
What should I pay attention?
It is possible to say that it was not a failure in the OpenEdge, but it was a failure in the operating system or in the hardware?
Regards,
Valeriy
The scans of ai files allow us to look at the past. ;-)
Here is what ai scan says in this case:
The error happened after 79 sec since transaction beginning. At this moment the transaction has created 743 notes.
Just before the errors the transaction did not update db during 15 seconds. Most likely an user was enterring some data from keyboard (at least the transaction also paused for 13 and 40 seconds before this moment).
Then the transaction has updated an existent record:
Trid: 1813582918 code = RL_LOGOP_START version = 1 (12528) Trid: 1813582918 dbkey = 0 update counter = 0 (12530) Trid: 1813582918 code = RL_RMCHG version = 2 (12528) Trid: 1813582918 area = 7 dbkey = 5947509888 update counter = 430 (12529) Trid: 1813582918 code = RL_LOGOP_END version = 1 (12528)
And then the transaction has removed two index keys of the record:
Trid: 1813582918 code = RL_CXREM version = 2 (12528) Trid: 1813582918 area = 7 dbkey = 5944106112 update counter = 13375 (12529) Trid: 1813582918 code = RL_CXREM version = 2 (12528) Trid: 1813582918 area = 27 dbkey = 22753076 update counter = 427 (12529)
But according the previous patterns in the same transaction it should create three RL_CXREM notes.
I'm waiting from the customer for the block dumps to identify the index whose key the transaction failed to update.
Db log says that transaction backout completed immediately after backout's beginning but it's not true. The transaction was rolled back only during database restart.
So the errors happened while the session was trying to delete an index key.
The messages in the .lg file do look very strange or out of order. In the normal case, we should see (562) first, then (2252), then (2253). The mechanism to back out a transaction depends on the transaction entry (trid) in the transaction table in shared memory, but it looks the backout didn't do anything while online. And the real backout happened in the crash recovery phase, where the transaction table will be reestablished from the BI files, and this transaction table seems more reliable than the previous one.
Message (1077) cannot be explained either, it may be picked up from a wrong location.
Once we see (2256), the watch dog will set a FATAL flag in shared memory and the database will start an emergency shutdown process. In most cases, the messages following that are not reliable since we are in an emergency. So although the ABL process was already dead, the broker may still try to clean up something for it.
Put all these together, it looks like a memory stomp to me. And I won't rule out any OS issues either.
Dapeng
Thanks, Dapeng!
May be messages out of order because (562) is from thread number two (T-2) while all other error messages are from T-1.
Good eye, Dmitri!
I checked db log. The "T-2" threads were reported only for the following messages:
(562) HANGUP signal received.
(298) KILL signal received.
(4375) Received signal 7; handling as SIGHUP.
with an exeption for only one message (1077) SYSTEM ERROR: Attempt to free buffer type 2.
Statistics per db log:
T-1: 12600 (90%) "signal received" messages;
T-2: 1334 (10%) "signal received" messages .
The same process can post "T-1" as well as "T-2":
[2016/03/02@09:19:11.296+0600] P-19686 T-2 I ABL 2493: (562) HANGUP signal received.
[2016/03/02@09:19:11.391+0600] P-19686 T-1 I ABL 2493: (562) HANGUP signal received.
[2016/03/02@14:53:33.734+0600] P-16203 T-2 I ABL 2477: (298) KILL signal received.
[2016/03/02@14:53:33.849+0600] P-16203 T-1 I ABL 2477: (298) KILL signal received.
It seems to be just a "feature".
One year later. The same customer, the same error, the same environment.
[2017/04/20@10:51:22.171+0600] P-27948 T-1 I ABL 2680: (708) Userid is now UXXXXX. [2017/04/20@11:42:26.341+0600] P-27948 T-2 I ABL 2680: (562) HANGUP signal received. [2017/04/20@11:42:26.341+0600] P-27948 T-1 I ABL 2680: (2253) Transaction backout completed. [2017/04/20@11:42:26.341+0600] P-27948 T-1 I ABL 2680: (2253) Transaction backout completed. [2017/04/20@11:42:26.341+0600] P-27948 T-2 F ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.342+0600] P-27948 T-1 I ABL 2680: (2252) Begin transaction backout. [2017/04/20@11:42:26.342+0600] P-27948 T-1 I ABL 2680: (2253) Transaction backout completed. [2017/04/20@11:42:26.408+0600] P-27948 T-2 I ABL 2680: (453) Logout by UXXXXX on /dev/pts/2093. [2017/04/20@11:42:26.408+0600] P-27948 T-2 I ABL 2680: (5029) SYSTEM ERROR: Releasing multiplexed latch. latchId: 35247169 [2017/04/20@11:42:26.408+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:26.409+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2. [2017/04/20@11:42:29.265+0600] P-25767 T-1 I BROKER 0: (-----) Sending signal 7 to user 2680 [2017/04/20@11:42:38.414+0600] P-25767 T-1 I BROKER 0: (2527) Disconnecting dead user 2680.
The user’s session had an allocated transaction:
04/20/17 Status: Active Transactions
11:42:03
Usr Name Type Login time Tx start time Trans id Trans State
2680 UXXXXX SELF/ABL 04/20/17 10:51 - 1860882829 Begin
AI scan confirmed that the session did not have an active transaction.
latchId 35247169 is LKT latch family. IOW, the session used a lock table /after/ logout message.
Session tried to create protrace file but the file contains only a few words: "PROGRESS stack trace as of ". That’s all.
Any ideas what can cause the error?
Would it be it be a good idea to crash the session with protrace file after first error “Attempt to free buffer type 2” next time when it will happen again?
The funny feature of this error - it happens once per year, only in March or April.
2016 (10.2B0817):
[2016/03/02@17:42:23.123+0600] P-18310 T-2 F ABL 1786: (1077) SYSTEM ERROR: Attempt to free buffer type 2.
2017 (10.2B0817):
[2017/04/20@11:42:26.408+0600] P-27948 T-2 I ABL 2680: (1077) SYSTEM ERROR: Attempt to free buffer type 2.
2018 (11.7.2):
[2018/04/20@10:18:13.812+0600] P-4049 T-1 F ABL 3146: (1077) SYSTEM ERROR: Attempt to free buffer type 2.