DB DOWN: SYSTEM ERROR: Wrong dbkey in block

Posted by e.schutten on 14-Jul-2016 03:43

Occasionally the database goes down with the error "SYSTEM ERROR: Wrong dbkey in block" during the backup of our 1.5 TB database.

After running successfully for the last year, the error occurred again last night.

I ran dbtool with option 5 (Read or validate db blocks): option 0=single-user, rowid=all, table=all, area=11, verbose=0, validation=0

The tool didn't find any errors. Now I'm running a dbanalys. Can I do more to find out if we really have a db corruption?

OE 10.2B SP8

Windows 2008 R2

Error message (11 times):

[2016/07/14@02:20:47.362+0200] P-7320       T-7264  I WDOG   20: (4232)  Corrupt block detected when attempting to release a buffer.
[2016/07/14@02:20:47.362+0200] P-7320       T-7264  I WDOG   20: (10560) bmReleaseBuffer: Error occurred in area 11, block number: 22885265, extent: F:\mfgdata\prod\trtrprod_11.d1.
[2016/07/14@02:20:47.362+0200] P-7320       T-7264  I WDOG   20: (10561) Writing block  22885265 to log file. Please save and send the log file to Progress Software Corp. for investigation.
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) SYSTEM DEBUG: Database buffer block
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) pbktbl = 0x000000026846BDB0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) pbktbl->qself = 0x000b0001280fbdb0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) XBKBUF(pbktbl->qself) = 0x000000026846bdb0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) pbktbl->bt_qbuf = 000f0001c27827c8
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) XBKBUF(pbktbl->bt_qbuf) = 0x0000002302AF27C8
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) qusrctl: 0x000b00000000b3a0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) use count: 1, governing latch: 24, lru: 0, state: 4
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) changed: 0, chkpt: 0, writing: 0, fixed: 0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) aged: 0, onlru: 0, cleaning: 0, apwq: 0
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) bt_qlrunxt: 0x000b000128100370, bt_qlruprv:  0x000b000006cab670
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) bt_qapwnxt: 0x0000000000000000, bt_qapwprv:  0x0000000000000000
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) bt_qfstuser: 0x0000000000000000, bt_qcuruser:  0x0000000000000000
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) pbkbuf = 0x0000002302AF27C8
[2016/07/14@02:20:47.368+0200] P-7320       T-7264  I WDOG   20: (-----) Block dbkey = 1464657024   Offset = 187476099072

All Replies

Posted by Dileep Dasa on 14-Jul-2016 04:13

Other utilities that you can try to find the possible cause(s):

- proutil -C dbrpr, options 1; 1+4  

- proutil -C dbscan  

- dbtool, option 3

Posted by George Potemkin on 14-Jul-2016 04:49

Edwin,

Your database is NOT corrupted.

> Corrupt block detected when attempting to release a buffer.

Progress did not read the block from disk.

You did not post the text of error 1124:

SYSTEM ERROR: Wrong dbkey in block. Found dbkey1, should be dbkey2 in area num 11.

I guess the found dbkey is a real one: a multiple of RPB (=64 in your case) and below the HWM.

> Error message (11 times):

Who reported it first?

Did watchdog report a dead user?

I bet a client's session died (or was killed) while reading data from disk. Buffer header stored the information about dbkey that the session was going to retrieve from disk ("should be dbkey"). The buffer itself stored some old block ("found dbkey").

> Can I do more to find out if we really have a db corruption?

The quickest way is:

find any_table_in_the_area no-lock where recid(any_table_in_the_area) eq should_be_dbkey_in_1124.

Posted by e.schutten on 14-Jul-2016 04:58

Hi George,

The 1124 message is:

[2016/07/14@02:20:47.389+0200] P-7320       T-7264  F WDOG   20: (1124)  SYSTEM ERROR: Wrong dbkey in block. Found 1464656896, should be 1464657024 in area 11.
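
George's "multiple of RPB" check can be applied to these two values with plain arithmetic. The sketch below is only an illustration: RPB = 64 is taken from George's reply, the offset is copied from the "Block dbkey = ... Offset = ..." line in the first post, and `block_index` is a hypothetical helper name. The "below HWM" part cannot be checked from the values posted here.

```python
RPB = 64  # records per block for area 11, as stated by George in this thread


def block_index(dbkey: int, rpb: int = RPB) -> int:
    """Return dbkey // rpb, refusing values that are not a multiple of rpb."""
    if dbkey % rpb != 0:
        raise ValueError(f"{dbkey} is not a multiple of {rpb}")
    return dbkey // rpb


found = 1464656896       # "Found" value from message 1124
should_be = 1464657024   # "should be" value from message 1124
offset = 187476099072    # "Offset" value logged next to "Block dbkey = 1464657024"

found_idx = block_index(found)
should_idx = block_index(should_be)

# Both values divide evenly by 64 and are two blocks apart.
print(found_idx, should_idx, should_idx - found_idx)  # 22885264 22885266 2

# The logged offset divides evenly by the "should be" block index.
print(offset // should_idx, offset % should_idx)  # 8192 0
```

Both dbkeys are exact multiples of 64, which fits George's guess that the found dbkey is a real block dbkey. The quotient 8192 would be consistent with an 8 KB database block size, but that reading is an assumption drawn from the arithmetic, not something stated in the thread.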

Posted by e.schutten on 14-Jul-2016 04:59

The watchdog was reporting this.
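
For George's question "Who reported it first?", the database log can be scanned for message 1124 and grouped by the reporting process. This is a rough sketch: the line layout (timestamp, P-, T-, severity, process type, number, message number) is inferred only from the log lines pasted in this thread, and `reporters_of` is a hypothetical helper, not part of any Progress tool.

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this thread (an assumption):
# [timestamp] P-<pid> T-<tid> <severity> <process type> <num>: (<msg number>) <text>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+(?P<sev>\w)\s+"
    r"(?P<who>\w+)\s+(?P<num>\d+):\s+\((?P<msg>[^)]+)\)\s+(?P<text>.*)$"
)


def reporters_of(lines, msg_number="1124"):
    """Count occurrences of msg_number per (process type, pid)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("msg") == msg_number:
            counts[(m.group("who"), m.group("pid"))] += 1
    return counts


sample = [
    "[2016/07/14@02:20:47.362+0200] P-7320       T-7264  I WDOG   20: (4232)  "
    "Corrupt block detected when attempting to release a buffer.",
    "[2016/07/14@02:20:47.389+0200] P-7320       T-7264  F WDOG   20: (1124)  "
    "SYSTEM ERROR: Wrong dbkey in block. Found 1464656896, should be 1464657024 in area 11.",
]

print(reporters_of(sample))  # Counter({('WDOG', '7320'): 1})
```

To run it against a real log, pass the open file instead of `sample`; the path to the .lg file is site-specific. If only WDOG appears as the reporter of 1124, that matches the situation described in this thread.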

Posted by George Potemkin on 14-Jul-2016 05:00

So the only thing you need to find out is why the session died.

Posted by e.schutten on 14-Jul-2016 05:08

Indeed, it has something to do with the backup: we have had this problem 4 times in the last few years, and it was always during the backup.

Posted by James Palmer on 14-Jul-2016 05:21

Is this a virtualized or physical environment?

Posted by e.schutten on 14-Jul-2016 05:59

No, it is not virtual. But I suspect that there is some other software scanning the memory.

We have started the database again and it seems to be running normally.

Thanks ALL for replying. I appreciate it.

Posted by James Palmer on 14-Jul-2016 06:13

Lots of things to think about here: knowledgebase.progress.com/.../P5286

Posted by George Potemkin on 14-Jul-2016 06:40

The article misses the scenario that happened in this case, which I have seen reported many times by our customers on Unix.

It's easy to reproduce if you have a rather large database: kill -9 a session that is reading data from disk. The chances of getting the "Wrong dbkey" error are higher than of getting "Usr died holding latch". There is no corruption, neither on disk nor in memory. "Writing block to log file" will dump block contents that are correct for the "found" dbkey. You can compare them with a dump of the same block on disk in the database. The "should be" dbkey is correct as well.

A "Wrong dbkey" error reported by the watchdog after a dead session that did NOT itself report the "Wrong dbkey" error is never a real corruption. This has been confirmed many times.
