Hello :)
A customer has had a DB on a VM (VMware) for a few months and is getting corrupted data. OE 10.2B08, SUSE Linux Enterprise SP2, XFS file system for the data partition.
We did a binary dump and load (D/L); before we could do that, we had to rebuild the indexes on the meta schema (_File, _Index...).
A short time later, new errors showed up in the DB log file. We set -DbCheck and -MemCheck, but we still got more bad blocks.
The Linux error logs showed nothing.
We see the following areas which may have a problem:
- XFS as the file system, although other end users run on it too
- SAN
- Memory/CPU (hardware is about 5 years old)
We plan to do a text D/L to make sure everything is fresh. We will also go back to ext3, which was the file system of the previous installation.
Questions:
- Any concerns about XFS?
- Has anyone ever heard of a damaged meta schema or a damaged empty DB?
- Is it possible for a binary dump to be defective, in the same way that a backup made with probkup may back up defective blocks?
- Any other idea?
kind regards - Klaus
Note: The error messages from the log file will follow soon.
Sorry, the text is in German, but the message numbers are given :)
[2017/03/27@14:22:06.929+0200] P-5038 T--147446016 I ABL 26: (1422) SYSTEM ERROR: Index po-nummer in artkusta für recid 41520842 konnte nicht gelöscht werden.
The following messages repeat for a few blocks (two blocks: 107280832 and 48955840).
In each block, 5 or 6 elements are affected (like element 184 in this example).
[2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) SYSTEM ERROR: Index 49, Block 107280832, Element-Nr. 184: Falsche Informationsgröße in einem Leaf Block.
[2017/03/27@15:10:25.811+0200] P-2275 T--147462400 I ABL 17: (2816) vorherige Größe = 18, cs = 6, ks = 1, is = 191, Schlüsselanzahl = 184.
[2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14037) Fehlerdaten der Blockvalidierung für Index 49: nment ist 455, nlength ist 4117, level ist 1, aktueller Schlüssel ist 184, Offset ist 1673, func ist cxDoInsert
[2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14031) Ungültiger Indexblock gefunden
...
[2017/03/27@15:10:25.832+0200] P-2275 T--147462400 F ABL 17: (14036) SYSTEM ERROR: Ungültiger Indexblock FATAL
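To see at a glance which index blocks and elements keep recurring, the log can be scanned for message 4430. The following is only a minimal sketch: the line layout is inferred from the sample lines above, and the function name `affected_blocks` is made up for this illustration, not part of any OpenEdge tooling.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the sample lines in this thread:
# [timestamp] P-<pid> T-<tid> <severity> <source>: (<msgnum>) <text>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>-?\d+)\s+"
    r"(?P<sev>[A-Z])\s+(?P<src>.+?):\s+\((?P<num>\d+)\)\s+(?P<text>.*)$"
)

# Matches both the German wording and the English translation of message 4430.
BLOCK_RE = re.compile(
    r"Index (?P<index>\d+), [Bb]lock (?P<block>\d+), "
    r"(?:Element-Nr\.|element no\.) (?P<element>\d+)"
)


def affected_blocks(lines):
    """Map (index number, block dbkey) to the sorted element numbers
    reported by message 4430. Lines that do not match are skipped."""
    found = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("num") != "4430":
            continue
        b = BLOCK_RE.search(m.group("text"))
        if b:
            key = (int(b["index"]), int(b["block"]))
            found[key].add(int(b["element"]))
    return {key: sorted(elems) for key, elems in found.items()}


SAMPLE = (
    "[2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) "
    "SYSTEM ERROR: Index 49, Block 107280832, Element-Nr. 184: "
    "Falsche Informationsgröße in einem Leaf Block."
)

if __name__ == "__main__":
    print(affected_blocks([SAMPLE]))
```

In practice the list of lines would come from reading the database .lg file; a small set of blocks repeating with the same elements would match what is described above.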
What sort of SAN? Does the customer do things with snapshots?
Does the customer use vMotion on this VM?
Doing any of the above without a quiet point properly enabled seems like the most likely source of corruption to me.
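As a rough illustration of what "having a quiet point enabled" around a snapshot could look like, here is a minimal sketch. It assumes the OpenEdge `proquiet` utility is on the PATH; the database path and the snapshot step are placeholders, and `quiet_point` is a made-up helper name for this example.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def quiet_point(db_path, run=subprocess.run):
    """Hold a database quiet point for the duration of the with-block.

    `run` is injectable so the sequence can be exercised without OpenEdge.
    """
    # Assumption: `proquiet <db> enable` / `proquiet <db> disable` are available.
    run(["proquiet", db_path, "enable"], check=True)
    try:
        yield
    finally:
        # Always release the quiet point, even if the snapshot step fails.
        run(["proquiet", db_path, "disable"], check=True)


# Usage (placeholders, not real paths or tools):
# with quiet_point("/path/to/dbname"):
#     take_snapshot()  # hypothetical: whatever the SAN or VMware tooling provides
```

The `try`/`finally` matters here: if the snapshot fails, the quiet point is still released.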
> text is in German, but message numbers are given :)
Translation:
[2017/03/27@14:22:06.929+0200] P-5038 T--147446016 I ABL 26: (1422) SYSTEM ERROR: Index po-nummer in artkusta for recid 41520842 could not be deleted.
[2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) SYSTEM ERROR: Index 49, block 107280832, element no. 184: bad info size in a leaf block.
[2017/03/27@15:10:25.811+0200] P-2275 T--147462400 I ABL 17: (2816) prev size = 18, cs = 6, ks = 1, is = 191, key count = 184.
[2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14037) Index 49 block validation error data: nment is 455, nlength is 4117, level is 1, current key is 184, offset is 1673, func is cxDoInsert
[2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14031) Invalid Index Block Detected
...
[2017/03/27@15:10:25.832+0200] P-2275 T--147462400 F ABL 17: (14036) SYSTEM ERROR: Invalid Index Block FATAL
Can you dump the index block with dbkey 107280832?
Also, doing either of the following
- backups with third-party or system backup tools while the database is in use, or
- skipping crash recovery with the -F option
can cause these sorts of errors.
Klaus, did you get to the bottom of this problem?
The customer changed the hardware on which the VM runs, but the errors still occurred.
He is not doing any of the bad things mentioned above :)
Then he set up a copy of the old hardware as a VM, with the old Linux version and OE version. That finally works.
So it must be some combination in the software stack of the new server.
Thanks to all
Klaus
Thanks, Klaus. I appreciate the quick response.