Index corruption bulletin

Posted by ChUIMonster on 17-May-2017 07:51

I've been double checking systems after the index corruption bulletin came out:

knowledgebase.progress.com/.../Critical-Alert-Index-manager-defect-can-corrupt-large-indexes-where-dbkeys-straddle-32-64-bit-boundary

If I am correctly interpreting the bulletin it seems like it is fair to say that a good *quick* check is to run:

prostrct statistics dbname | grep "Active blocks:" | sort -nb -k3

The "worst case" is a storage area with 256 rows per block. In that case 8,388,608 blocks is the limit (2^31 / 256). So if *none* of your storage areas are larger than that you're fine and you don't need to do a detailed area by area check.

And for more detailed checks the following table could be used:

RPB Max Active Blocks (in a storage area containing indexes)=== =================256 8,388,607128 16,77,215 64 33,554,431 32 67,108,863 16 134,217,727 8 268,435,455 4 536,870,911 2 1,073,741,823 1 2,147,483,647

Does this match other people's understanding?

If the table above is correct then another "quick" filter is that if your 4k db is less than 32GB or your 8k db is less than 64GB you need not bother checking.

OpenEdge RDBMS - Forum

Posted by Rob Fitzpatrick on 17-May-2017 08:43

> Does this match other people's understanding?

Yes.

Posted by gus bjorklund on 17-May-2017 08:47

yes, that is a good, quick way to check.

the table is almost correct. i’m just nitpicking. there is no block zero. the first block is nr. 1.

Posted by mfurgal on 17-May-2017 08:52

Tom:

Excellent job providing this table. Useful information as usual. Your understanding is correct as far as I can tell. We did a similar study with our MDBA customers and informed each or them whether they were at risk or not based on a similar algorithm. As you would suspect having a single 32 GB or 64 GB index area is pretty rare. I think we all, as a community, did a good job with storage area design back when Progress V9 came out to not have a single area called “Data” with a single associated area called “Index”.

The remedy is 2 fold.

1.) Upgrade to a version of OpenEdge that has this issue fixed (preferred)

2.) If you have a large index area, split it into multiple index areas, which requires downtime to truncate the area, index-move, and index rebuild the areas.

Now is the time to make sure you have no indexes in any storage area that contains table data.

Mike

Mike Furgal

Director – Database and Pro2 Services

PROGRESS Bravepoint

678-225-6331 (office)

617-803-2870 (cell)

mfurgal@progress.com

All Replies

Posted by cjbrandt on 17-May-2017 08:35

Good reason to make sure tables and indexes are in separate areas.

Posted by Rob Fitzpatrick on 17-May-2017 08:43

> Does this match other people's understanding?

Yes.

Posted by gus bjorklund on 17-May-2017 08:47

yes, that is a good, quick way to check.

the table is almost correct. i’m just nitpicking. there is no block zero. the first block is nr. 1.

Posted by mfurgal on 17-May-2017 08:52

Tom:

The remedy is 2 fold.

1.) Upgrade to a version of OpenEdge that has this issue fixed (preferred)

2.) If you have a large index area, split it into multiple index areas, which requires downtime to truncate the area, index-move, and index rebuild the areas.

Now is the time to make sure you have no indexes in any storage area that contains table data.

Mike

Mike Furgal

Director – Database and Pro2 Services

PROGRESS Bravepoint

678-225-6331 (office)

617-803-2870 (cell)

mfurgal@progress.com

Posted by ChUIMonster on 17-May-2017 08:54

It also makes a good, albeit unfortunate, argument for RPB = 1 in index areas :)

Posted by ChUIMonster on 17-May-2017 09:03

Gus, I fixed the table for you :)

Posted by ChUIMonster on 17-May-2017 09:13

A quick and dirty dot-p to munge "prostrct statistics" into an easier to review format:

/* stinfo.p
 *
 * output assumes that your window is at least 132 columns wide...
 *
 * pro -p stinfo.p -param /db/dbname
 *
 *
 * prostrct statistics dbname
 * ...
 * 
 *   Statistics for Area: Schema Area
 * 
 *   Files in Area: Schema Area
 * ...
 *   Database Block Usage for Area: Schema Area
 * 
 *   Active blocks: 22618
 *     Data blocks: 11437
 *     Free blocks: 11181
 *    Empty blocks: 6
 *    Total blocks: 22624
 *   Extent blocks: 4
 *   Records/Block: 64
 *    Cluster size: 1
 *  ...
 * 
 */

define variable lineIn as character no-undo.

define variable areaName   as character no-undo format "x(30)" label "Area Name".
define variable areaRPB    as integer   no-undo format   ">>9" label "RPB".
define variable areaCSZ    as integer   no-undo format   ">>9" label "CSZ".
define variable areaActive as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Active".
define variable areaData   as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Data".
define variable areaFree   as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Free".
define variable areaEmpty  as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Empty".
define variable areaTotal  as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Total".

input through value( "prostrct statistics " + session:parameter ).
repeat:

  import unformatted lineIn.

  lineIn = trim( lineIn ).

  if lineIn begins "Statistics for Area:" then
    assign
      areaName   = trim( entry( 2, lineIn, ":" ))
      areaRPB    = 0
      areaCSZ    = 0
      areaActive = 0
      areaData   = 0
      areaFree   = 0
      areaEmpty  = 0
      areaTotal  = 0
    .

  if lineIn begins "Active blocks:" then areaActive =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins   "Data blocks:" then areaData   =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins   "Free blocks:" then areaFree   =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Empty blocks:" then areaEmpty  =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Total blocks:" then areaTotal  =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins "Records/Block:" then areaRPB    = integer( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Cluster size:" then areaCSZ    = integer( trim( entry( 2, lineIn, ":" ))).

  if lineIn begins "Cluster size:" /* and areaName matches "*idx*" */ then
    do:
      display
        areaName
        areaRPB
        areaCSZ
        areaActive
        areaData
        areaFree
        areaEmpty
        areaTotal
      .
    end.

end.

pause.

return.

Posted by mfurgal on 17-May-2017 10:00

I’m not buying that reason. RPB = 1 in general can lead to mistakes that are costly. Here’s some metrics from our MDBA program and this issue (potential).

Total # databases including test databases: 2,876

Total # Areas in these databases: 31,522

Total # databases that have a suspect storage area (based on RPB): 42

Total # Areas that are suspect (based on RPB): 55

Total # Areas that contain indexes that are suspect: 7

Mike

Mike Furgal

Director – Database and Pro2 Services

PROGRESS Bravepoint

678-225-6331 (office)

617-803-2870 (cell)

mfurgal@progress.com

Posted by ChUIMonster on 17-May-2017 10:25

I don't know why you wouldn't buy that reason -- it is self evident that it helps with this particular issue. True, it is just one lonely bullet point against the rather larger list of reasons not to do it.

I do understand that, by itself and without any "large index" problem that needs attention (and for some reason that cannot be addressed via hotfix or upgrade...), it isn't much of a reason to make that choice for general purpose index areas. I'm not suggesting or encouraging that everyone do so. I'm just noting that there is an interesting, albeit unusual and unfortunate, case where it actually has a clear benefit.

I know -- people read this stuff, extract a sound bite without bothering to understand the context and then do the wrong thing. But black and white rules without nuance can be just as problematic.

Posted by ctoman on 17-May-2017 10:36

- Nice job Tom. results :

Area Name RPB CSZ Active Data Free Empty Total

────────────────────────────── ─── ─── ───────────────── ───────────────── ───────────────── ───────────────── ─────────────────

Control Area 64 1 10 10 0 70 80

Schema Area 64 1 1,938 1,938 0 46 1,984

oeel32 32 64 2,475,071 2,475,014 57 2,595,073 5,070,144

arett64 64 64 1,096,831 1,096,774 57 1,184,769 2,281,600

glet64 64 64 1,181,567 1,181,510 57 1,607,041 2,788,608

icet64 64 64 899,775 899,718 57 1,635,329 2,535,104

vaesl64 64 64 2,349,311 2,349,254 57 185,793 2,535,104

event_trans_sub64 64 64 2,710,335 2,710,278 57 331,777 3,042,112

Misc8 8 64 486,975 486,918 57 273,601 760,576

Misc16 16 64 948,223 948,166 57 1,333,377 2,281,600

Misc32 32 64 6,489,535 6,489,478 57 1,876,161 8,365,696

Misc64 64 64 4,957,439 4,957,382 57 2,140,737 7,098,176

Misc128 128 64 2,713,279 2,713,222 57 1,342,849 4,056,128

Misc256 256 64 9,791 9,734 57 243,777 253,568

Tableszerorecords 128 64 27,519 27,463 56 226,049 253,568

AuditData 32 64 63 6 57 253,505 253,568

Lobs 64 64 1,284,863 1,284,806 57 743,233 2,028,096

oeel_Index32 1 8 326,231 326,230 1 717,641 1,043,872

arett_Index64 1 8 151,559 151,558 1 355,465 507,024

glet_Index64 1 8 154,215 154,214 1 352,809 507,024

icet_Index64 1 8 149,759 149,758 1 357,265 507,024

vaesl_Index64 1 8 416,583 416,582 1 597,449 1,014,032

event_trans_sub_Index64 1 8 175,407 175,406 1 78,113 253,520

MiscIndex8 1 8 55,039 55,038 1 200,977 256,016

MiscIndex16 1 8 194,263 194,262 1 317,753 512,016

MiscIndex32 1 8 1,215,695 1,215,694 1 1,068,353 2,284,048

MiscIndex64 1 8 1,165,879 1,165,879 0 1,118,169 2,284,048

MiscIndex128 1 8 859,223 859,223 0 917,817 1,777,040

MiscIndex256 1 8 4,567 4,566 1 11,449 16,016

AuditIndex 1 8 7 6 1 125,033 125,040

LobsIndex 1 8 7 6 1 256,009 256,016

ReplQueueArea 128 512 20,991 20,486 505 107,521 128,512

ReplCtrlArea 256 8 23 22 1 127,993 128,016

ReplMapArea 256 8 103 102 1 127,913 128,016

ReplQueueIdx 256 512 7,679 7,174 505 120,833 128,512

ReplCtrlIdx 256 8 23 22 1 127,993 128,016

ReplMapIdx 256 64 511 454 57 127,553 128,064

Posted by mikaelstenmark on 18-May-2017 08:27

Just to be on the safe side. I guess this issue only applies to those areas containing index so those areas containing only data are not affected by this issue.

Posted by ChUIMonster on 18-May-2017 08:32

That is correct.

Posted by e.schutten on 18-May-2017 09:40

We have an area that has more than 33,554,431 active blocks (64 RPB). In this area there is no index corruption, so installing hotfix 10.2B 64 is the only thing we shoud do, correct? Or should we also do an index rebuild/fix?

Posted by mfurgal on 18-May-2017 10:24

The bulletin and k-base article suggest that you may have index corruption but tools like idxfix will not report it.

http://knowledgebase.progress.com/articles/Article/Defect-Database-crashes-with-8783-or-210-error-due-to-index-corruption

These 3 points.

IDXFIX is unable to locate the index corruption.
IDXCHECK Option 3 can detect the index corruption, but can crash in OpenEdge versions prior to 11.7 due to the index corruption.
Index corruption seems to reoccur relatively quickly after an IDXBUILD has been done to address the index corruption issue.

The article also suggest that once the hotfix is applied that you need to rebuilt your problem indexes. Given that you have not seen any index issues, you do not know which indexes to rebuild. The safest and best bet would be to rebuild all the indexes in that area.

Mike

Mike Furgal

Director – Database and Pro2 Services

PROGRESS Bravepoint

678-225-6331 (office)

617-803-2870 (cell)

mfurgal@progress.com

Posted by e.schutten on 19-May-2017 04:08

We have an area in the database that exeeds 33,554,431 active blocks, but we don't have index problems as far as we now. Is only installing the hotfix enough? Or should we do also an index rebuild?

Posted by Marie Candela on 19-May-2017 08:18

How did you determine there is no index corruption?

Posted by ChUIMonster on 19-May-2017 09:59

If your area is that large *and* there are indexes in that area *and* the rows per block is 64 or greater then you *might* have index corruption.

You might not have seen any symptoms.

The safe thing to do is to apply the hotfix (or upgrade) and rebuild all indexes in that area. Merely installing the hotfix is not enough.

Posted by e.schutten on 22-May-2017 05:06

But what does the hotfix actually fix? I assume that when I have installed the hotfix, that I can use the IDXCHECK and IDXFIX, without any problems. So I can use IDXFIX when an index corruption occurs. Why should I rebuild the indexes after installing the hotfix?

Posted by Ruanne Cluer on 22-May-2017 05:53

The HF prevents future corruption should a new index block be needed for a multi-level index and the dbkey crosses the 32/64-bit boundary

If there is existing index corruption, it needs to be fixed

idxfix will not be able to find this corruption as it it does not traverse blocks like idxcheck does (even when the HF is applied).

idxcheck only with validation Option 3 (Validate record for each key) will find 32-bit 64-bit corruption within a block split, but:

- it can take a long time (even when the HF is applied).

- it can cause a dbdown (before the HF is applied).

Hence, if you have 'vulnerable' areas, it's a pretty good idea to rebuild those indexes after applying the HF.

Alternatives may take longer to verify and not doing so will lead to having to get your DR notes out to recover,

bearing in mind that this corruption will be propagated through AI notes ..

Posted by e.schutten on 22-May-2017 09:05

Thanks for the explanation.

Posted by Ruanne Cluer on 24-May-2017 02:56

Tom provided a nice prostrct stats parsing ABL. Take a look at the following Article which uses VSTs to accomplish the same, if you restricted this to only areas with indexes:

FOR EACH _Area WHERE _Area-number GT 3,

FIRST _areastatus WHERE _Areastatus-areanum = _Area-number ,

FIRST _storageobject OF _area WHERE _object-type = 2:

000081576 - How to find an Area's RPB and Active Blocks from VSTs?

knowledgebase.progress.com/.../VST-RPB-HWM-Active-Blocks

Posted by gus bjorklund on 24-May-2017 06:34

ruanne,

would running index compress to successful completion tell us that a suspect index is not corrupted?

of course, if /were/ corrupted, index compress might crash.

Posted by Ruanne Cluer on 06-Jun-2017 07:09

gus,

nice idea.

problem with this use-case is that the db is already jolly large and there's typically too little time to have test baselines

Posted by George Potemkin on 06-Jun-2017 07:50

> idxfix will not be able to find this corruption as it it does not traverse blocks like idxcheck does (even when the HF is applied).

I wonder why there is a difference between idxfix and idxcheck regarding the algorithms they use to find the index corruptions? Why idxfix can't do something that idxcheck can do or vice versa?

> idxcheck only with validation Option 3 (Validate record for each key) will find 32-bit 64-bit corruption within a block split

Option 3 (Both 1 and 2 above) with "Validate recids for index entries" exists only in idxfix, doesn't it?

Posted by gus bjorklund on 06-Jun-2017 09:14

Validate record for each key and Validate recids for index entries are the same thing.

This thread is closed