Index corruption bulletin

Posted by ChUIMonster on 17-May-2017 07:51

I've been double checking systems after the index corruption bulletin came out:

knowledgebase.progress.com/.../Critical-Alert-Index-manager-defect-can-corrupt-large-indexes-where-dbkeys-straddle-32-64-bit-boundary

If I am correctly interpreting the bulletin it seems like it is fair to say that a good *quick* check is to run:

    prostrct statistics dbname | grep "Active blocks:" | sort -nb -k3

The "worst case" is a storage area with 256 rows per block.  In that case 8,388,608 blocks is the limit (2^31 / 256).  So if *none* of your storage areas are larger than that you're fine and you don't need to do a detailed area by area check.

And for more detailed checks the following table could be used:

RPB     Max Active Blocks (in a storage area containing indexes)
===     =================
256             8,388,607
128             16,77,215
 64            33,554,431
 32            67,108,863
 16           134,217,727
  8           268,435,455
  4           536,870,911
  2         1,073,741,823
  1         2,147,483,647

Does this match other people's understanding?

If the table above is correct then another "quick" filter is that if your 4k db is less than 32GB or your 8k db is less than 64GB you need not bother checking.

Posted by Rob Fitzpatrick on 17-May-2017 08:43

> Does this match other people's understanding?

Yes.

Posted by gus bjorklund on 17-May-2017 08:47

yes, that is a good, quick way to check.

the table is almost correct. i’m just nitpicking. there is no block zero. the first block is nr. 1.

Posted by mfurgal on 17-May-2017 08:52

Tom:

Excellent job providing this table.  Useful information as usual.  Your understanding is correct as far as I can tell.  We did a similar study with our MDBA customers and informed each or them whether they were at risk or not based on a similar algorithm.  As you would suspect having a single 32 GB or 64 GB index area is pretty rare.  I think we all, as a community,  did a good job with storage area design back when Progress V9 came out to not have a single area called “Data” with a single associated area called “Index”.

The remedy is 2 fold.
1.) Upgrade to a version of OpenEdge that has this issue fixed (preferred)
2.) If you have a large index area, split it into multiple index areas, which requires downtime to truncate the area, index-move, and index rebuild the areas.

Now is the time to make sure you have no indexes in any storage area that contains table data.

Mike
-- 
Mike Furgal
Director – Database and Pro2 Services
PROGRESS Bravepoint
678-225-6331 (office)
617-803-2870 (cell)


All Replies

Posted by cjbrandt on 17-May-2017 08:35

Good reason to make sure tables and indexes are in separate areas.  

Posted by Rob Fitzpatrick on 17-May-2017 08:43

> Does this match other people's understanding?

Yes.

Posted by gus bjorklund on 17-May-2017 08:47

yes, that is a good, quick way to check.

the table is almost correct. i’m just nitpicking. there is no block zero. the first block is nr. 1.

Posted by mfurgal on 17-May-2017 08:52

Tom:

Excellent job providing this table.  Useful information as usual.  Your understanding is correct as far as I can tell.  We did a similar study with our MDBA customers and informed each or them whether they were at risk or not based on a similar algorithm.  As you would suspect having a single 32 GB or 64 GB index area is pretty rare.  I think we all, as a community,  did a good job with storage area design back when Progress V9 came out to not have a single area called “Data” with a single associated area called “Index”.

The remedy is 2 fold.
1.) Upgrade to a version of OpenEdge that has this issue fixed (preferred)
2.) If you have a large index area, split it into multiple index areas, which requires downtime to truncate the area, index-move, and index rebuild the areas.

Now is the time to make sure you have no indexes in any storage area that contains table data.

Mike
-- 
Mike Furgal
Director – Database and Pro2 Services
PROGRESS Bravepoint
678-225-6331 (office)
617-803-2870 (cell)


Posted by ChUIMonster on 17-May-2017 08:54

It also makes a good, albeit unfortunate, argument for RPB = 1 in index areas :)

Posted by ChUIMonster on 17-May-2017 09:03

Gus, I fixed the table for you :)

Posted by ChUIMonster on 17-May-2017 09:13

A quick and dirty dot-p to munge "prostrct statistics" into an easier to review format:

/* stinfo.p
 *
 * output assumes that your window is at least 132 columns wide...
 *
 * pro -p stinfo.p -param /db/dbname
 *
 *
 * prostrct statistics dbname
 * ...
 * 
 *   Statistics for Area: Schema Area
 * 
 *   Files in Area: Schema Area
 * ...
 *   Database Block Usage for Area: Schema Area
 * 
 *   Active blocks: 22618
 *     Data blocks: 11437
 *     Free blocks: 11181
 *    Empty blocks: 6
 *    Total blocks: 22624
 *   Extent blocks: 4
 *   Records/Block: 64
 *    Cluster size: 1
 *  ...
 * 
 */

define variable lineIn as character no-undo.

define variable areaName   as character no-undo format "x(30)" label "Area Name".
define variable areaRPB    as integer   no-undo format   ">>9" label "RPB".
define variable areaCSZ    as integer   no-undo format   ">>9" label "CSZ".
define variable areaActive as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Active".
define variable areaData   as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Data".
define variable areaFree   as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Free".
define variable areaEmpty  as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Empty".
define variable areaTotal  as int64     no-undo format ">,>>>,>>>,>>>,>>9" label "Total".

input through value( "prostrct statistics " + session:parameter ).
repeat:

  import unformatted lineIn.

  lineIn = trim( lineIn ).

  if lineIn begins "Statistics for Area:" then
    assign
      areaName   = trim( entry( 2, lineIn, ":" ))
      areaRPB    = 0
      areaCSZ    = 0
      areaActive = 0
      areaData   = 0
      areaFree   = 0
      areaEmpty  = 0
      areaTotal  = 0
    .

  if lineIn begins "Active blocks:" then areaActive =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins   "Data blocks:" then areaData   =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins   "Free blocks:" then areaFree   =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Empty blocks:" then areaEmpty  =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Total blocks:" then areaTotal  =   int64( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins "Records/Block:" then areaRPB    = integer( trim( entry( 2, lineIn, ":" ))).
  if lineIn begins  "Cluster size:" then areaCSZ    = integer( trim( entry( 2, lineIn, ":" ))).

  if lineIn begins "Cluster size:" /* and areaName matches "*idx*" */ then
    do:
      display
        areaName
        areaRPB
        areaCSZ
        areaActive
        areaData
        areaFree
        areaEmpty
        areaTotal
      .
    end.

end.

pause.

return.

Posted by mfurgal on 17-May-2017 10:00

I’m not buying that reason.  RPB = 1 in general can lead to mistakes that are costly.  Here’s some metrics from our MDBA program and this issue (potential).

Total # databases including test databases: 2,876
Total # Areas in these databases: 31,522
Total # databases that have a suspect storage area (based on RPB): 42
Total # Areas that are suspect (based on RPB): 55
Total # Areas that contain indexes that are suspect: 7

Mike
-- 
Mike Furgal
Director – Database and Pro2 Services
PROGRESS Bravepoint
678-225-6331 (office)
617-803-2870 (cell)


Posted by ChUIMonster on 17-May-2017 10:25

I don't know why you wouldn't buy that reason -- it is self evident that it helps with this particular issue.  True, it is just one lonely bullet point against the rather larger list of reasons not to do it.

I do understand that, by itself and without any "large index" problem that needs attention (and for some reason that cannot be addressed via hotfix or upgrade...), it isn't much of a reason to make that choice for general purpose index areas.  I'm not suggesting or encouraging that everyone do so.  I'm just noting that there is an interesting, albeit unusual and unfortunate, case where it actually has a clear benefit.

I know -- people read this stuff, extract a sound bite without bothering to understand the context and then do the wrong thing.  But black and white rules without nuance can be just as problematic.

Posted by ctoman on 17-May-2017 10:36

- Nice job Tom. results :

Area Name                      RPB CSZ            Active              Data              Free             Empty             Total

────────────────────────────── ─── ─── ───────────────── ───────────────── ───────────────── ───────────────── ─────────────────

Control Area                    64   1                10                10                 0                70                80

Schema Area                     64   1             1,938             1,938                 0                46             1,984

oeel32                          32  64         2,475,071         2,475,014                57         2,595,073         5,070,144

arett64                         64  64         1,096,831         1,096,774                57         1,184,769         2,281,600

glet64                          64  64         1,181,567         1,181,510                57         1,607,041         2,788,608

icet64                          64  64           899,775           899,718                57         1,635,329         2,535,104

vaesl64                         64  64         2,349,311         2,349,254                57           185,793         2,535,104

event_trans_sub64               64  64         2,710,335         2,710,278                57           331,777         3,042,112

Misc8                            8  64           486,975           486,918                57           273,601           760,576

Misc16                          16  64           948,223           948,166                57         1,333,377         2,281,600

Misc32                          32  64         6,489,535         6,489,478                57         1,876,161         8,365,696

Misc64                          64  64         4,957,439         4,957,382                57         2,140,737         7,098,176

Misc128                        128  64         2,713,279         2,713,222                57         1,342,849         4,056,128

Misc256                        256  64             9,791             9,734                57           243,777           253,568

Tableszerorecords              128  64            27,519            27,463                56           226,049           253,568

AuditData                       32  64                63                 6                57           253,505           253,568

Lobs                            64  64         1,284,863         1,284,806                57           743,233         2,028,096

oeel_Index32                     1   8           326,231           326,230                 1           717,641         1,043,872

arett_Index64                    1   8           151,559           151,558                 1           355,465           507,024

glet_Index64                     1   8           154,215           154,214                 1           352,809           507,024

icet_Index64                     1   8           149,759           149,758                 1           357,265           507,024

vaesl_Index64                    1   8           416,583           416,582                 1           597,449         1,014,032

event_trans_sub_Index64          1   8           175,407           175,406                 1            78,113           253,520

MiscIndex8                       1   8            55,039            55,038                 1           200,977           256,016

MiscIndex16                      1   8           194,263           194,262                 1           317,753           512,016

MiscIndex32                      1   8         1,215,695         1,215,694                 1         1,068,353         2,284,048

MiscIndex64                      1   8         1,165,879         1,165,879                 0         1,118,169         2,284,048

MiscIndex128                     1   8           859,223           859,223                 0           917,817         1,777,040

MiscIndex256                     1   8             4,567             4,566                 1            11,449            16,016

AuditIndex                       1   8                 7                 6                 1           125,033           125,040

LobsIndex                        1   8                 7                 6                 1           256,009           256,016

ReplQueueArea                  128 512            20,991            20,486               505           107,521           128,512

ReplCtrlArea                   256   8                23                22                 1           127,993           128,016

ReplMapArea                    256   8               103               102                 1           127,913           128,016

ReplQueueIdx                   256 512             7,679             7,174               505           120,833           128,512

ReplCtrlIdx                    256   8                23                22                 1           127,993           128,016

ReplMapIdx                     256  64               511               454                57           127,553           128,064

Posted by mikaelstenmark on 18-May-2017 08:27

Just to be on the safe side. I guess this issue only applies to those areas containing index so those areas containing only data are not affected by this issue.

Posted by ChUIMonster on 18-May-2017 08:32

That is correct.

Posted by e.schutten on 18-May-2017 09:40

We have an area that has more than 33,554,431 active blocks (64 RPB). In this area there is no index corruption, so installing hotfix 10.2B 64 is the only thing we shoud do, correct? Or should we also do an index rebuild/fix?

Posted by mfurgal on 18-May-2017 10:24

The bulletin and k-base article suggest that you may have index corruption but tools like idxfix will not report it.


These 3 points.

IDXFIX is unable to locate the index corruption.
IDXCHECK Option 3 can detect the index corruption, but can crash in OpenEdge versions prior to 11.7 due to the index corruption.
Index corruption seems to reoccur relatively quickly after an IDXBUILD has been done to address the index corruption issue.

The article also suggest that once the hotfix is applied that you need to rebuilt your problem indexes.  Given that you have not seen any index issues, you do not know which indexes to rebuild.  The safest and best bet would be to rebuild all the indexes in that area.

Mike
-- 
Mike Furgal
Director – Database and Pro2 Services
PROGRESS Bravepoint
678-225-6331 (office)
617-803-2870 (cell)


Posted by e.schutten on 19-May-2017 04:08

We have an area in the database that exeeds 33,554,431 active blocks, but we don't have index problems as far as we now. Is only installing the hotfix enough? Or should we do also an index rebuild?

Posted by Marie Candela on 19-May-2017 08:18

How did you determine there is no index corruption?
 

Posted by ChUIMonster on 19-May-2017 09:59

If your area is that large *and* there are indexes in that area *and* the rows per block is 64 or greater then you *might*  have index corruption.

You might not have seen any symptoms.

The safe thing to do is to apply the hotfix  (or upgrade) and rebuild all indexes in that area.  Merely installing the hotfix is not enough.

Posted by e.schutten on 22-May-2017 05:06

But what does the hotfix actually fix? I assume that when I have installed the hotfix, that I can use the IDXCHECK and IDXFIX, without any problems. So I can use IDXFIX when an index corruption occurs. Why should I rebuild the indexes after installing the hotfix?

Posted by Ruanne Cluer on 22-May-2017 05:53

The HF prevents future corruption should a new index block be needed for a multi-level index and the dbkey crosses the 32/64-bit boundary

If there is existing index corruption, it needs to be fixed

idxfix will not be able to find this corruption as it it does not traverse blocks like idxcheck does (even when the HF is applied).

idxcheck only with validation Option 3 (Validate record for each key) will find 32-bit 64-bit corruption within a block split, but:

- it can take a long time (even when the HF is applied).

- it can cause a dbdown (before the HF is applied).

Hence, if you have 'vulnerable' areas, it's a pretty good idea to rebuild those indexes after applying the HF.

Alternatives may take longer to verify and not doing so will lead to having to get your DR notes out to recover,

bearing in mind that this corruption will be propagated through AI notes ..

Posted by e.schutten on 22-May-2017 09:05

Thanks for the explanation.

Posted by Ruanne Cluer on 24-May-2017 02:56

Tom provided a nice prostrct stats parsing ABL. Take a look at the following Article which uses VSTs to accomplish the same, if you restricted this to only areas with indexes:

FOR EACH _Area WHERE _Area-number GT 3,  

FIRST _areastatus WHERE _Areastatus-areanum = _Area-number ,

FIRST _storageobject OF _area WHERE _object-type = 2:

000081576 - How to find an Area's RPB and Active Blocks from VSTs?

knowledgebase.progress.com/.../VST-RPB-HWM-Active-Blocks

Posted by gus bjorklund on 24-May-2017 06:34

ruanne,

would running index compress to successful completion tell us that a suspect index is not corrupted?

of course, if /were/ corrupted, index compress might crash.

Posted by Ruanne Cluer on 06-Jun-2017 07:09

gus,

nice idea.

problem with this use-case is that the db is already jolly large and there's typically too little time to have test baselines

Posted by George Potemkin on 06-Jun-2017 07:50

> idxfix will not be able to find this corruption as it it does not traverse blocks like idxcheck does (even when the HF is applied).

I wonder why there is a difference between idxfix and idxcheck regarding the algorithms they use to find the index corruptions? Why idxfix can't do something that idxcheck can do or vice versa?

> idxcheck only with validation Option 3 (Validate record for each key) will find 32-bit 64-bit corruption within a block split

Option 3 (Both 1 and 2 above) with "Validate recids for index entries" exists only in idxfix, doesn't it?

Posted by gus bjorklund on 06-Jun-2017 09:14

Validate record for each key and Validate recids for index entries are the same thing.

This thread is closed