overhead of multiple unique indexes

Posted by richard thompson on 18-Jul-2016 09:33

During a row insert, OE will validate that all indexes flagged as unique are in fact unique and throw an error if not.  No problem with that.  Likewise, during an update to a field that is a component of a unique index, the new value is validated to ensure the resulting index entry will remain unique; again, no problem with that.

My question is, is OE aware of multiple indexes defined as unique that contain the same fields?  Does the OE validation short-circuit the process, or does it check all defined unique indexes regardless of their structure?

Please allow me to provide an example:

A table has three unique indexes defined, index-A, index-B, and index-C, comprised of the following fields:

index-A: id, name, ssn

index-B: name, id, ssn

and

index-C: ssn, cardID, bloodType

Since index-A and index-B contain the same fields, does OE go ahead and validate them both, or does it understand that the simple change in order has no effect on the uniqueness?

The reason I ask is because if my schema is filled with hundreds of these nonsense unique flags and OE must verify each one during an insert or update, then I need to be concerned about the needless overhead.  However, if, as some have suggested, OE recognizes the silliness of my schema and only validates the necessary ones, then I can stop being concerned about this (what appears to be) design error.

Your observations and comments are welcome.

Richard

All Replies

Posted by gus bjorklund on 18-Jul-2016 10:22

Both indexes must be validated independently. Furthermore, the validation is done by actually creating the index entries so if the validation succeeds, then you are done.
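To picture what "validated independently" means, here is a minimal sketch (in Python, not OpenEdge internals; all names are hypothetical): each unique index is kept as its own lookup structure, so a row insert builds and checks one entry per unique index, even when two indexes cover the same fields in a different order.

```python
# Sketch only: each unique index maintains its own entry store, and an
# insert must create (and thereby validate) an entry in every one of them.

class UniqueIndexError(Exception):
    pass

class Table:
    def __init__(self, unique_indexes):
        # unique_indexes: list of field-name tuples, e.g. [("id", "name", "ssn"), ...]
        self.unique_indexes = unique_indexes
        # one independent entry store per unique index
        self.entries = [dict() for _ in unique_indexes]
        self.rows = {}
        self.next_rowid = 1

    def insert(self, row):
        rowid = self.next_rowid
        # "validation" happens by actually creating every index entry
        for fields, store in zip(self.unique_indexes, self.entries):
            key = tuple(row[f] for f in fields)
            if key in store:
                raise UniqueIndexError(f"duplicate key {key} in index on {fields}")
            store[key] = rowid
        self.rows[rowid] = row
        self.next_rowid += 1
        return rowid

# index-A and index-B from the question: same fields, different order
t = Table([("id", "name", "ssn"), ("name", "id", "ssn")])
t.insert({"id": 1, "name": "fred", "ssn": "100000001"})
try:
    t.insert({"id": 1, "name": "fred", "ssn": "100000001"})
except UniqueIndexError as e:
    print(e)
```

Both stores get an entry for every row; neither index "knows" the other enforces the same logical constraint.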

Posted by George Potemkin on 18-Jul-2016 10:24

> OE recognizes the silliness of my schema and only validates the necessary ones

It's not correct.

> if my schema is filled with hundreds of these nonsense unique flags and OE must verify each one during an insert or update then I need to be concerned about the needless overhead.

The main overhead of unique indexes is the "index entry locks". They will be maintained separately in each unique index.

Posted by richard thompson on 18-Jul-2016 10:50

Members, thanks.  

This is what I suspected.  If flagging the additional indexes as unique provides no added value, then the overhead of validating the uniqueness of the index is simply wasted effort.  If index-A is set to be unique on the field ssn, then any other index that contains the field ssn as a component is also unique.  If there can only be one record with the ssn "123456789", then an index comprised of name, age, bloodType, teaPreference, and ssn will also be unique.  Asking OE to verify that ssn is unique and then asking it to validate that name + bloodType + teaPreference + ssn is also unique is a waste of time.

I see no value in flagging the second index as unique.

Richard

Posted by Aidan Jeffery on 18-Jul-2016 11:05

>  If there can only be one record with the ssn "123456789"...

Note that none of the three indexes you have specified guarantee the uniqueness of ssn. For that, you need to have a single-component index on ssn alone.
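Aidan's point can be checked with a tiny sketch (illustrative Python, hypothetical data): a unique index on (name, ssn) only forces the *combination* to be unique, so two rows can still share the same ssn as long as the names differ.

```python
# A unique (name, ssn) index modeled as a set of key tuples.
seen = set()
rows = [("fred", "123456789"), ("bob", "123456789")]  # same ssn, different names

for name, ssn in rows:
    key = (name, ssn)
    assert key not in seen, "a unique (name, ssn) index would reject this row"
    seen.add(key)

# Both rows were accepted, yet ssn alone is duplicated:
ssns = [ssn for _, ssn in rows]
print(len(ssns) != len(set(ssns)))  # True
```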

Posted by richard thompson on 18-Jul-2016 11:41

So right you are.  I mixed my examples.  The first example was two indexes that contained the same three fields as index components, but in different order.  The premise being that if one was unique, the other had to be unique also.  The final example was a little closer to home, where our application design has many situations in which we have a unique index on (for example) ssn, and then we define a second index, also unique, on ssn and some other field.  In that example, the single-component index on ssn ensures it is unique and thereby, by extrapolation, makes any other index that contains the ssn field as a component unique as well.

Thanks so much to all who responded.  It really helps to get such high-quality feedback when performing some level of sanity check on my advice/analysis.

Richard

Posted by guymills on 19-Jul-2016 04:32

Hi Richard

Just a question:

If your pattern is generally:

Index-A, ssn

Index-B name, ssn

index-C cardID, ssn

etc...

Is ssn just an internal key, which has no data relevance?  Or is it a field that the users want to see, and (sub-)order their results by?

I ask, because if it is just an internal ID, then what is the need to include this field in the other indexes?  i.e. why do you want these indexes to be unique? There are implications of doing so:

I'm not an expert (like Gus, who I believe designed the bulk of the OpenEdge DB engine, or George).  But from what I understand, the index keys are a concatenation of the index fields, and the index keys will also be compressed.

So e.g. if you have 2 indexes:

idx1 (name) and idx2 (name, ssn)

And you have e.g. 4 records:

name    ssn          recid (Progress internal pointer to record)

fred    100000001    1

fred    100000002    2

bob     100000003    3

bob     100000004    4

idx1 will be compressed to something like:

fred 1,2

bob 3,4

whereas idx2 can have no compression (because of your use of the unique key as the final entry of the index):

fred100000001  1

fred100000002  2

bob100000003  3

bob100000004  4

[I know it's not really like this, but just to give the general idea...]

I guess if you're not dealing with much data, then this isn't a problem, but if you're dealing with millions of rows, you may free a considerable amount of space by not including the unique key in the index, and potentially (this is for others to say...) it might perform better...

Cheers,

Guy

Posted by guymills on 19-Jul-2016 04:39

Just to add, I should have put bob before fred in my example :-)  Or perhaps it was a descending index ;-)

Posted by gus bjorklund on 20-Jul-2016 09:24

A: validation

There is /no/ additional overhead required to validate a unique index entry. The way it works is this:

When you execute a CREATE statement, we make a copy of the table's template record and put it into the record buffer. If there are any TODAY or NOW fields, we fill in those values. The record buffer now has all the initial field values. At this point, the record is not in the database.

Once your code has given values to all the fields that are the key components of a unique index (or the end of the ASSIGN statement that does so has been reached), we validate the unique key or keys.

To do this, we send the record to the database and then create the requisite index entry or entries. At this point the index entry creation may fail due to a uniqueness error. If it doesn’t fail, then the index entry is complete and we can move on. The only difference for a nonunique index is that there can be no uniqueness error.

The only difference the validation makes is that we may have to send the record to the database earlier than if we didn’t need to validate. Everything else is the /same/ with or without unique index key validation.

B: index compression

The compression is very much better than that. We use 3 types of compression in the indexes.

0) In the levels above the leaf, we remove redundant trailing bytes, keeping only as much of the index entry as we need to find the leftmost entry in the next lower level.

For example, if the index is on last name, the first entry might be “adams”, followed by the rest of the a's, and then there might be “barry". At the level above the leaf we might need only “a” and “b” instead of the full names.

1) We discard the leading part of an index entry when all or part of it is the same as the preceding entry, replacing it with a count of the number of bytes that were discarded.

For example, we might have the key values

fred100000001

fred100000002

bob100000003

bob100000004

After compression, it will look like this:

fred100000001

12,2    /* 12 bytes the same as previous entry, followed by a byte with value 2 */

bob100000003

11,4    /* 11 bytes the same as previous entry, followed by a byte with value 4 */

2) When the key is nonunique, there may be multiple rowids for the same key value. If so, we can retain a single copy of the key value and then list the rowids associated with it. We also compress the rowid list. We can’t do that for a unique index, since there will be only one rowid per entry.
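The type-1 (leading-byte) compression above can be sketched in a few lines of Python. This is not the actual OpenEdge storage format, just the idea: each entry after the first is stored as (count of bytes shared with the previous entry, remaining suffix).

```python
# Prefix-compress a sorted list of index key strings: store each key as
# (number of leading bytes shared with the previous key, remaining suffix).

def prefix_compress(keys):
    out = [(0, keys[0])]  # first entry is stored whole
    for prev, cur in zip(keys, keys[1:]):
        n = 0
        while n < min(len(prev), len(cur)) and prev[n] == cur[n]:
            n += 1
        out.append((n, cur[n:]))
    return out

keys = ["fred100000001", "fred100000002", "bob100000003", "bob100000004"]
for shared, suffix in prefix_compress(keys):
    print(shared, suffix)
# "fred100000002" shares 12 bytes with the previous entry, leaving "2";
# "bob100000003" shares nothing with the previous "fred..." entry;
# "bob100000004" shares 11 bytes, leaving "4" -- matching Gus's example.
```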
