Index performance for a big table

Posted by Tim Kuehn on 16-Dec-2016 09:34

I've got a table with 3 fields - field1, field2, and field3 with ~3B records in it. 

The data pattern is: 

  • field1 - highly repetitive with 10 possible values max,
  • field2 - one of a couple hundred values, 
  • field3 - should be mostly unique with a few million records that have duplicate entries in the table

I'm running a simple 

can-find(first tablename

             where tablename.field1 = 'somevalue' and

                     tablename.field2 = 'othervalue')

to see if certain data is in the table, and it's taking a terribly long time to run.

Update:

  • There are three indexes on this table,
  • The compiler appears to be choosing the one in field1, field2, field3 order. 

promon is showing a lot of db reads (I'm guessing because its doing index reads) and that's about it.

My questions are - is this to be expected, and is there anything I can do about it? 

Posted by ChUIMonster on 16-Dec-2016 11:15

You can do an idxcompact online

Posted by ChUIMonster on 16-Dec-2016 10:20

Do you have dbanalys available?  Perhaps the index suffers from (very) poor utilization.  One possible source might be a large purge at some point.

Or perhaps that data pattern is not what you describe.

Or maybe there are unknown values in one or more of those key fields?

All Replies

Posted by Garry Hall on 16-Dec-2016 09:43

What indices do you have on tablename?

Posted by Tim Kuehn on 16-Dec-2016 09:47

The index the compiler is choosing is in field1, field2, field3 order.

Posted by Garry Hall on 16-Dec-2016 10:19

If you have an index with field2, field1, can you specify USE-INDEX on the CAN-FIND? I am guessing the chosen index is either the primary index, or all the other indices use field3 as the first or second component.

Posted by ChUIMonster on 16-Dec-2016 10:20

Do you have dbanalys available?  Perhaps the index suffers from (very) poor utilization.  One possible source might be a large purge at some point.

Or perhaps that data pattern is not what you describe.

Or maybe there are unknown values in one or more of those key fields?

Posted by Tim Kuehn on 16-Dec-2016 10:45

One possible source might be a large purge at some point.

SMH - I'd deleted ~400M records from this table at one point....I think I have my answer. 

Thx!

Posted by ChUIMonster on 16-Dec-2016 11:15

You can do an idxcompact online

Posted by Eric Andruscavage on 16-Dec-2016 12:13

I am always curious when I see a unique or nearly unique field at the end of an index. It can cause the index to be multiple times the size of an index that doesn't contain the field and adds almost no performance to the index.

A lot depends on the data, the order the data was added/deleted/updated, and the distribution of data in the index.

You don't say what 'a terribly long time' is, but if you mean more than one second then you have a hardware fault or not enough ram or some other problem.

If it's been awhile since the last index rebuild the index tree will be unbalanced and the query may need to traverse many more levels to get to the index block that contains the values you are looking for. The index block space utilization can be low if there are updates to any of the 3 fields.

Run the IDXCOMPACT option of proutil.

I'm not the expert here, but I believe that even after an index rebuild the index to your table would contain many more levels than an index without field3.

I would suggest adding an index that only contained field1 and field2.

Posted by Tim Kuehn on 19-Dec-2016 09:45

[quote user="ChUIMonster"]

Perhaps the index suffers from (very) poor utilization.  One possible source might be a large purge at some point.

[/quote]

The table had had a large purge of ~400M records at one point. An idxcompact on the index in question solved the problem - the lookup time went from ~5 min to an instant response. 

Posted by George Potemkin on 19-Dec-2016 10:03

> promon is showing a lot of db reads (I'm guessing because its doing index reads) and that's about it.

For the non-unique indexes can-find() always reads exactly <n> index blocks and 1 index key where <n> is the number of the levels in the index tree. This value used to be less or equal 5-6 for very large tables.

For the unique indexes can-find() may read abnormally large number of index blocks and index keys because it will read and skip the "index entry locks" left by data purge. So activity caused by can-find() is a good probe test to decide if it's a time to run idxcompact against the unique indexes. All needed data are available from VSTs.

Poor utilization of index blocks is NOT a factor that affect can-find().

This thread is closed