Hi,
I came across an interesting fact: the totla size of the local index after splitting significantly less than when the index was global.
For example,
This is global index InfField before splitting:
INDEX BLOCK SUMMARY FOR SHARED OBJECTS: -------------------------------------------- Table Index Fields Levels Blocks Size % Util Factor PUB.theTable DateField 9 1 3 51286 360.2M 90.2 1.2 IntField 10 1 3 26687 107.7M 51.9 2.0 <------------------------ Rand64 11 1 3 225731 1.1G 64.1 1.7 SeqId64 8 1 3 187607 776.4M 53.2 1.9 ------------------------------------------------------------------ Subtotals: 491311 2.3G 62.0 1.7
This index is built by one field (INTEGER), which has a value from 0 to 10.
And as we can see this index has 26 687 blocks and 107.7 Mb.
After splitting into 11 partition (list partitioning by IntField) we have:
INDEX BLOCK SUMMARY FOR SHARED OBJECTS: -------------------------------------------- Table Index Fields Levels Blocks Size % Util Factor PUB.test idx1 12 1 1 1 3.0B 0.0 1.0 PUB.theTable IntField _Partition-Internal-Value -1558 1 1 1 227.0B 2.8 1.0 DateField:0 9 1 3 93056 361.2M 49.9 2.0 IntField 10 1 IntField-1:3 2 215 1.6M 98.2 1.0 IntField-10:2 2 215 1.6M 98.2 1.0 IntField-11:1 2 216 1.6M 97.8 1.0 IntField-2:4 2 215 1.6M 98.2 1.0 IntField-3:5 2 215 1.6M 98.2 1.0 IntField-4:6 2 215 1.6M 98.2 1.0 IntField-5:7 2 215 1.6M 98.2 1.0 IntField-6:8 2 215 1.6M 98.2 1.0 IntField-7:9 2 215 1.6M 98.2 1.0 IntField-8:10 2 215 1.6M 98.2 1.0 IntField-9:11 2 215 1.6M 98.3 1.0 Rand64:0 11 1 3 260801 1.4G 68.2 1.6 SeqId64:0 8 1 3 204776 1.0G 64.9 1.7 ------------------------------------------------------------------ Subtotals: 560786 2.7G 64.0 1.7
As we can see we got 11 local indexes (IntField-1, IntField-2, ..., IntField-11), but each of them have only 215 blocks and 1.6 мб size.
215 * 11 = 2 365 total blocks (!)
1.6 * 11 = 17.6 total Mb (!)
It the magic of compression, really!
How can this be?!?
Regards,
Valeriy
Before the partitioning the keys of the IntField index used on disk 1.25 bytes per key (= 107.5M / 90000700 records). And by the way, the worse index (Rand64 with 64-bit random value) used 13.1 bytes per key (=1.1G / 90000700 records) - the key size without compression would be of the same 13 bytes (= 8 bytes for key value + 1 byte for a separator + 4 bytes for RECID).
After the partitioning the keys of the IntField index use 0.2 bytes per key (=1.6M / 8185395) or 1.64 bits per key.
IntField index was used for the splitting. So in new location RECIDs and the keys are now in sync and the compression algorithm works equally well for both key values and RECID parts of the index keys. You would get the same result after D&L of the original database using IntField index.
It's turned out that Progress creates the local indexes using the smarter way! The leading index component has the same value for all records in the same section. It seems that Progress virtually exclude the leading component from the definition of local indexes and the related part of index keys is not stored in the index tree.
I wrote: "the compression algorithm works equally well for both key values and RECID parts of the index keys". I was wrong: in this case the local indexes are in fact an equivalent of the the "default" index created by Progress using the "phantom" field when we don't define any indexes for a table. In other words they don't have the index keys and index tree is just a sorted list of recid's. I'm not sure how much space it will safe for an index but definitely an index will use the minimum space.
Hmm...
Someone from Progress can comment on this finding?
That would make sense for single value partitions (cust-num = 1 for example). I know that Oracle and SQL Server do similar things with local (partitioned) indexes.
At the very least a freshly built index with only a few distinct partition keys "should" be smaller than similar ones that have grown over time.
Do we need such beautifully small indexes at all? I mean the indexes with a single partition aligned key component. They store the sorted list of recids. In other words it's just the list of the blocks in the partition. These indexes do not give us any benefits compared to a block by block reading.
I assume you are talking about a local index with only the partition key (1 column index). Oracle or SQL Server would almost never use that index unless you were doing a record count anyways :-)
I assume that when you have multiple fields in the local index the structures would make much more sense.
also, there are 3 different kinds of "compression" used in the openedge rdbms index b-trees. these have been there for a long time.
0) in the levels above the leaf level, trailing key bytes are dropped, keeping only enough to distinguish key values for blocks at the next lower level
1) at the leaf level, duplicate leading key bytes are dropped except for the first entry, and replaced by a number that says how many bytes were discarded because the are the same as an earlier entry.
3) for non-unique indexes, at the leaf level, when there are multiple entries for the same key, the rowid's are compressed by discarding leading duplicate bytes.
> 0) in the levels above the leaf level, trailing key bytes are dropped, keeping only enough to distinguish key values for blocks at the next lower level
Now I see why binary dump might start less threads than the number of index entries stored in the root block.
Thanks, Gus!
Are there any changes in multi-index search regarding a table partitioning?
Let's say my query has used 'where field1 eq "a" and field2 eq "b"'. If there are the single component indexes on both fields then the query will use both these indexes.
Now the field1 was used as a partition key. Can we drop the index on field1?
Will the query understand that it should return only records from the field1_a partition?
Nothing that I could find in my testing or a quick review of the documentation.
As far as I know the global index structures have not changed because of partitioning. The only way it could work is if Progress decided to evaluate the ROWIDs to determine which partition they belonged to. Which would still mean you would read all of the index blocks for field2 eq "b".
Your example just screams local index on field1,field2 though.
Once Gus gets done writing that 4GL cost based optimizer it could decide to partition scan based on the distinct values of field2 in that partition though ;-)