Table Partitioning: size of the local index after splitting

Posted by Valeriy Bashkatov on 22-Jun-2015 05:00

Hi,

I came across an interesting fact: the totla size of the local index after splitting significantly less than when the index was global.

For example,

This is global index InfField before splitting:

INDEX BLOCK SUMMARY FOR SHARED OBJECTS:
--------------------------------------------
Table                      Index  Fields Levels         Blocks    Size  % Util  Factor
PUB.theTable
  DateField                    9       1      3          51286  360.2M    90.2     1.2
  IntField                    10       1      3          26687  107.7M    51.9     2.0 <------------------------
  Rand64                      11       1      3         225731    1.1G    64.1     1.7
  SeqId64                      8       1      3         187607  776.4M    53.2     1.9

                    ------------------------------------------------------------------
Subtotals:                                              491311    2.3G    62.0     1.7

This index is built by one field (INTEGER), which has a value from 0 to 10.

And as we can see this index has 26 687 blocks and 107.7 Mb.

After splitting into 11 partition (list partitioning by IntField) we have:

INDEX BLOCK SUMMARY FOR SHARED OBJECTS:
--------------------------------------------
Table                      Index  Fields Levels         Blocks    Size  % Util  Factor
PUB.test
  idx1                        12       1      1              1    3.0B     0.0     1.0
PUB.theTable
  IntField
    _Partition-Internal-Value
                           -1558       1      1              1  227.0B     2.8     1.0
    DateField:0                9       1      3          93056  361.2M    49.9     2.0
    IntField                  10       1                                                                 
      IntField-1:3                            2            215    1.6M    98.2     1.0     
      IntField-10:2                           2            215    1.6M    98.2     1.0    
      IntField-11:1                           2            216    1.6M    97.8     1.0
      IntField-2:4                            2            215    1.6M    98.2     1.0
      IntField-3:5                            2            215    1.6M    98.2     1.0
      IntField-4:6                            2            215    1.6M    98.2     1.0
      IntField-5:7                            2            215    1.6M    98.2     1.0
      IntField-6:8                            2            215    1.6M    98.2     1.0
      IntField-7:9                            2            215    1.6M    98.2     1.0
      IntField-8:10                           2            215    1.6M    98.2     1.0
      IntField-9:11                           2            215    1.6M    98.3     1.0

    Rand64:0                  11       1      3         260801    1.4G    68.2     1.6
    SeqId64:0                  8       1      3         204776    1.0G    64.9     1.7

                     ------------------------------------------------------------------
Subtotals:                                              560786    2.7G    64.0     1.7

As we can see we got 11 local indexes (IntField-1, IntField-2, ..., IntField-11), but each of them have only 215 blocks and 1.6 мб size.

215 * 11 = 2 365 total blocks (!)

1.6 * 11 = 17.6 total Mb (!)

It the magic of compression, really!
How can this be?!?

Regards,
Valeriy

OpenEdge RDBMS - Forum

All Replies

Posted by George Potemkin on 22-Jun-2015 08:08

Before the partitioning the keys of the IntField index used on disk 1.25 bytes per key (= 107.5M / 90000700 records). And by the way, the worse index (Rand64 with 64-bit random value) used 13.1 bytes per key (=1.1G / 90000700 records) - the key size without compression would be of the same 13 bytes (= 8 bytes for key value + 1 byte for a separator + 4 bytes for RECID).

After the partitioning the keys of the IntField index use 0.2 bytes per key (=1.6M / 8185395) or 1.64 bits per key.

IntField index was used for the splitting. So in new location RECIDs and the keys are now in sync and the compression algorithm works equally well for both key values and RECID parts of the index keys. You would get the same result after D&L of the original database using IntField index.

Posted by George Potemkin on 23-Jun-2015 04:08

It's turned out that Progress creates the local indexes using the smarter way! The leading index component has the same value for all records in the same section. It seems that Progress virtually exclude the leading component from the definition of local indexes and the related part of index keys is not stored in the index tree.

I wrote: "the compression algorithm works equally well for both key values and RECID parts of the index keys". I was wrong: in this case the local indexes are in fact an equivalent of the the "default" index created by Progress using the "phantom" field when we don't define any indexes for a table. In other words they don't have the index keys and index tree is just a sorted list of recid's. I'm not sure how much space it will safe for an index but definitely an index will use the minimum space.

Posted by Valeriy Bashkatov on 23-Jun-2015 04:14

Hmm...

Someone from Progress can comment on this finding?

Posted by TheMadDBA on 23-Jun-2015 08:24

That would make sense for single value partitions (cust-num = 1 for example). I know that Oracle and SQL Server do similar things with local (partitioned) indexes.

At the very least a freshly built index with only a few distinct partition keys "should" be smaller than similar ones that have grown over time.

Posted by George Potemkin on 23-Jun-2015 08:48

Do we need such beautifully small indexes at all? I mean the indexes with a single partition aligned key component. They store the sorted list of recids. In other words it's just the list of the blocks in the partition. These indexes do not give us any benefits compared to a block by block reading.

Posted by TheMadDBA on 23-Jun-2015 09:01

I assume you are talking about a local index with only the partition key (1 column index). Oracle or SQL Server would almost never use that index unless you were doing a record count anyways :-)

I assume that when you have multiple fields in the local index the structures would make much more sense.

Posted by gus on 23-Jun-2015 09:07

also, there are 3 different kinds of "compression" used in the openedge rdbms index b-trees. these have been there for a long time.

0) in the levels above the leaf level, trailing key bytes are dropped, keeping only enough to distinguish key values for blocks at the next lower level

1) at the leaf level, duplicate leading key bytes are dropped except for the first entry, and replaced by a number that says how many bytes were discarded because the are the same as an earlier entry.

3) for non-unique indexes, at the leaf level, when there are multiple entries for the same key, the rowid's are compressed by discarding leading duplicate bytes.

Posted by George Potemkin on 23-Jun-2015 09:38

> 0) in the levels above the leaf level, trailing key bytes are dropped, keeping only enough to distinguish key values for blocks at the next lower level

Now I see why binary dump might start less threads than the number of index entries stored in the root block.

Thanks, Gus!

Posted by George Potemkin on 23-Jun-2015 09:57

Are there any changes in multi-index search regarding a table partitioning?

Let's say my query has used 'where field1 eq "a" and field2 eq "b"'. If there are the single component indexes on both fields then the query will use both these indexes.

Now the field1 was used as a partition key. Can we drop the index on field1?

Will the query understand that it should return only records from the field1_a partition?

Posted by TheMadDBA on 23-Jun-2015 11:10

Nothing that I could find in my testing or a quick review of the documentation.

As far as I know the global index structures have not changed because of partitioning. The only way it could work is if Progress decided to evaluate the ROWIDs to determine which partition they belonged to. Which would still mean you would read all of the index blocks for field2 eq "b".

Your example just screams local index on field1,field2 though.

Once Gus gets done writing that 4GL cost based optimizer it could decide to partition scan based on the distinct values of field2 in that partition though ;-)

This thread is closed