Table partitioning sanity check

Posted by cverbiest on 11-May-2015 08:01

We've never used table partitioning before and I'd like to check if what I'm proposing makes sense. Sorry about the longish post/question.

I'm trying to find a balance between performance gain and extra database maintenance.

Current db structure

Our database is currently split into nine generic areas. Tables are assigned to one of three groups: BASE, TRANS, or AUDIT. Each group has three areas: one for data, one for indexes, and one for LOBs.

# Schema
d "Schema Area":6,64;1 .
# BASE = setup tables, read frequently, updated less frequently (e.g. customer, item, ...)
d "BASE tables":10,64;8 .
d "BASE indexes":11,1;8 .
d "BASE lobs":12,1;8 .
# Transactional tables, high growth / update activity (e.g. Order, Orderline, ...)
d "TRANS tables":20,64;64 .
d "TRANS indexes":21,1;64 .
d "TRANS lobs":22,1;64 .
# AUDIT = mainly write-frequent, rarely queried tables
d "AUDIT tables":30,64;64 .
d "AUDIT indexes":31,1;64 .
d "AUDIT lobs":32,1;64 .

Tables to focus on

I created an Excel report based on the _TableStat and _IndexStat VSTs to identify the tables that are read most and to show which index is used most of the time. I focused on the top 5 transactional tables, ignoring the base tables, as those should be cached in memory anyway.

Changes to areas

All of the tables in my top 5 contain a numeric status field (10 = new, 20 = busy, ..., 60 = done, 99 = cancel).

I would start by adding a single area, "TRANS actual data".

  • status < 60 => data in "TRANS actual data"
  • status >= 60 => data in existing "TRANS tables" area
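For concreteness, a minimal .st fragment for the proposed area, following the same conventions as the structure above. The area numbers (40/41) and the records-per-block / cluster sizes are placeholders I picked for illustration, not recommendations:

```
# TRANS actual = status < 60 rows of the top-5 transactional tables
d "TRANS actual data":40,64;64 .
d "TRANS actual indexes":41,1;64 .
```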

Does this make sense?

A lot of the accesses to those tables do not query on the status field but on a unique id field. Does it matter that the most frequently used index does not contain the field that is used to split the data? I think it doesn't, but maybe I'm wrong here.

Some of the indexes on those tables have a field before the status field. Fortunately that field has a very limited number of values. Can we still use that index to partition the data if we provide more conditions, or is that a bad idea?

I'm also considering introducing a new status value (90 = archived) to make the move of data between "actual" and "archive" explicit.

All Replies

Posted by TheMadDBA on 11-May-2015 09:32

Until Progress gets a cost-based optimizer for the 4GL that will decide to do partition scans without indexes, you are much better off partitioning for ease of maintenance and ignoring any performance gains. The physics of looking up a non-partitioned record and a partitioned record are essentially the same at this point: find the index block(s) for the rowid, then get the appropriate data block.

Partitioning for performance works best when you are doing non-indexed scans and you eliminate most of the data by partition pruning. It works even better when you are doing non-indexed joins between tables that have the same partitioning strategy and the join happens in bulk on the server. Unfortunately this is not the case for the vast majority of 4GL code, and server-side hash joins don't exist for the 4GL at all.

If you partition on a status code of some kind you can get improvements from keeping the "active" records in a smaller number of data blocks, but you pay the penalty of moving those records around when the status changes. The net gain or loss will depend on how often those status codes change and how many moves you do.

A good place to start would be to look at OLTP partitioning strategies for Oracle. The concepts are similar even if the execution is different.

You will probably get much better performance improvements by making sure you use the alternate buffer pool properly (for your base tables) and maybe look at splitting those TRANS areas into multiple areas for similar record sizes and usage patterns. And of course any code/index changes.

Posted by cverbiest on 11-May-2015 10:06

Thanks for your reply. I forgot to mention that I intend to put the "TRANS actual data" area in the alternate buffer pool,

but knowledgebase.progress.com/.../000044293 states:

What should not be allocated to the Alternate Buffer Pool (-B2)?

  • Include database objects / areas that have a very broad range of record usage. For example, a table that users might run queries/tasks that cover many years of data or has high Create-Delete-Update activity.

The tables I'd put in there have high CRUD activity, but I want all the actual data in memory.

Am I going in the wrong direction, or is that PKB too strict?

Posted by James Palmer on 11-May-2015 10:08

I could be wrong, but one big reason for not putting high-CRUD tables into -B2 is that you want to size -B2 to be only just over the size of what you're putting in. A table with high CRUD could grow beyond the sizing of the buffer, meaning the LRU2 chain is activated, and that is a performance hit.

Posted by TheMadDBA on 11-May-2015 10:13

You are better off putting mostly static lookup tables into the alternate buffer pool. You can size -B2 to hold the entire working set, and they will stay there once loaded until the DB restarts. No LRU activity to evict and reload pages.

Leave the trans tables in the main buffer pool and it will keep as much of those tables as possible, but they can get flushed out by huge queries. If you have enough memory to load everything, that would be great, but that's not usually the real world unless you have a smaller database.

If you find that the main buffer pool is being flushed out, you can use the VSTs and something like ProTop to find the offending code and make sure the index usage is proper. If you don't have ProTop yet, I would strongly suggest downloading it. It shows pretty much all of the meaningful metrics that the vast majority of Progress installs would ever need to look at.
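For reference, a sketch of how this assignment is typically done: areas are linked to the Alternate Buffer Pool with proutil, and the pool itself is sized with the -B2 broker startup parameter. The database name, area names, and sizes below are placeholders, and the exact proutil syntax should be verified against your OpenEdge version's documentation:

```
# Assign the mostly static lookup areas to the Alternate Buffer Pool
# (area names as in the .st file earlier in the thread):
proutil mydb -C enableB2 "BASE tables"
proutil mydb -C enableB2 "BASE indexes"

# Then size the pool at broker startup, e.g. in the .pf file:
# -B  200000   main buffer pool (in database blocks)
# -B2  40000   alternate buffer pool, sized to hold the whole working set
```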

Posted by George Potemkin on 12-May-2015 02:20

> Am I going in the wrong direction, or is that PKB too strict?

The alternate buffer pool is useful for database objects (tables or indexes) with top /read/ activity. Objects with high create/update/delete activity tend to have rather low read activity.

If the read-active objects are too large for -B2, it is still worth using the alternate buffer pool. For example, you can link all indexes to -B2; in that case it is enough to set -B2 to 10-20% of -B.

Posted by gus on 03-Jun-2015 16:06

You could put the indexes for high-activity tables into the alternate buffer pool if the tables are too big to fit.

Posted by cverbiest on 04-Jun-2015 05:01

Using _IndexStat we identified the most frequently used indexes. We will try to put those into the alternate buffer pool.

See https://gist.github.com/cverbiest/a30bfa8cb73d8d9e3081 for the code I used to create a CSV from _IndexStat.
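For anyone who can't reach the gist, a minimal ABL sketch of the idea (not the gist itself): rank indexes by reads and write them to a CSV. It assumes the statistics ranges (-baseindex/-indexrangesize) cover the indexes of interest, and the output file name is just an example:

```
/* Dump per-index activity counters to a CSV, most-read indexes first. */
DEFINE STREAM csv.

OUTPUT STREAM csv TO "indexstat.csv".
PUT STREAM csv UNFORMATTED "Table,Index,Reads,Creates,Deletes" SKIP.

FOR EACH DICTDB._Index NO-LOCK,
    FIRST DICTDB._File NO-LOCK
          WHERE RECID(DICTDB._File) = DICTDB._Index._File-recid,
    FIRST DICTDB._IndexStat NO-LOCK
          WHERE DICTDB._IndexStat._IndexStat-id = DICTDB._Index._Idx-num
    BY DICTDB._IndexStat._IndexStat-read DESCENDING:

    PUT STREAM csv UNFORMATTED
        DICTDB._File._File-Name   ","
        DICTDB._Index._Index-Name ","
        DICTDB._IndexStat._IndexStat-read   ","
        DICTDB._IndexStat._IndexStat-create ","
        DICTDB._IndexStat._IndexStat-delete SKIP.
END.

OUTPUT STREAM csv CLOSE.
```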

Posted by George Potemkin on 04-Jun-2015 06:10

If the alternate buffer pool will be used for its main purpose (avoiding the LRU mechanism), then we can ask a question to control our choice of objects for -B2: what size of alternate buffer pool (-B2) are we going to use? MBs? GBs? 10% of the current -B? 50%?

We can choose large but frequently used tables, provided the application reads only a limited set of the records. Do we know the size of this set?

We can choose the frequently used indexes but will they /entirely/ fit into the -B2?

If the frequently used objects are too large for -B2 (limited by the memory we can allocate for these objects), then there is a second possibility: use the alternate buffer pool just to have two LRU latches instead of one. A "FOR EACH" query (unlike a "FIND FIRST" query) creates on average two db requests per record: one for the index block and one for the record block. If you put all indexes (at least for the frequently used tables) in the alternate buffer pool, then most of the time the "FOR EACH" queries will use two LRU latches at the /same/ time. Of course, that's better than using only one latch.

Just in case someone needs a program to collect _TableStat/_IndexStat:

ftp.progress-tech.ru/.../DbStatDump.p

ftp.progress-tech.ru/.../DbStatDiff.p

DbStatDump.p appends the dumps of _TableStat/_IndexStat (plus a bit more) from /all/ connected databases to the same file. Plus it does the same with the sequences. You can run DbStatDump.p from cron or manually when needed.

DbStatDiff.p takes the files created by DbStatDump.p and converts them into reports per interval between the dumps.

The files created by both programs can be opened in Excel.

Posted by George Potemkin on 04-Jun-2015 06:45

> See gist.github.com/.../a30bfa8cb73d8d9e3081 for the code I used create a csv from _index-stat

Totally off-topic: the program uses the _IndexStat-split and _IndexStat-blockdelete fields. Due to a Progress bug, the value of the "blockdelete" field is always zero and the value in the "split" field is incorrect. The same is true for _UserIndexStat and for promon. They all read the correct bytes in shared memory; it's the Progress clients that update the wrong counter when a block is deleted (freed) from the index tree. Namely, they increase the "split" counter. In other words, the "split" field returns the sum of the splits and the deletes.

My DbStatDump.p tries to adjust the results as:

_IndexStat-blockdelete = _IndexStat-split * _IndexStat-delete / (_IndexStat-delete + _IndexStat-create)

_IndexStat-split = _IndexStat-split - _IndexStat-blockdelete

It's not 100% accurate but it's better than the initial "lie".
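As a worked instance of the adjustment, with made-up counter values (all numbers here are hypothetical, not from a real database):

```
/* Made-up raw counters: the raw "split" value is really splits + block
   deletes, so apportion it by the delete/(delete + create) ratio. */
DEFINE VARIABLE iSplit    AS INT64 NO-UNDO INITIAL 100. /* raw _IndexStat-split */
DEFINE VARIABLE iCreate   AS INT64 NO-UNDO INITIAL 70.  /* _IndexStat-create    */
DEFINE VARIABLE iDelete   AS INT64 NO-UNDO INITIAL 30.  /* _IndexStat-delete    */
DEFINE VARIABLE iBlockDel AS INT64 NO-UNDO.

iBlockDel = iSplit * iDelete / (iDelete + iCreate). /* 100 * 30 / 100 = 30 */
iSplit    = iSplit - iBlockDel.                     /* 100 - 30      = 70 */

DISPLAY iBlockDel iSplit. /* estimated block deletes = 30, true splits = 70 */
```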

This thread is closed