High Create/Delete tables in own extent (Type 1)

Posted by Grant Holman on 03-May-2016 03:36

75%+ of all CREATE activity occurs in just 3 tables on our 9.1 DB (i.e. Type 1 extents).  These tables constitute about 38% of the total DB size.

Would it be worth having these in their own extent to reduce Scatter?  My thinking here is that all this delete activity is peppering the DB with gaps like Swiss cheese leading to high scatter over time.

All Replies

Posted by James Palmer on 03-May-2016 03:54

In my opinion you'd really be better off spending the effort on upgrading and moving to Type II storage areas. Some people do say that busy tables (and their indexes) should be in separate areas, even in Type II, but it's the sort of question that tends to divide the crowd.

Posted by Grant Holman on 03-May-2016 04:50

Correction, first sentence should have started "75%+ of all DELETE activity...".

Posted by Grant Holman on 03-May-2016 04:52

Hi James, totally agree on the need to upgrade, but that isn't something the company I'm contracting at is prepared to undertake at present.  Maybe next year...

Posted by ChUIMonster on 03-May-2016 07:28

Yes.  Putting those tables into *dedicated* storage areas with appropriately sized rows per block will improve overall performance.

One storage area per table for the data.  Set RPB to dbBlockSize / (avgRowSize + 20) (get avgRowSize from dbanalys) and round up to the next power of 2.  Put the indexes into dedicated storage areas corresponding to the tables.  It is OK to have multiple indexes per area.
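If it helps, here is that rule of thumb as a few lines of throwaway 4GL (a sketch only; the 8K block size and 220-byte average row are assumed example values, take yours from dbanalys/tabanalys):

/* rule-of-thumb RPB: blockSize / (avgRowSize + 20), rounded up to a power of 2 */
define variable dbBlockSize as integer no-undo initial 8192.   /* example value */
define variable avgRowSize  as integer no-undo initial 220.    /* example value from dbanalys */
define variable rawRPB      as decimal no-undo.
define variable rpb         as integer no-undo.

rawRPB = dbBlockSize / (avgRowSize + 20).
rpb = 1.
do while rpb < rawRPB:
    rpb = rpb * 2.    /* round up to the next power of 2 */
end.
message "raw value" rawRPB "-> rows per block" rpb view-as alert-box.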

It is not quite as good as upgrading to OE11 and converting to Type 2 areas -- but it is close.

I don't think splitting data and indexes into their own areas is at all controversial.  There is occasionally some "spirited discussion" about the appropriate rows per block for index areas.  Some argue for "1" since there is no data.  Others say that if you mess up and accidentally put data in an index area then it is better to have a reasonable default value such as 128.  

Posted by gus on 03-May-2016 09:20

setting the rows per block of an index area to 1 gives such a small advantage it is nearly immeasurable.

the consequences of making a mistake and accidentally putting a table into such an area are very bad. Usually it involves much unscheduled downtime. I have paid dearly for this error and will never make it again.

Posted by George Potemkin on 03-May-2016 09:31

A small, simple 4GL program can warn us if a table resides in an area with RPB 1. The sooner it happens, the easier it is to fix the problem. RPB 1 means: "For indexes only. Even LOB objects are not allowed here!". ;-)
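Something along these lines should do it (a rough sketch, assuming the _StorageObject/_Area/_File VSTs and that RPB = 2 ^ _Area-recbits; check the field names against your Progress version):

/* warn about any table stored in an area whose RPB is 1 */
for each _StorageObject no-lock
    where _StorageObject._Object-type = 1:      /* 1 = table */
    find _Area where _Area._Area-number = _StorageObject._Area-number no-lock no-error.
    if not available _Area then next.
    if exp(2, _Area._Area-recbits) = 1 then do:  /* assumes RPB = 2 ^ recbits */
        find _File where _File._File-Number = _StorageObject._Object-number no-lock no-error.
        if available _File then
            message "WARNING: table" _File._File-Name
                    "resides in area" _Area._Area-name "(RPB 1).".
    end.
end.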

Posted by gus on 03-May-2016 09:36

so can looking at the current .st file, which i was doing, but not carefully enough.

Posted by gus on 03-May-2016 09:40

> On May 3, 2016, at 10:32 AM, George Potemkin wrote:

>

> The sooner it happens, the easier it is to fix the problem

true. in my case, i had a fixed downtime window i could not exceed so i could not complete the other tasks i had planned. i noticed during a binary load that the area was growing too fast. still had to pay the piper.

Posted by George Potemkin on 03-May-2016 09:48

And dbanalys could report the actual value of records per block individually for each table. So we could easily compare it with the area's RPB. Has anybody noticed how wrong the choice of the areas' RPBs is in the sports database? ;-)

Dbanalys "invisibly" reports the number of blocks used by each table. We can "extract" this information and calculate the actual RPBs ourselves.

Posted by gus on 03-May-2016 09:56

> On May 3, 2016, at 10:49 AM, George Potemkin wrote:

>

> And dbanalys could report the actual value of records per block individually for each table. So we could easily compare it with the area's RPB. Has anybody noticed how wrong the choice of the areas' RPBs is in the sports database? ;-)

this is a waste of time.

Posted by George Potemkin on 03-May-2016 10:07

Setting RPBs for the sports database? Well, the sports database is mainly used in the readprobe tests, and the rather small RPB values improve the results of such tests. Sometimes little things mean a lot. ;-)

Posted by Grant Holman on 04-May-2016 02:01

Thanks for all your replies.

Hello Tom - I've run proutil tabanalys to get the avg row size and then used this to split the tables into 256, 128, 64, 32, 8, 4, 2, 1 RPB buckets.  I'm interested in how you said to add 20 to the avg size - is this to allow for potential row expansion?  Is '20' chosen for a reason - e.g. a good rule of thumb, or something more scientific?  There is a KB article which also says to subtract 100 from the DB block size to accommodate the block header information - is this something you would do?

Posted by George Potemkin on 04-May-2016 02:48

> 75%+ of all CREATE activity occurs in just 3 tables on our 9.1 DB (i.e. Type 1 extents)

Do you have the statistics for these tables?:

_TableStat-update vs _TableStat-create vs _TableStat-delete
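For example, a quick sketch like this (it assumes -tablerangesize covers the tables of interest and that _TableStat-id lines up with _File._File-Number; adjust for your version):

/* per-table create/update/delete counts since startup */
for each _TableStat no-lock:
    find _File where _File._File-Number = _TableStat._TableStat-id no-lock no-error.
    if not available _File then next.
    display _File._File-Name              format "x(24)"
            _TableStat._TableStat-create  column-label "Create"
            _TableStat._TableStat-update  column-label "Update"
            _TableStat._TableStat-delete  column-label "Delete".
end.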

> Would it be worth having these in their own extent to reduce Scatter?  My thinking here is that all this delete activity is peppering the DB with gaps like Swiss cheese leading to high scatter over time.

How long are the RM chains of these tables? That's more important than their scatter factors.

> There is a KB which also says to subtract 100 from the DB block size to accommodate the block header information

100 bytes is close to the size of the RM block headers in a Type 2 storage area. If one cares about such small pieces of block space then the toss limit (300 bytes for 8K) should be taken into account as well. But I wonder why we need to calculate the best number of records per block to several decimal places and then round the value up to a power of two (not even to the nearest integer)?

Posted by Grant Holman on 04-May-2016 07:32

Hello George,

> Do you have the statistics for these tables?:

Yes I do.  These tables cumulatively account for ~75% of the total DB Create, Delete and Update activity,  but they have relatively low Read activity compared to other tables.

> How long are RM chains of these tables?

proutil dbanalys reports:

2,067,264 total blocks found in the database.
1,123,270 RM block(s) found in the database.
42.47% of the RM block space is used.

How do I interpret this?

Posted by George Potemkin on 04-May-2016 07:52

Hello Grant,

I meant the ratio between creates/deletes/updates for those three tables individually.

> 2,067,264 total blocks found in the database.

> 1,123,270 RM block(s) found in the database.

I meant the message 3905:

<n> block(s) found in the RM chain.

Posted by ChUIMonster on 04-May-2016 07:56

20 is a "rule of thumb" that basically boils down the various block and record overhead into single number.  It is not terribly scientific but, in my experience, it works well.  As George points out -- there isn't much value in calculating the value out to N decimal places when you're just going to round up to a power of 2.  There are kbases, white papers and spreadsheets floating around that will tell you how to do that if you feel really compelled.  But IMHO there is simply no value in doing so.

Posted by gus on 04-May-2016 08:10

> On May 4, 2016, at 3:02 AM, Grant Holman wrote:

>

> I've run proutil tabanalys to get the avg row size and then used this to split the tables into 256,128,64,32,8,4,2,1 RPB.

this is not necessary.

versions 10.1B (?) and later have 64-bit rowids (which you should enable if they are not already).

this means that you do not ever have to worry about running out of rowids. The roughly 2.1 billion rowid (and record fragment) limit is gone. you do not have to be sure the maximum-rows-per-block is set low to avoid wasting rowids. forget about that.

you only have to be sure that the maximum rows per block is set high enough that you don't waste too much space. for example:

with maximum-rows-per-block set to 64, an 8k block with 64 records has room for about 6600 bytes of data. if your records average 50 bytes long (not common), then 64 records adds up to only 3200 bytes. so about half the block would be wasted (though it could be partly used if the rows in it expand during updates).

Posted by Grant Holman on 04-May-2016 10:17

Hope this ok George:

Table  Create  Update  Delete
A      326K    12.2M!  465K
B      335K    417K    333K
C      176K    196K    177K

I can't see 'RM chain' in the dbanalys output.  Should I be running something else?

Posted by Grant Holman on 04-May-2016 10:20

Thanks Tom.

Posted by Grant Holman on 04-May-2016 10:33

Hi Gus.  Are you saying creating extents based on RPBs is a pointless exercise?

We are running on Windows with a 4K O/S block size (I think Windows actually calls this cluster size?) and a 4K DB block size (it's my understanding this is the most appropriate for Windows, please correct me if 8K would be better!).  The BI block size is 8K though.

The point of the exercise is not to address 32-bit rowid limits - our DBs only have tens of millions of records in total, so 2.1 billion per extent is more than enough!  However we have some tables with very small records (e.g. 30 bytes) and some with larger ones (e.g. 3,500 bytes) - it seemed to me that putting these in appropriately RPB-sized extents would make sense.  Am I wasting my time?  Should I just bung everything into 32 RPB extents and not worry about it?

Posted by George Potemkin on 04-May-2016 10:35

> Table  Create  Update  Delete

> A      326K    12.2M!  465K

More deletes than creates?! The number of records in the table is decreasing. Not to zero? ;-)

> B      335K    417K    333K

> C      176K    196K    177K

It might be that most of the updates happen right after the creates.

For example:

do transaction:
    create table.
    assign table.key_fields.
    assign table.other_fields.
end.

If this guess is correct then you need to choose the RPB for these two tables carefully (or to set the create limit, if that option is available in your Progress version).
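(For reference: in later OpenEdge releases - 10.2B and up, if I remember correctly - the per-table limits can be changed with proutil; the table name and values below are just placeholders:

proutil mydb -C settablecreatelimit mytable 500
proutil mydb -C settabletosslimit mytable 1000

As far as I know these qualifiers are not available on 9.1D.)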

> I can't see 'RM chain' in the dbanalys output.  Should I be running something else?

Did you run dbanalys or tabanalys?

Posted by ChUIMonster on 04-May-2016 12:25

Creating Storage Areas (extents are the individual files that make up a storage area) with finely tuned rows per block has very few benefits these days.

Personally I still do it.  1) it's a habit  2) it's not that hard so long as you don't obsess over the decimal point and 3) I have certain tools that I sometimes run which use Monte Carlo sampling of RECIDs.  They run a lot faster if there aren't a lot of missing recids ;)

If you are going to take the "one size fits all" approach then you should probably choose 128 or 256.  Some people argue that 256 is pointless because nobody has records that small.  I happen to know of some tables whose records are indeed that small.  So I go with 256 when I am doing that.

In my experience 8K blocks will give somewhat better read performance.  Even on Windows.  But it is not usually enough to argue about.

Posted by gus on 04-May-2016 15:56

> On May 4, 2016, at 1:26 PM, ChUIMonster wrote:

>

> If you are going to take the "one size fits all" approach then you should probably choose 128 or 256. Some people argue that 256 is pointless because nobody has records that small. I happen to know of some tables whose records are indeed that small. So I go with 256 when I am doing that.

They have to be /very/ small.

Here's the scoop on 256 rows per block:

Block size                      8192 bytes
Data block header                 64 bytes
Record block extra header         12 bytes
Row directory for 256 records    512 bytes
Per-row overhead (17 x 256)     4352 bytes
Create limit                     300 bytes
Total overhead                  5240 bytes
Space left for records          2952 bytes

2952 / 256 = about 11.5 bytes for each record if they are all the same size.
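The same arithmetic, spelled out as a quick 4GL sketch (a rough illustration only, for the fully-populated block case; the constants are the ones listed above):

/* usable record space in an 8k block when every row-directory slot is used */
define variable rpb       as integer no-undo initial 256.
define variable blockSize as integer no-undo initial 8192.
define variable overhead  as integer no-undo.

overhead = 64             /* data block header             */
         + 12             /* record block extra header     */
         + (2 * rpb)      /* row directory, 2 bytes/slot   */
         + (17 * rpb)     /* per-row record overhead       */
         + 300.           /* create limit                  */
message "overhead" overhead "bytes,"
        "space left" blockSize - overhead "bytes,"
        "about" truncate((blockSize - overhead) / rpb, 1) "bytes per record."
        view-as alert-box.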

HOWEVER

*******

with 256 rows per block, if you put just one record in the block (say 7,000 bytes) it would look like this:

Block size                      8192 bytes
Data block header                 64 bytes
Record block extra header         12 bytes
Row directory for 1 record         2 bytes
Per-row overhead (17 x 1)         17 bytes
Create limit                     300 bytes
Total overhead                   395 bytes
Space left for records          7797 bytes
One 7,000 byte record           7000 bytes
Available space                  797 bytes

My point here is this: you do NOT have to sort all your tables into 1, 2, 4, 8, 16, 32, 64, 128, and 256 rows-per-block buckets.

You simply have to make sure that the maximum rows per block is large enough. 7,000 byte records can be stored just fine in areas that are NOT 1 row per block or 2 rows per block. You lose nothing by putting 7,000 byte records into a block that can handle 64 or 128 rows.

In Type 2 areas (which you should use exclusively) rows from different tables are not mixed.

Stop doing all the extra pointless analysis you have been browbeaten into doing. The times when it is worthwhile are rare.

Posted by Grant Holman on 05-May-2016 01:55

George - I ran dbanalys.  This is on 9.1D07

Posted by Grant Holman on 05-May-2016 02:01

George - interested to see that multiple assign statements would increase the update count in _TableStat even within a single transaction - I didn't realise this.

Posted by Grant Holman on 05-May-2016 02:15

Gus - thanks for this detailed reply, I'll need to study it first to make sure I get the finer detail!  N.B. I am using 9.1D so only have Type 1 extents at my disposal.

Posted by George Potemkin on 05-May-2016 02:36

> I ran dbanalys

The output's header should report the line:

Options: chanalys ixanalys tabanalys

The "chanalys" option means that the output should have the "* CHAIN ANALYSIS" segments (msg ## 3873 & 3892). These messages exist at least since Progress V6.

> interested to see that multiple assign statements would increase the update count in _tablestat even if within a single transaction

Most likely the first assign statement creates a record whose size is just a bit larger than the template record, which in turn can be close to the minimum record size reported by tabanalys. The second assign statement then changes the record's size to its final value, let's say to the average record size reported by tabanalys. So you can estimate the size of the record expansions as AverageRecSize - MinRecSize. Only a record that uses the last slot in a block has a chance to become fragmented. If that is your case then the (Fragments - Records) / Records ratio should be close to 1 / RealRPB, where RealRPB ~ BlockSize / AverageRecSize. This approach will only work for tables where the _TableStat-update / _TableStat-create ratio is approximately 1:1.
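As a made-up example of that estimate (all of the numbers below are assumed; take the real ones from your tabanalys output):

/* rough expected fragmentation when records expand after the initial create */
define variable blockSize  as decimal no-undo initial 8192.
define variable avgRecSize as decimal no-undo initial 400.   /* assumed, from tabanalys */
define variable minRecSize as decimal no-undo initial 120.   /* assumed, from tabanalys */
define variable realRPB    as decimal no-undo.

realRPB = blockSize / avgRecSize.
message "record expansion ~" avgRecSize - minRecSize "bytes;"
        "expected (Fragments - Records) / Records ~" round(1 / realRPB, 3)
        view-as alert-box.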

Posted by Grant Holman on 05-May-2016 04:17

Hi Tom - taking everyone's input into account I think I'll test both 4K block / 64 RPB and 8K block / 128 RPB versions of the DB.  I'll put the indexes in their own extent with 32 RPB (to cover me in case a table gets put in there by mistake).  Thanks for all your input.

Posted by gus on 05-May-2016 06:10

More:

Suppose you have maximum-rows-per-block at 256 and 8k data blocks.

Now you put a single 100 byte record into a block. It will look like this:

Block size                      8192 bytes
Data block header                 64 bytes
Record block extra header         12 bytes
Row directory for 1 record         2 bytes
Per-row overhead (17 x 1)         17 bytes
Create limit                     300 bytes
Total overhead                   395 bytes
Space left for records          7797 bytes
One 100 byte record              100 bytes
Free space                      7697 bytes

Note that the row directory is only /one/ slot at this point.

If we put another 100 byte record into the same block, we expand the row directory by one more entry (2 bytes) and we add another 17 bytes of overhead for the second record. So the second record uses 119 bytes of space (100 + 2 + 17). We end up with this:

Block size                      8192 bytes
Data block header                 64 bytes
Record block extra header         12 bytes
Row directory for 2 records        4 bytes
Per-row overhead (17 x 2)         34 bytes
Create limit                     300 bytes
Total overhead                   414 bytes
Space left for records          7778 bytes
Two 100 byte records             200 bytes
Free space                      7578 bytes

Posted by Grant Holman on 05-May-2016 07:15

Gus - are your figures (17 byte overhead per row, 64 byte block header, 2 bytes per row etc.) specifically for Type 2 or do they apply to Type 1 as well?

Posted by Grant Holman on 05-May-2016 07:18

I think Create Limit is 75 bytes for V9

Posted by gus on 05-May-2016 08:32

> On May 5, 2016, at 8:16 AM, Grant Holman wrote:

>

> Gus - are your figures (17 byte overhead per row, 64 byte block header, 2 bytes per row etc.) specifically for Type 2 or do they apply to Type 1 as well?

For Type ii.

The Type i block header is smaller, 32 bytes. The rest is the same. The 17 bytes is actually in each record, an invisible header at the front of it.

Posted by gus on 05-May-2016 08:34

> On May 5, 2016, at 8:19 AM, Grant Holman wrote:

>

> I think Create Limit is 75 bytes for V9

varies by block size and by whatever you change it to be.

Posted by Grant Holman on 05-May-2016 08:44

George - ran chanalys and got:

3887 block(s) found in the RM chain.
129580 block(s) found in the RM chain.
761465 block(s) found in the RM chain.
194311 block(s) found in the RM chain.
4088 block(s) found in the RM chain.

Posted by Dmitri Levin on 05-May-2016 13:24

With "one size fits all" RPB approach I believe we look at database from only one side, i.e. how to effectively fit records into blocks. And 128 (or 256) RPB is the most likely answer.

However if we look at it from the performance side we will see that the choice of a slightly lower RBP will help to avoid record fragmentation. I believe George wrote that "fragmentation happen to the last record that we put into a block".

We can avoid/limit fragmentation by putting table into area with lower RPB or probably better approach is to set a bigger Create limit on a table itself. And changing Create limit is much more easy than to move table in a different area. So I am not actually advising to put table into areas with appropriate RPBs to avoid fragmentation, but only pointing out that it could help to avoid fragmentation.

Posted by gus on 05-May-2016 13:38

Dmitri,

I agree. I was not suggesting to always use 256. I was trying to explain how things work, using 256 as an example. One reason I used it as an example was to show that it should not be used at all for 4k blocks because the overhead from 256 records eats too much space.

In most cases, I think 64 or 128 are reasonable choices.

1, 2, 4, 8 are /almost never/ reasonable choices.

Doing too much analysis and sorting your tables into many row-sized buckets is a pointless waste of time.

Much more important is to get the create and toss limits right.

a wrong create limit leads either to fragmentation or to wasted space.

a wrong toss limit leads to long rm chains and slow space allocations.

regards,

gus (gus@progress.com)

"A distributed system is one in which the failure of a computer you didn't

even know existed can render your own computer unusable."

-- Leslie Lamport


Posted by George Potemkin on 06-May-2016 05:19

> George - ran chanalys and got:

In 5 areas the total number of blocks found in the RM chains is 1,093,331.

> 1,123,270 RM block(s) found in the database.

> 42.47% of the RM block space is used.

Or 97% of all data blocks (RM blocks) are on the RM chains. If I understand correctly, you don't have a real performance issue but are looking for "best practice" recommendations. Those recommendations are focused on /storing/ data in the database. In your case the application creates a lot of data and then deletes most of the records. It's rather unusual that this seems to be true for almost the whole database.

I would begin from a different question: "how to check if my application has the performance issues?"

This thread is closed