Scatter Factor in 11.6

Posted by Dmitri Levin on 17-Nov-2015 13:48

from 11.6 README.txt

PSC00318660
=============================
The Scatter Factor is incorrect in table analysis after loading into a multi-tenant table or a partitioned table.  

So my question is: how is the scatter factor calculated in 11.6 for a partitioned table, and how was it changed?

All Replies

Posted by Richard Banville on 17-Nov-2015 14:03

The problem was that the scatter factor was including data for all partitions of the table in the current area instead of being calculated per partition, so the numbers were typically much larger than they should have been.  The fix was to include only the data particular to each partition in the calculation.

For Type I areas, the physical scatter factor was meant to indicate whether a D&L would better organize data scattered across blocks.   For data in Type II areas, the scatter factor is calculated very differently than for data in Type I areas, since the data is segregated by table/partition.  For Type II, then, the scatter basically tells you that you could consolidate free space allocated to the object (table/partition) if you were to D&L.  The usefulness of this information for Type II is arguable, and I'm sure others will raise their subjective and objective opinions, but that is what the value indicates.

Posted by George Potemkin on 18-Nov-2015 01:28

P176284: How is scatter factor calculated by the proutil dbanalys?
knowledgebase.progress.com/.../P176284
<quote>
Scatter for Type II is reported as the number of blocks occupied divided by the minimum number of blocks that table could occupy.
</quote>

How does dbanalys calculate the minimum number of blocks that the table could occupy?
Something like this?
Min Block Count = Record Count * (Mean Rec Size + 2) / (Db Block Size - 64)
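
For example (my numbers, just to illustrate the guess): with a 4 KB block, a 64-byte block header and a mean record size of 90 bytes, one block would hold about (4096 - 64) / (90 + 2) ≈ 44 records, so a 100,000-record table would need at least roughly 100000 / 44 ≈ 2,300 blocks.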

The goal is to use the scatter factor to estimate the number of blocks occupied by a table, because unfortunately dbanalys does not report it.

Regards,

George

Posted by George Potemkin on 20-Nov-2015 03:18

I ran some tests with the scatter factor:
Progress V10.2B
d "Table Area":7,8;8 .

I created 15 records. They used 2 blocks: the first stored 7 records plus the template record, the second stored 8 records.
The scatter factor was 1.0.

Then I deleted half of the records (if tbl.i mod 2 eq 0 then delete tbl).
The scatter factor changed to 1.3. Note that EXP(10, 0.3) ≈ 2. In other words, the scatter factor is still 1 + LOG(Ratio, 10).
The record size does not matter. I tried a mean record size of 19 bytes as well as 471 bytes. I also removed the recid holders, so the two blocks stored only 4 + 1 records and 4 records.

Then I deleted all records except one in each block (if tbl.i ne 1 and tbl.i ne 9 then delete tbl).
The scatter factor did not change; it still equals 1.3.
Again, the record size does not matter.

It looks like the scatter factor does not work as a block packing factor.
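
For reference, a minimal sketch of the test setup (assuming a table tbl with a single integer field i and a unique index on it, created in the area above; dbanalys is run between the steps):

DEFINE VARIABLE j AS INTEGER NO-UNDO.

/* create the 15 test records */
DO TRANSACTION:
  DO j = 1 TO 15:
    CREATE tbl.
    tbl.i = j.
  END.
END.
/* proutil <db> -C dbanalys  ->  scatter factor 1.0 */

/* delete every second record */
FOR EACH tbl EXCLUSIVE-LOCK:
  IF tbl.i MODULO 2 EQ 0 THEN DELETE tbl.
END.
/* proutil <db> -C dbanalys  ->  scatter factor 1.3 */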

Regards,
George

Posted by Richard Banville on 20-Nov-2015 07:12

You should file a bug.  The algorithm has changed over the years and is probably different in later versions of OE 11. 
Also, the scatter really only works when you have a fair amount of data.  With few records and few blocks it doesn’t change much.

Posted by George Potemkin on 20-Nov-2015 08:02

Hi Richard,

I'll repeat the tests with 11.6.

> Also, the scatter really only works when you have a fair amount of data. With few records and few blocks it doesn’t change much.

The tests were aimed just at finding out the formula behind the scatter factor. I'm one of those who don't believe in the usefulness of the scatter factor for Type II areas. The tests that Dmitri and I ran on real databases for the presentation in Copenhagen, as well as the tests with idxfix, seemed to demonstrate that the main factors are different.

Best regards,
George

Posted by gus on 20-Nov-2015 08:11

imo, it is a largely useless metric.

more useful to know would be "if i use index x, how well does its ordering match the physical storage ordering and how many i/o's will that require?"

regards,

gus (gus@progress.com)

"Debugging is twice as hard as writing the code in the first place.

Therefore, if you write the code as cleverly as possible, you are,

by definition, not smart enough to debug it." -- Brian Kernighan


Posted by Richard Banville on 20-Nov-2015 08:21

Yes, that would be a good metric to add – termed logical scatter.  An enhancement request would need to be made for this.

Posted by Richard Banville on 20-Nov-2015 08:24

I can dig up the algorithm.  It is largely based on record size and blocks allocated and then some math to put it into a factor.  This is why small amounts of data really don’t make much of a difference for this algorithm.

Posted by George Potemkin on 20-Nov-2015 08:28

> imo, it is a largely useless metric.

It can't explain/predict, for example, how long dbanalys will scan RM chains. Of course, an application does not scan these chains. But the scatter factor also can't explain the time to scan the free cluster chain - the clusters that were owned by real but now deleted objects. Dbanalys scans them a few times slower than the sequential scan during the "Area Block Analysis" phase.

> more useful to know would be "if i use index x, how well does its ordering match the physical storage ordering and how many i/o's will that require?"

It's exactly what my SeekRatio.p program does:
ftp://ftp.progress-tech.ru/pub/Users/george/Programs/SeekRatio.p
You can see the huge difference between the indexes even in the sports database.

Seek Path = sum of abs(current dbkey - previous dbkey),
where dbkey is the block address of the record (rather than the recid). It's like the distance covered by a player during a game.
Seek Ratio = Seek Path / (max dbkey - min dbkey)
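
For illustration, a rough sketch of the idea (not the actual SeekRatio.p): it assumes the area's RPB is a power of two so that a record's block number can be approximated as TRUNCATE(RECID / RPB, 0), and it uses the sports Customer table with its cust-num index as an example.

DEFINE VARIABLE iRPB       AS INTEGER NO-UNDO INITIAL 64.  /* RPB of the table's area - adjust */
DEFINE VARIABLE iCurDbkey  AS INT64   NO-UNDO.
DEFINE VARIABLE iPrevDbkey AS INT64   NO-UNDO INITIAL ?.
DEFINE VARIABLE iMinDbkey  AS INT64   NO-UNDO.
DEFINE VARIABLE iMaxDbkey  AS INT64   NO-UNDO.
DEFINE VARIABLE iSeekPath  AS INT64   NO-UNDO INITIAL 0.

FOR EACH customer NO-LOCK USE-INDEX cust-num:          /* the index under test */
  iCurDbkey = TRUNCATE(INT64(RECID(customer)) / iRPB, 0).  /* approximate block number */
  IF iPrevDbkey EQ ? THEN
    ASSIGN iMinDbkey = iCurDbkey
           iMaxDbkey = iCurDbkey.
  ELSE
    ASSIGN iSeekPath = iSeekPath + ABSOLUTE(iCurDbkey - iPrevDbkey)
           iMinDbkey = MINIMUM(iMinDbkey, iCurDbkey)
           iMaxDbkey = MAXIMUM(iMaxDbkey, iCurDbkey).
  iPrevDbkey = iCurDbkey.
END.

MESSAGE "Seek Path :" iSeekPath SKIP
        "Seek Ratio:" iSeekPath / MAXIMUM(iMaxDbkey - iMinDbkey, 1)
        VIEW-AS ALERT-BOX.

The larger the ratio, the worse the index order matches the physical order of the records.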

Best regards,
George

Posted by gus on 20-Nov-2015 09:14

george,

i have a similar program, but yours is better.

regards,

Gus Bjorklund, BravePoint MDBA Services

gus@progress.com


Posted by George Potemkin on 20-Nov-2015 11:53

> i have a similar program

How useful queries with an INDEXED FIELDS option would be! The INDEXED keyword would mean: if the FIELDS option specifies only fields that are components of the indexes used by the query, then don't read the records at all but use the index keys as the field values. Of course, the index keys do not allow restoring the original values of character fields (at least the case of the characters is lost), but what they do allow to restore is valid for comparison operations. No matter how bad the logical scatter of the indexes is, such queries would be (almost) as fast as possible. And such queries would be useful for checking the quality of the indexes themselves.

Best regards,

George

Posted by George Potemkin on 22-Nov-2015 03:40

> Also, the scatter really only works when you have a fair amount of data. With few records and few blocks it doesn’t change much.

Now I understand. Indeed it was my mistake: I compared the scatter factors for a table with two data blocks. In the first case each block stored 4 records; in the second case, only one record. But in the best case a block could store 8 records. So in both cases the ratio of the actual block count to the optimal one was 2:1, which is why the scatter factor was 1.3 in both cases.

I ran new tests with 10.2B as well as with 11.6.
The scatter factors reported by dbanalys were equal to the values calculated by the following formula:

ASSIGN
  vMaxRPB = (vDbBlockSize - 64 - vCreateLimit) / (vMeanRecSize + 2)
  vMaxRPB = MIN(vMaxRPB, vAreaRPB)
  vRealRPB = vRecCount / vBlockCount
  vScatterFactor = 1 + LOG(vMaxRPB / vRealRPB, 10)
. /* ASSIGN */

I'm not sure that dbanalys uses exactly the same constants to estimate vMaxRPB as in the formula above. But the scatter factor is reported with low precision, so the exact values of these constants don't seem to matter.
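
As a quick check against the earlier two-block test (area RPB 8, mean record size 19): vMaxRPB is capped at the area RPB of 8, so with 15 records in 2 blocks vRealRPB = 7.5 and the factor is 1 + LOG(8 / 7.5, 10) ≈ 1.03, reported as 1.0; with 8 records in 2 blocks vRealRPB = 4 and the factor is 1 + LOG(2, 10) ≈ 1.3 - exactly the values dbanalys reported.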

Conclusion #1: the scatter factor can be used to estimate the number of blocks that store a table's records residing in a Type II storage area. Note: an object block stores the number of clusters owned by the object, but the scatter factor is based on the block count rather than the cluster count.

Conclusion #2: large scatter factors for tables in Type II storage areas (SAT2) seem to mean that the tables have long RM chains. A long RM chain is a "bad thing" that degrades read performance. Hence scatter factors larger than 1.0 deserve your attention even in SAT2.
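
Building on conclusion #1, a minimal sketch of such an estimate (the 64-byte header and the create limit handling are my assumptions from the formula above, not dbanalys internals; all input values are examples taken from a dbanalys report and the area definition):

/* Estimate the number of blocks used by a table in a Type II area
   from the dbanalys scatter factor - a sketch, not dbanalys code. */
DEFINE VARIABLE vDbBlockSize   AS INTEGER NO-UNDO INITIAL 4096.   /* db block size                    */
DEFINE VARIABLE vCreateLimit   AS INTEGER NO-UNDO INITIAL 150.    /* create limit (assumed default)   */
DEFINE VARIABLE vAreaRPB       AS INTEGER NO-UNDO INITIAL 64.     /* area RPB                         */
DEFINE VARIABLE vMeanRecSize   AS DECIMAL NO-UNDO INITIAL 90.     /* from dbanalys (example)          */
DEFINE VARIABLE vRecCount      AS DECIMAL NO-UNDO INITIAL 100000. /* from dbanalys (example)          */
DEFINE VARIABLE vScatterFactor AS DECIMAL NO-UNDO INITIAL 1.3.    /* from dbanalys (example)          */
DEFINE VARIABLE vMaxRPB        AS DECIMAL NO-UNDO.
DEFINE VARIABLE vEstBlocks     AS DECIMAL NO-UNDO.

ASSIGN
  vMaxRPB    = MINIMUM((vDbBlockSize - 64 - vCreateLimit) / (vMeanRecSize + 2), vAreaRPB)
  /* ScatterFactor = 1 + LOG(MaxRPB / RealRPB, 10) and RealRPB = RecCount / Blocks */
  /* =>  Blocks = RecCount / MaxRPB * EXP(10, ScatterFactor - 1)                   */
  vEstBlocks = vRecCount / vMaxRPB * EXP(10, vScatterFactor - 1)
. /* ASSIGN */

MESSAGE "Estimated blocks used by the table:" ROUND(vEstBlocks, 0)
        VIEW-AS ALERT-BOX.

Because dbanalys reports the factor with only one decimal place, the estimate is rough: a rounding of ±0.05 in the exponent already means a factor of about 1.12 in the block count.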

Best regards,
George

Posted by George Potemkin on 07-Jan-2016 09:11

I have researched a bit further. Progress V11.6.

It looks like dbanalys uses the following formula to calculate a scatter factor:

ASSIGN
  MaxRPB = (DbBlockSize - 52) / MeanRecSize
  MaxRPB = MIN(MaxRPB, AreaRPB)
  RealRPB = RecCount / UsedBlocks
  ScatterFactor = 1 + LOG(MaxRPB / RealRPB, 10)
. /* ASSIGN */

The tests confirmed that the scatter factor does not take the create limit (expansion space) into account.

I guess dbanalys incorrectly assumes the size of the data block header is 52 bytes. But the size of the standard block header (bkHeaderSize) is 64 bytes, or even 80 bytes for the boundary blocks in a data cluster. The per-area results reported by message 3922 are also affected.
For example:
508 RM block(s) found in the storage area.
1.27% of the RM block space is used.

RM CHAIN ANALYSIS
---------------------------

          LIST OF RM CHAIN BLOCKS
                free    # free
        dbkey   space   slots   hold
        132096          4008    254     0
        132352          4028    255     0
...
        261632          4028    255     0
        261888          4012    255     0

The used space = 4096 - 4028 = 68 bytes (for most blocks) = 64 bytes (bkHeaderSize) + 4 bytes (No of Recs, Free Slots, No Bytes Free Space). The record offset directory is empty in these blocks.

1.27% of 4096 bytes = 52 bytes

So dbanalys also does not count the size of the RM block header (4 bytes plus the record offset directory).
But that's a minor issue.
A bigger problem is the definition of the blocks used by a table. It can cause funny results like:

                          -Record Size (B)- ---Fragments--- Scatter
Table   Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.tbl       1   19.0B    19    19    19          1    1.0     3.7

Dbanalys says that a /single/ record is "scattered" in the area with a huge factor (3.7): it uses 501 times (= EXP(10, 2.7)) more space than it could. Do we need a dump and load for this record? ;-)

A simple test:
Area: d "Table Area":7,256;512 .
Db block size: 4096

At first I created only one record:

                          -Record Size (B)- ---Fragments--- Scatter
Table   Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.tbl       1   19.0B    19    19    19          1    1.0     1.0

So far so good.

Then I created 79111 records to fill all 508 blocks in the table's data cluster (the first 4 blocks in the first data cluster are not RM blocks).
Then I deleted all the records and again created only one record. Now the scatter factor is 3.7. The table owned all 508 data blocks from the beginning. Now dbanalys treats 507 blocks as "used" because each block stores one recid holder.

How many blocks are "used" by the table in my case? One or 508? Both answers are correct:
Queries that use the indexes will retrieve only one data block from disk - the one that stores the only record in the table.
Queries with the TABLE-SCAN option will retrieve all 508 blocks.

The quiz at the end ;-)
What scatter factor will the table have if I delete its last record?
The answer:

0.0 for the table and 1.0 as the average:

                          -Record Size (B)- ---Fragments--- Scatter
Table   Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.tbl       0    0.0B     0     0     0          0    0.0     0.0

              -----------------------------------------------------
Subtotals:    0    0.0B     0     0     0          0    0.0     1.0

Best regards,
George

This thread is closed