from 11.6 README.txt
PSC00318660
=============================
The Scatter Factor is incorrect in table analysis after loading into a multi-tenant table or a partitioned table.
So my question is: how was the scatter factor changed/calculated in 11.6 for partitioned tables?
The problem was that the scatter factor was including data for all partitions of the table in the current area rather than being computed per partition, so the numbers were typically much larger than they should have been. The fix was to include only the data belonging to each partition in the calculation.
For Type I areas, the physical scatter factor was meant to indicate that a D&L would better organize the data scattered across blocks. For data in Type II areas, the scatter factor is calculated very differently than for data in Type I areas, since the data is segregated by table/partition. For Type II, then, the scatter factor is basically telling you that you could consolidate free space allocated to the object (table/partition) if you were to D&L. The usefulness of this information for Type II is arguable, and I'm sure others will raise their subjective and objective opinions, but that is what the value indicates.
P176284: How is scatter factor calculated by the proutil dbanalys?
knowledgebase.progress.com/.../P176284
<quote>
Scatter for Type II is reported as the number of blocks occupied divided by the minimum number of blocks that table could occupy.
</quote>
How does dbanalys calculate the minimum number of blocks that the table could occupy?
Something like this?:
Min Block Count = Record Count * (Mean Rec Size + 2) / (Db Block Size - 64)
The goal is to use the scatter factor to estimate the number of blocks occupied by a table because, unfortunately, dbanalys does not report it.
Regards,
George
I ran some tests with the scatter factor:
Progress V10.2B
d "Table Area":7,8;8 .
I created 15 records. They used 2 blocks: the first one stores 7 records plus the template record, the second block stores 8 records.
Scatter factor was 1.0.
Then I deleted half of the records (if tbl.i mod 2 eq 0 then delete tbl).
Scatter factor changed to 1.3. Note that EXP(10, 0.3) = 2. In other words, the scatter factor is still 1 + LOG(Ratio, 10).
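As a quick sanity check of the relationship observed above (at this point in the thread it is still a guess, confirmed later), a tiny Python sketch, where `ratio` compares best-case packing to actual packing:

```python
import math

def scatter(ratio):
    """Observed relationship: scatter = 1 + LOG10(ratio), rounded as dbanalys reports it."""
    return round(1 + math.log10(ratio), 1)

print(scatter(1))  # 1.0 -- records packed as tightly as possible
print(scatter(2))  # 1.3 -- table uses twice the minimum space
```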
The record size does not matter: I tried a mean record size of 19 bytes as well as 471 bytes. I also removed the recid holders, so the two blocks stored only 4 + 1 records and 4 records.
Then I deleted all records except one in each block (if tbl.i ne 1 and tbl.i ne 9 then delete tbl).
The scatter factor did not change; it still equals 1.3.
The record size does not matter again.
It looks like the scatter factor does not work as a block packing factor.
Regards,
George
Hi Richard,
I'll repeat the tests with 11.6.
> Also, the scatter really only works when you have a fair amount of data. With few records and few blocks it doesn’t change much.
The tests were aimed just at finding out the formula behind the scatter factor. I'm one who doesn't believe in the usefulness of the scatter factor for Type II areas. The tests that Dmitri and I ran on real databases for the presentation in Copenhagen, as well as the tests with idxfix, seemed to demonstrate that the main factors are different.
Best regards,
George
imo, it is a largely useless metric.
more useful to know would be "if i use index x, how well does its ordering match the physical storage ordering and how many i/o's will that require?"
regards,
gus (gus@progress.com)
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
> imo, it is a largely useless metric.
It can't explain/predict, for example, how long dbanalys will scan the RM chains. Of course, an application does not scan these chains. But the scatter factor also can't explain the time to scan the free cluster chain - the clusters that were owned by real but now deleted objects. Dbanalys scans them a few times slower than the sequential scan during the "Area Block Analysis" phase.
> more useful to know would be "if i use index x, how well does its ordering match the physical storage ordering and how many i/o's will that require?"
It's exactly what my SeekRatio.p program does:
ftp://ftp.progress-tech.ru/pub/Users/george/Programs/SeekRatio.p
You can see the huge difference between the indexes even in the sports database.
Seek Path = sum of abs(current dbkey - previous dbkey)
where dbkey is the block address of the record (rather than the recid). It's like the distance covered by a player during a game.
Seek Ratio = Seek Path / (max dbkey - min dbkey)
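SeekRatio.p itself is an ABL program; the same metric can be sketched in Python (the dbkey values below are made up for illustration):

```python
def seek_ratio(dbkeys):
    """Seek Path divided by the dbkey range, per the definitions above.

    dbkeys: block addresses of records, in the order an index visits them.
    """
    seek_path = sum(abs(cur - prev) for prev, cur in zip(dbkeys, dbkeys[1:]))
    return seek_path / (max(dbkeys) - min(dbkeys))

# An index whose key order matches the physical storage order:
print(seek_ratio([100, 200, 300, 400]))  # 1.0
# An index that jumps back and forth across the same blocks:
print(seek_ratio([100, 400, 200, 300]))  # 2.0 = (300 + 200 + 100) / 300
```

A ratio near 1.0 means the index walks the data blocks almost sequentially; the larger the ratio, the more disk seeks a full scan via that index would cost.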
Best regards,
George
I've logged an idea for this. community.progress.com/.../logical_scatter_metric
george,
i have a similar program, but yours is better.
regards,
Gus Bjorklund, BravePoint MDBA Services
gus@progress.com
> i have a similar program
How useful queries with an INDEXED FIELDS option would be! The INDEXED keyword would mean: if the FIELDS option specifies only fields that are components of the indexes used by the query, then don't read the records at all but use the index keys as the fields' values. Of course, the index keys do not allow restoring the original values of character fields (at least the case of the characters will be lost), but what they do allow to restore is valid for comparison operations. No matter how bad the logical scatter of the indexes is, such queries would be (almost) as fast as possible. And such queries would be useful for checking the quality of the indexes themselves.
Best regards,
George
> Also, the scatter really only works when you have a fair amount of data. With few records and few blocks it doesn’t change much.
Now I understand. Indeed it was my mistake: I compared the scatter factors for a table with two data blocks. In the first case each block stored 4 records; in the second case, only one record. But in the best case a block could store 8 records. So in both cases the ratio of the real block count to the optimal one was 2:1, and that is why the scatter factor was 1.3 in both cases.
I ran new tests with 10.2B as well as with 11.6.
The scatter factors reported by dbanalys were equal to the values calculated by the following formula:
ASSIGN
  vMaxRPB        = (vDbBlockSize - 64 - vCreateLimit) / (vMeanRecSize + 2)
  vMaxRPB        = MIN(vMaxRPB, vAreaRPB)
  vRealRPB       = vRecCount / vBlockCount
  vScatterFactor = 1 + LOG(vMaxRPB / vRealRPB, 10)
. /* ASSIGN */
I'm not sure that dbanalys uses exactly the same constants to estimate vMaxRPB as in the formula above. But the scatter factor is reported with low precision, so the exact values of these constants don't seem to matter.
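The formula can be sketched in Python against the earlier two-block test (the create limit of 150 is the default for a 4 KB block and is an assumption here; function and parameter names are mine):

```python
import math

def scatter_factor(rec_count, block_count, mean_rec_size,
                   db_block_size=4096, area_rpb=8, create_limit=150):
    """Scatter factor per the 10.2B formula above, rounded as dbanalys reports it."""
    max_rpb = (db_block_size - 64 - create_limit) / (mean_rec_size + 2)
    max_rpb = min(max_rpb, area_rpb)          # capped by the area's records per block
    real_rpb = rec_count / block_count
    return round(1 + math.log10(max_rpb / real_rpb), 1)

# The earlier test (area RPB 8, mean record size 19 bytes):
print(scatter_factor(15, 2, 19))  # 1.0 -- 15 records in 2 blocks
print(scatter_factor(8, 2, 19))   # 1.3 -- 8 records left in the same 2 blocks
```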
Conclusion #1: the scatter factor can be used to estimate the number of blocks that store a table's records residing in a Type II storage area. Note: an object block stores the number of clusters owned by the object, but the scatter factor is based on the block count rather than the cluster count.
Conclusion #2: large scatter factors for tables in SAT2 seem to mean that the tables have long RM chains. A long RM chain is a "bad thing": it degrades read performance. Hence scatter factors larger than 1.0 need your attention even in SAT2.
Best regards,
George
I researched a bit further. Progress V11.6.
It looks like dbanalys uses the following formula to calculate the scatter factor:
ASSIGN
  MaxRPB        = (DbBlockSize - 52) / MeanRecSize
  MaxRPB        = MIN(MaxRPB, AreaRPB)
  RealRPB       = RecCount / UsedBlocks
  ScatterFactor = 1 + LOG(MaxRPB / RealRPB, 10)
. /* ASSIGN */
The tests confirmed that the scatter factor does not take into account the create limit (expansion space constant).
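Since the original goal was to estimate the number of blocks occupied by a table, the 11.6 formula above can also be inverted; a Python sketch (function and parameter names are mine, and the inversion assumes the reported scatter is not too heavily rounded):

```python
import math

def scatter_factor_116(rec_count, used_blocks, mean_rec_size,
                       db_block_size=4096, area_rpb=8):
    """Scatter factor per the 11.6 formula above (52-byte header, no create limit)."""
    max_rpb = min((db_block_size - 52) / mean_rec_size, area_rpb)
    real_rpb = rec_count / used_blocks
    return round(1 + math.log10(max_rpb / real_rpb), 1)

def estimate_used_blocks(scatter, rec_count, mean_rec_size,
                         db_block_size=4096, area_rpb=8):
    """Invert the formula: UsedBlocks = RecCount / MaxRPB * 10^(scatter - 1)."""
    max_rpb = min((db_block_size - 52) / mean_rec_size, area_rpb)
    return rec_count / max_rpb * 10 ** (scatter - 1)

print(scatter_factor_116(8, 2, 19))             # 1.3
print(round(estimate_used_blocks(1.3, 8, 19)))  # 2
```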
I guess dbanalys incorrectly assumes that the size of the data block header is 52 bytes. But the size of a standard block header (bkHeaderSize) is 64 bytes, or even 80 bytes for the borderline blocks in a data cluster. The results per area reported by message 3922 are also affected.
For example:
508 RM block(s) found in the storage area.
1.27% of the RM block space is used.
RM CHAIN ANALYSIS
---------------------------
LIST OF RM CHAIN BLOCKS

           free   # free
  dbkey   space    slots   hold
 132096    4008      254      0
 132352    4028      255      0
    ...
 261632    4028      255      0
 261888    4012      255      0
The used space = 4096 - 4028 = 68 bytes (for most blocks) = 64 bytes (bkHeaderSize) + 4 bytes (No of Recs, Free Slots, No Bytes Free Space). The record offset directory is empty in these blocks.
1.27% of 4096 bytes = 52 bytes
So dbanalys also does not count the size of the RM block header (4 bytes plus the record offset directory).
But it's a minor issue.
A bigger problem is the definition of the blocks used by a table. It can cause funny results like:
                              -Record Size (B)-    ---Fragments---  Scatter
Table        Records    Size    Min   Max  Mean     Count   Factor   Factor
PUB.tbl            1   19.0B     19    19    19         1      1.0      3.7
Dbanalys says that a /single/ record is "scattered" in the area with a huge factor (3.7): it uses 501 times = EXP(10, 2.7) more space than it could. Do we need a dump and load for this record? ;-)
A simple test:
Area: d "Table Area":7,256;512 .
Db block size: 4096
At first I created only one record:
                              -Record Size (B)-    ---Fragments---  Scatter
Table        Records    Size    Min   Max  Mean     Count   Factor   Factor
PUB.tbl            1   19.0B     19    19    19         1      1.0      1.0
So far so good.
Then I created 79111 records to fill all 508 blocks in the table's data cluster (the first 4 blocks in the first data cluster are not RM blocks).
Then I deleted all the records and again created only one record. And now the scatter factor is 3.7. The table has owned all 508 data blocks from the beginning. Now dbanalys treats 507 blocks as "used" because each block stores one recid holder.
How many blocks are "used" by the table in my case? One or 508 blocks? Both answers are correct:
Queries that use the indexes will retrieve only one data block from disk - the one that stores the only record in the table.
The queries with TABLE-SCAN option will retrieve all 508 blocks.
The quiz at the end ;-)
What scatter factor will the table have if I delete its last record?
Scroll right to see the answer:
-> 0.0 for the table and 1.0 as average:
->
->                               -Record Size (B)-    ---Fragments---  Scatter
-> Table        Records    Size    Min   Max  Mean     Count   Factor   Factor
-> PUB.tbl            0    0.0B      0     0     0         0      0.0      0.0
->
-> -----------------------------------------------------------
-> Subtotals:         0    0.0B      0     0     0         0      0.0      1.0
Best regards,
George