Sizing -B2

Posted by Rob Fitzpatrick on 03-Jul-2015 10:32

RHEL 5.6, OE 10.2B07 64-bit

I am having difficulty with tuning -B2 in a customer database.  In the course of investigating and addressing performance issues, I took a maintenance window to make some startup param changes, do some periodic maintenance (dbtool fix SQL width, update SQL stats), and assign objects to the ABP.  In total I assigned 17 tables and their 42 indexes to the ABP (at the object level), along with the schema area (via proutil enableb2).  All tables are in areas with appropriate RPB.

I did my initial sizing exercise mostly with dbanalys data.  Indexes are straightforward; I took block counts from the report.  For tables, I took record data size from the report.  Knowing that this isn't an accurate measure of the size of the blocks that hold these records, I doubled the reported size to give a margin for error.  And for the schema area, I simply took the size of dbname.d1 and divided by DB block size.  I set -B2 to 10,888.

I restarted the DB and its batch clients and did some other maintenance work, like the SQL stats.  I noticed, while in promon R&D 6 11, that there was heavy LRU2 latch activity.  Sure enough, R&D 2 3 showed LRU2 replacement policy enabled.  I thought I must have sized -B2 too small.  So with proutil increaseto I bumped up -B2 from 10,888 to 20,000, thinking that that was *definitely* enough, and I used promon to disable the LRU2 mechanism and zero out the latch counts.  A couple of hours later, the LRU2 mechanism was enabled again, so I doubled -B2 again to 40,000 and manually disabled LRU2 again and zeroed out the latch counts.  Since then, even at 40,000, I have again seen the LRU2 mechanism enabled several times.

I can't understand why this is happening.  Sure, I could be off somewhat on my rough estimate of space needed for RM blocks.  But I didn't think I could be off by a factor of 8.

I decided I needed a better way of calculating table size in blocks.  (As an aside, I would *really* like to have dbanalys/tabanalys report table sizes in blocks, for Type II areas.)  I thought of two ways to measure blocks needed for tables.  One is to do a FOR EACH on each ABP-assigned table and count the unique dbkeys.  The second is to take a backup of the DB, restore it to a new location, start that DB with a large -B2 and do FOR EACH on each of those tables and look at the ABP OS reads.

The first approach (counting dbkeys) gave me a grand total of 3,964 blocks for all 17 tables.  From dbanalys, I have 2,207 blocks of index data.  And based on the 17 MB size of my schema area, if is 2096 blocks in size, for a total of 8,267 blocks.  The second method (restore backup, set large -B2, look at OS reads) gave me a similar measure of 4,237 blocks for table data, meaning a total of 8,540 for -B2.  That's a larger discrepancy than I can explain, but in either case the original -B2 setting of 10,888 should have been enough, let alone 20,000 or 40,000.  For the latter method, I did run viewb2 reports against both DBs and diff them to ensure they had the same ABP assignments.

These tables aren't absolutely static.  There is some CUD activity on some of them, but even the highest numbers of creates are low; about 20-30 records created per day, with record sizes of about 50-100 bytes.  All of the assigned indexes belong to assigned tables; there are no indexes of large, fast-growing tables in the ABP.

So I'm open to suggestions.  Am I missing something obvious?  Is there a known ABP bug in 10.2B07?  Also, for those who may question the version, I'm planning to install SP08 in the short term (a month or so), and the client has begun an upgrade project, to move to our latest application version on 11.4.  That should go live in about October.

All Replies

Posted by TheMadDBA on 03-Jul-2015 11:54

Hard to say without seeing all of the information myself but here a few things to consider...

- Are there any LOB columns for those tables?

- If you make the B2 huge on the restored copy and run a dbanalys how many blocks show up used in B2?

What do the promon screens show for the alternate buffer pool?

Posted by Rob Fitzpatrick on 03-Jul-2015 12:24

Thanks for the input.

> - Are there any LOB columns for those tables?

I should have said; there are no LOBs.

> - If you make the B2 huge on the restored copy and run a dbanalys how many blocks show up used in B2?

That's more or less what I did, except that I used ABL instead of dbanalysis to do the reads.  I'll bounce the test DB and run dbanalys on it and report the results.

> What do the promon screens show for the alternate buffer pool?

Over about the last two days I'm showing about 1.2 billion logical reads, about 7800/second, in the ABP.  In that time there are 32 OS reads in ABP.  So that suggests I'm very close to having a large enough -B2.  As I write this, LRU2 is enabled again.  I will bump up -B2 by another 1,000 blocks and see whether LRU2 replacement remains disabled after that.

Posted by Rob Fitzpatrick on 03-Jul-2015 13:29

I started my test DB with -B2 100,000.  It's a copy of prod.

After proserve, I have 75 ABP OS reads.

After running dbanalys I have 18,455 ABP OS reads.

After running a table scan on each table assigned to ABP (not including schema area), I have 18,516 ABP OS reads.

After two consecutive runs of a program that scan each assigned application table with each assigned index, I still have 18,516 ABP OS reads.

After another dbanalys, still 18,516.

I ran proutil viewb2 to confirm that all assignments in the test DB match the prod DB; they do.

So:

- I have no idea why this new total is so much higher than my earlier calculation;

- I have no idea why, despite this total being much smaller than my current prod -B2 setting of 40,000, I'm still getting LRU2 latching in prod.

Posted by ChUIMonster on 03-Jul-2015 13:47
If you do not use -Bp 100 (or some small value) when doing an online
backup LRU2 will get enabled.

You might be running into that or perhaps some similar utility program
is having a similar unexpected side effect.

--
Tom Bascom
603 547 9043 (office)
603 396 4886 (cell)
tom@wss.com
Posted by Rob Fitzpatrick on 03-Jul-2015 13:54

> If you do not use -Bp 100 (or some small value) when doing an online backup LRU2 will get enabled.

I believe you're referring to defect OE00208701; it was fixed in SP05.  And my backup script uses -Bp 10 for online backups.

So there is something else going on.  I agree that it could have been caused by some other utility, like dbtool.  

Posted by George Potemkin on 03-Jul-2015 13:55

Rob,

Did you check the contribution of each individual table and index using _TableStat-OsRead/_IndexStat-OsRead?

BTW, dbanalys does not make friends with _TableStat/_IndexStat. Use 4GL queries for tests.

Posted by Rob Fitzpatrick on 03-Jul-2015 14:18

> Did you check the contribution of each individual table and index using _TableStat-OsRead/_IndexStat-OsRead?

I did check _TableStat-OsRead, in prod.  The numbers didn't make sense to me.  Each of the 17 tables showed far more OS reads than my recid calculations showed should be necessary.  

My calculations show that my ABP-assigned tables vary in size from 1 block to 1,560 blocks.  The OS reads varied from 993 blocks (for a table I calculate to have 74 blocks) to 6,352 blocks (for a table I calculate to have 1,311 blocks).  Suspiciously, six of the tables had identical values for OS reads: 1,016 blocks; a seventh had 1,017 OS reads.  My calculations show these seven tables vary in size from 2 to 258 blocks.

On the one hand, I would expect OS reads to be higher than table sizes if -B2 was actually too small, as evicted blocks could be re-read from disk several times.  And that would be consistent with the heavy LRU2 latch activity I have seen.  

Based on the current assignments, roughly half the logical I/O in the DB is now in the ABP.  The piece I am missing is: if the size of -B2 is now 40,000, as it has been for a week and a half now, why have any evictions been required in recent days?  The only thing that would make sense is if the proutil increaseto didn't actually increase the size of -B2, but reported that it did.  It will be interesting to see if this problem persists after a DB restart.

I should also mention that when I ran increaseto I also changed the startup script for the DB to have the new value of 40,000 for -B2, and the DB hasn't been restarted since I did maintenance on 06/23, and promon R&D 1 12 shows 40,000 for -B2.  Also, in case it matters, I haven't yet had evictions from -B.  It is set to 3,155,360 and I currently have 29,112 empty buffers in total.

Posted by Rob Fitzpatrick on 03-Jul-2015 14:26

> BTW, dbanalys does not make friends with _TableStat/_IndexStat. Use 4GL queries for tests.

Sorry George, I'm not sure I follow; can you elaborate please?  Are you referring to the fact that dbanalys uses C code rather than the 4GL query engine to gather its stats?  

Also, is it safe to assume that dbanalys reads each DB block no more than once?  If not then my test result is invalid.

Posted by TheMadDBA on 03-Jul-2015 14:40

The reason I suggested dbanalys is that it usually faster than reading the tables and it will read all of the table and index blocks (where a for each will just use one index per table).

I guess I should have been clearer before - what I was wondering about is how many blocks used are showing up in the B2 compared to the startup parameter.  If you are running out of blocks that is one issue... if you have free blocks and LRU2 keeps coming back then it is a different issue.

OS reads can be spotty and SQL doesn't update the VSTs properly for Type II areas until you get to 10.2B08 or 11.4 (possibly some SP of 11.3).

What does dbanalys show for fragments on your tables? You can use RECID to find the parent block but that doesn't mean that your record isn't stored in more than one block.

Posted by Rob Fitzpatrick on 03-Jul-2015 14:56

TheMadDBA:

You make a good point about fragmentation; I negected to consider that with my dbkey-counting approach.  And that could account for the discrepancy between the results I got from that and from dbanalysis.

Of my 17 assigned tables, two have fragmentation.  One has 6 extra fragments and the other has 606.

Regarding blocks used, are you referring to _BuffStatus._BfStatus-UsedBuffs?  It is 3,166,250.  As far as I am aware there isn't a separate promon/VST field for used buffers in -B versus -B2.

Posted by George Potemkin on 03-Jul-2015 15:24

Rob,

I meant that dbanalys does not update _TableStat-read/_IndexStat-read. It does not read the index keys (but it reads the index blocks). That is why it does not update _IndexStat-read. It does read all fragments of each records but dbanalys does it in a way that differs how 4GL does. For example, dbanalys can read the records with sizes that exceed 32K. It does not consolidate the fragments in one record. I guess it's a reason why it does not update _TableStat-read. I did not check if dbanalys updates _TableStat-OsRead/_IndexStat-OsRead or not. BTW, idxfix (with one exception) also does not update _TableStat/_IndexStat. I think it's just a bug.

Posted by ChUIMonster on 03-Jul-2015 15:26
PROMON, R&D, 1, 7

Total buffers: 10202
Hash table size: 3407
Used buffers: 37
Empty buffers: 10165
On lru chain: 10001
On lru2 chain: 200

If "used buffers" is less than "lru chain" plus "lru2 chain" or "empty
buffers" > 0 then you should have blocks leftover in lru2 assuming that
"normal" stuff uses all of the "lru1" blocks. (Which is usually a valid
assumption.)

--
Tom Bascom
603 547 9043 (office)
603 396 4886 (cell)
tom@wss.com
Posted by Rob Fitzpatrick on 03-Jul-2015 15:31

Thanks for the explanation George.  I think at this point my best path forward is to installed SP08, increase -B2 in the prod DB yet again, bounce the DB, and monitor ABP OS reads and individual table and index OS reads from application activity.  I'll hold off on running utilities like idxcompact etc. until I am sure I have -B2 sized appropriately.

Posted by Rob Fitzpatrick on 03-Jul-2015 15:37

Tom:

Total buffers:          3195362
Hash table size:        1037347
Used buffers:           3166250
Empty buffers:            29112
On lru chain:           3155361
On lru2 chain:            40000
On apw queue:                 0
On ckp queue:                 0
Modified buffers:           830
Marked for ckp:               0
Last checkpoint number:     153
LRU force skips:            100
LRU2 force skips:           100

-B = 3,155,360

-B2 = 40,000 (was 10,888 at DB startup; was increased online to 40,000)

Used buffers = 3,166,250
LRU + LRU2 = 3,195,361

So used buffers < (LRU + LRU2).  And empty buffers > 0.

Posted by TheMadDBA on 03-Jul-2015 16:21

Interesting... your lru2 chain - empty buffers = 10,888.. your exact startup option. Maybe increaseto is broken for B2?

Posted by Rob Fitzpatrick on 04-Jul-2015 16:03

I think more than that is broken, at least in late 10.2B.

My test (run in both SP07 and SP08):

  • create sports DB
  • move all application objects to Type II areas
  • assign all application objects to ABP
  • proserve sp -B2 10
  • run FOR EACH on each table with each index (except word indexes)

Results:

07/04/15        Activity: Buffer Cache
<snip>
Alternate Buffer Pool
                                    Total 
Logical reads                        7544 
Logical writes                          0 
O/S reads                             707 
O/S writes                              0 
Marked to checkpoint                    0 
Flushed at checkpoint                   0 
Writes deferred                         0 
LRU2 skips                              0 
LRU2 writes                             0 
APW enqueues                            0 
Alternate buffer pool hit ratio:  90 %
LRU2 replacement policy disabled.

07/04/15        Activity: Latch Counts
16:44:28        07/04/15 13:01 to 07/04/15 16:43 (3 hrs 42 min)

                ----- Locks -----
   Owner        Total        /Sec
<snip>
LRU  --          1539           0
LRU  --           709           0
<snip>

07/04/15        Status: Buffer Cache
16:43:46

Total buffers:                     3012
Hash table size:                   887
Used buffers:                      80
Empty buffers:                     2932
On lru chain:                      3000
On lru2 chain:                     11
On apw queue:                      0
On ckp queue:                      0
Modified buffers:                  0
Marked for ckp:                    0
Last checkpoint number:            0
LRU force skips:                   0
LRU2 force skips:                  0

07/04/15        OpenEdge Release 10 Monitor (R&D)
16:46:03        Adjust Latch Options

                1. Spins before timeout:       96000
                2. Enable latch activity data collection
                3. Enable latch timing data collection
                4. Initial latch sleep time:   10 milliseconds
                5. Maximum latch sleep time:   250 milliseconds
                6. Record Free Chain Search Depth Factor: 5
                7. Enable LRU2 alternate buffer pool replacement policy
                8. Adjust LRU force skips:           0
                9. Adjust LRU2 force skips:          0

Conclusion: clearly, with an ABP size of 10 blocks, reading almost the whole sports DB into the ABP results in the activation of the LRU2 replacement mechanism.  The latch counts confirm this.  But the alerts that should tell the DBA that this problem exists do not work:

  • R&D 2 3 shows "LRU2 replacement policy disabled.".
  • R&D 4 4 7 shows "Enable LRU2 alternate buffer pool replacement policy".
  • db.lg does not show message "(-----) LRU on alternate buffer pool now established.".

I was hoping to see that the LRU2 was enabled, so I could go on to recreate my customer's issue and see if it's a problem with proutil increaseto.

When I get a chance I'll test this again in v11.

Posted by Rob Fitzpatrick on 04-Jul-2015 16:05

Community note: although the rich text editor lets you put colour highlighting on plain text added with the code editor, and it looks fine in the message preview, the final message shows HTML tags instead of colour.  :(

Posted by TheMadDBA on 04-Jul-2015 16:22

This kb seems to say that lru2 is broken in a number of ways. Basically the lru2 should be enabled but isn't.

knowledgebase.progress.com/.../000045605

Posted by Rob Fitzpatrick on 04-Jul-2015 16:38

> This kb seems to say that lru2 is broken in a number of ways. Basically the lru2 should be enabled but isn't.

Crap.  Thanks TMD.  I really don't want to assign objects to ABP at the area level.  And it looks like the existing situation (FIFO replacement) has made things worse than having the reads done in the primary buffer pool.

Based on the the article you linked to it looks like I'll be okay in 11.3.2 and later; I'll test that.  In the meantime, I'll bump up -B2 again, bounce the DB, and monitor LRU2.  Thanks for the help.

Posted by TheMadDBA on 04-Jul-2015 16:49

No problem. I am kind of shocked that none of us have run into this before.

I know I have turned on B2 on quite a few 10.2B databases. And I bet Tom has done even more. I suspect that once  you get it set large enough life will be good. The tests you did reading by every index plus 20% wiggle room should be fine.

Good luck

Posted by Rob Fitzpatrick on 05-Jul-2015 17:45

It looks like life is better in 11.3.2.  My preliminary test results:

test:
  11.3.2
  sports DB, all application objects assigned to ABP at object level
  do more I/O on ABP than value of -B2
expected result:
  LRU2 policy will be enabled
  there will be LRU2 latch activity and more ABP OS reads than value of -B2
steps:
  convert sports db to v11
  proserve sports -B2 10
  run program to read all assigned tables with all assigned indexes
result:
  690 OS reads on ABP
  r&d 6 11 shows lots of LRU2 activity
  r&d 2 3 shows LRU2 policy enabled
  r&d 4 4 7 shows "disable"
conclusion:
  expected result for 11.3.2
  
test:
  resize -B2 online to be large enough to hold all assign objects
  read all ABP-assigned objects
expected result:
  LRU2 policy remains disabled
  no LRU2 activity
steps:
  disable LRU2 policy
  proutil increaseto -B2 700
  zero stats in promon
  run program to read all assigned tables with all assigned indexes
result:
  r&d 2 3 shows 75 ABP OS reads, LRU2 policy enabled
  r&d 6 11 shows 7367 LRU2 latch locks
  r&d 4 4 7 shows "disable"
conclusion:
  -B2 was not sized large enough?
  there were 75 ABP OS reads, so at most -B2 was 75 blocks too small

test:
  resize -B2 online again, increasing it to 800 (100 larger than before)
  read all ABP-assigned objects
expected result:
  LRU2 policy remains disabled
  no LRU2 latch activity
steps:
  disable LRU2 policy
  proutil increaseto -B2 800
  zero stats in promon
  run program to read all assigned tables with all assigned indexes
result:
  r&d 6 11 shows 0 LRU2 latch locks
  r&d 4 4 7 shows "enable"
  r&d 2 3 shows 7367 logical reads, 0 OS reads in ABP, LRU2 policy disabled
conclusion:
  expected result in 11.3.2


Posted by Rob Fitzpatrick on 08-Jul-2015 15:55

Just an update on my customer.  Last night I bumped up B2 marginally and bounced the DB.  Thirteen hours of application activity later, LRU2 policy remains disabled, almost zero LRU2 latch activity, very low latch and resource timeouts and OS reads.  So far, life is good.  :)

Posted by TheMadDBA on 08-Jul-2015 16:11

Good to hear :-) I am sure everything will be fine from here on out as long as the B2 size stays ahead of the growth for those tables/indexes.

I strongly suspect that the increaseto for B2 is partially broken in 10.2B.  Like it increases some of the internal counters ( like the blocks allocated) but didn't actually let processes use the extra B2 blocks.

It is really the only thing that makes sense from what you described. Not sure if opening a support call will make any difference other than getting a KB entry. It does seem to work in recent versions of V11.

Posted by Rob Fitzpatrick on 08-Jul-2015 16:17

For this client, they're upgrading soon to 11.  If it were earlier in 10.2B's life cycle I'd open a case.  I think the various KB articles I've seen, along with the one you pointed out, describe my circumstances.

This thread is closed