Flushing the Database Buffer Pools (Performance Issues in 20

Posted by dbeavon on 07-May-2016 20:26

Can someone tell me how to flush the database buffer pools on a Progress database?  Some parts of our code run consistently in 5 seconds during certain times of the day when the database is relatively unused; but later, during a busier time of day, we notice that the same code runs in 15 seconds whenever it is executed.

Some basic troubleshooting shows that the performance problem is *not* CPU utilization (or at least only 5 seconds of it is CPU, and the rest is probably related to buffers needing to be reloaded from disk.)

After flushing buffers, I should be able to run performance tests and try to isolate disk as the performance bottleneck that adds the extra 10 seconds in a consistent way.

I found this in the KB:

http://knowledgebase.progress.com/articles/Article/How-to-clear-out-database-cache-from-memory-buffers-for-query-testing

However, I still suspect there is a secret ABL program or promon command to clear the database buffers.

 

With all that Progress gives us to monitor our buffers, track hit ratios, and create alternate buffer pools, I find it extremely unlikely that they don't have a way to flush clean buffers from memory.  Could there be something in promon/rnd/debghb that I overlooked?


All Replies

Posted by Rob Fitzpatrick on 07-May-2016 20:46

> Can someone tell me the way to flush my database buffer pools on a Progress database?

By "flushing" I assume you mean emptying.  (Flushing typically refers to writing the contents of modified buffers to disk, but leaving the contents in the buffer.)  There is no way to do this as far as I am aware, without restarting the DB.  In this case -B/-B2 is empty but your data may still be cached in OS file system buffers, SAN cache, disk cache, etc.

Have you tried increasing the size of -B?  Do you have enough available RAM to do that?
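For reference, -B is set when the broker is started; a minimal sketch (the database path and block count here are placeholders, not values from this thread):

```shell
# Start the broker with a larger primary buffer pool.
# -B is the number of database blocks cached in shared memory.
proserve /db/mydatabase -B 500000
```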

Posted by Rob Fitzpatrick on 07-May-2016 20:48

What is your OE version?  Are you using -B2?

Posted by George Potemkin on 07-May-2016 22:27

> and the rest is probably related to buffers needing to be reloaded from disk.

IMHO, it's unlikely.

Did you get the promon statistics for both periods of time? Like:
Activity: Summary
Activity: Performance Indicators
Activity: Latch Counts

I usually ask our customers to gather the statistics for a 30-sec period using my dbmon script and dbstatmon.p program. This information is enough to identify the bottleneck and to find the application code that caused the problems.

> However, I still suspect there is a secret ABL program or promon command to clear the database buffers.

It's called a system reboot. ;-)

Posted by ChUIMonster on 08-May-2016 10:01

The secret command is called "proshut"; it will cause all dirty buffers to be written to disk.  After the subsequent "proserve" you will have a buffer pool with a minimal footprint of used buffers.

To achieve your goal of consistently obtaining the same disk I/O profile you will also need to unmount and remount the filesystems.  If the disk subsystem is a SAN you probably need to power it down and then restart it.  And make sure that nobody else is actually using it.

That's probably a lot of work and it may not be worth it.

A simpler approach would be to look at the table stats and the "os reads" field on _tablestat.  If the table in question shows more disk reads during the busy period, that's a pretty good sign.  You could do the same with _userio & "_dbread" and the session in question if you need to focus on that specific process rather than the table.
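A minimal ABL sketch of that approach, assuming an OE release that exposes the OS-read counter in the _TableStat VST and a -tablerangesize large enough to cover the tables of interest (run it before and after the slow period and compare the deltas):

```abl
/* Snapshot logical reads vs. OS reads per table from the
   virtual system tables. */
FOR EACH _TableStat NO-LOCK,
    FIRST _File NO-LOCK
    WHERE _File._File-Number = _TableStat._TableStat-Id:
    DISPLAY _File._File-Name             FORMAT "x(20)"
            _TableStat._TableStat-Read   LABEL "Logical reads"
            _TableStat._TableStat-OsRead LABEL "OS reads".
END.
```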

Or you could just download ProTop and let it do all of that for you.   dbappraise.com/protop.html

Posted by dbeavon on 09-May-2016 08:33

It is my goal to determine whether it is likely or unlikely that the 10 second difference is related to going back to disk.  I'd rather not do so much guesswork and/or get the evidence in a second-hand way.  I am already convinced that the problem is disk, but I just need a conclusive and consistent repro.  It shouldn't be so hard to clean the buffer memory and try again without restarting the whole database.***

A few years back someone mentioned that you could trick OE into going back to disk "by dismounting and remounting the drives that house the database."  Has anyone found another such trick that has a similar effect and works in Windows?  I'm looking for any ideas at all.  E.g., could I make some type of online SQL92 change to the database schema for the relevant tables, in a way that essentially becomes a no-op, but makes Progress think it needs to reload from disk?

I am really quite convinced that there is a secret mechanism for doing this  - similar to the "r-code trick" on the ABL code side of things (see KB P130577 where we force OE to reload r-code from disk with CURRENT-LANGUAGE = CURRENT-LANGUAGE).  I'm betting Progress is just keeping it hard-to-find because otherwise someone will go and use it in a production environment.
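For reference, the r-code trick from KB P130577 mentioned above is just a self-assignment:

```abl
/* Per KB P130577: reassigning CURRENT-LANGUAGE makes the AVM
   discard cached r-code, so subsequent RUN statements reload
   .r files from disk. */
CURRENT-LANGUAGE = CURRENT-LANGUAGE.
```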

*** In SQL Server this is accomplished via "DBCC DROPCLEANBUFFERS".  I just need the OpenEdge database equivalent.

Posted by George Potemkin on 09-May-2016 08:53

> Has anyone found another such trick that has a similar effect and works in Windows?

Run online dbanalys against a large area with data that are not used in your test.

> I am really quite convinced that there is a secret mechanism for doing this

No, it does not exist, if only because it would be meaningless. File systems and disk arrays /lie/ to the processes. It's not possible to be 100% sure that data were read from disks rather than from the cache of these "liars".

> I am already convinced that the problem is disk

Can you share with the community what convinced you in that?

Posted by ChUIMonster on 09-May-2016 08:55

Of course I could be wrong but I'm pretty sure that you are barking up the wrong tree.

Even if a miracle occurs and a previously unheard of secret feature not named "proshut" is revealed you still have the problem that the OS has also cached reads -- and you still need to flush those buffers.  That's where the unmount/remount advice comes from.  And that isn't going to go over well with a live database trying to keep those filesystems open.

Your goal of determining if it is *likely* can be very easily satisfied by running ProTop.  Simply look at the upper left hand portion of the dashboard (which comes up by default).  If your hypothesis is valid, during "fast" performance you should see a smallish number of "os reads" and during poor performance you will see a much larger number.  If that first-order confirmation passes, then drill down by looking at the specific tables and sessions (also shown by default) involved.

If, on the other hand, "os reads" are not significantly different then you would just be wasting your time trying to prove it by flushing buffers etc.

Posted by dbeavon on 09-May-2016 09:28

On an evening when I'm the only one in the system, the *first* test does *not* hammer the CPU at 100% and it *does* take 10 seconds longer (the duration is a full 15 seconds, and the CPU jumps around erratically between 50% and 100%).

After the *first* test, the subsequent tests use the CPU at 100% and take only five seconds.  

Not having a very good way to clean the database buffers, I cannot simulate a "first" test in a repeatable way.  But I have done my best to rule out every other possibility, and the only thing left seems to be the disk I/Os to the underlying database files.

I like the idea of running an online dbanalys against a large area with data that are not used in my tests.  I'll report back if that gives me the consistent and reproducible 15 second results (ie a repeatable "first" test).

Posted by George Potemkin on 09-May-2016 09:46

I seem to have misread your original post. The first run (after a db restart?) is longer than the subsequent tests. That's expected behaviour. ProTop (or other programs) will tell you exactly which tables were read from disk during the test. But what's your goal? If data are on disk and a program needs that data, then it needs to spend some time retrieving it from disk.

Posted by dbeavon on 09-May-2016 10:11

Right, disk is one of my primary resources and will be a bottleneck from time to time.  My goal would be to find a way to keep that buffered data in memory on a more permanent basis (a larger buffer pool, an alternate pool, or whatever).

Or to move the database off the current disk to a better one.

Or "plan C" - use this test or something like it as a baseline to determine if our disk resource is possibly misconfigured.   In the SQL Server world we have a tool called "sqlio" that is used ( independently of SQL Server itself ) to run performance tests against the underlying disk (various block sizes, queue lengths, read vs write preferences, random vs sequential, and so on).  Not sure if Progress ships such a thing (eg. for HP-UX) to measure the performance characteristics of the underlying disk that it sits on.

Before moving forward on any of these, we would need evidence that points conclusively to a 10 second disk bottleneck.  A repeatable test that involves re-reading from disk would be the most straight-forward evidence.
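The first option above (keeping hot data resident via the alternate buffer pool) can be sketched roughly as follows, assuming OE 10.2B or later; the database and area names are placeholders, and the broker must be started with a -B2 sized to hold the area:

```shell
# Assign an area's objects to the alternate buffer pool so that,
# once read, its blocks stay resident (they are not evicted by
# ordinary -B churn).  Requires a sufficiently large -B2.
proutil mydatabase -C enableB2 "Hot_Data_Area"
```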

Posted by George Potemkin on 09-May-2016 10:31

> Not sure if Progress ships such a thing (eg. for HP-UX) to measure the performance characteristics of the underlying disk that it sits on.

Run dbanalys (or better, ixanalys) for your database and check how long it takes. You can easily turn on the timing for each area as well; you will need to modify the promsgs file a bit.

BTW, Progress has the old 'proutil void.db -C iotest' command but it's not fully implemented.

> we would need evidence that points conclusively to a 10 second disk bottleneck.

Use the tools like ProTop.

> In the SQL Server world we have a tool called "sqlio" that is used ( independently of SQL Server itself ) to run performance tests against the underlying disk (various block sizes, queue lengths, read vs write preferences, random vs sequential, and so on).

Why don't the OS or disk arrays provide such tools? Maybe they know something about the reliability of their results? ;-)

> My goal would be to find a way to add that buffered data into memory on a more permanent basis (larger buffer pool, alternate pool, or whatever).

To "fix" the issue with a first run that takes 10 sec longer than the subsequent runs? 

A larger or alternate buffer pool will not help to retrieve data from disk. Data are retrieved from disk by the OS rather than by Progress. Progress only copies data from the filesystem cache to a database buffer pool. You do not seem to have a performance issue when the data used by your tests are already in database memory.

Posted by martinz on 09-May-2016 12:58

ProTop really _is_ a great tool to quickly check for excessive IO, in my experience. It has helped us identify many bottlenecks in the application, which often could be solved by rewriting queries and sometimes by adding (or replacing) indexes.

George, is there any documentation for the 'proutil void.db -C iotest' command? At first glance, it may be a useful tool, but I'd like it to perform some more requests before determining the io performance.

Posted by George Potemkin on 09-May-2016 13:09

Martinz, as I already said, the iotest command is not fully implemented and it can't be used for IO benchmarking. I mentioned it just to point out that many years ago Progress developers thought about writing such a tool.

Create the void database, run proutil void -C iotest and check db log.

Posted by dbeavon on 16-May-2016 14:43

Thanks again for the tip on using "proutil -C dbanalys".  It is definitely helping to isolate a disk bottleneck.  

I run the following (*) for about 5 minutes (after picking the right area first).  I'm sure this 5 minutes only works for me based on the throughput of my disk and the size of the database buffers (-B):

* proutil mydatabase -C dbanalys area myarea > out.txt

The *first* time I execute my ABL code after doing this, the behavior of the code is totally different than *subsequent* executions.  The most obvious difference is that CPU runs at only 50% on the *first* execution and the code takes 10 seconds longer.  On subsequent executions the CPU runs at the full 100%.

I still wish Progress would just tell us what the secret command is to empty the clean buffers without having to wait for a full five minutes to pull unnecessary data off of the disk.  (There's no way there isn't a secret command for this.)

Thanks again,

David

This thread is closed