How to use statement caching in a production environment?

Posted by George Potemkin on 04-Aug-2016 09:57

The statement cache is a very useful feature, but it's also a rather dangerous thing.

Some issues were discussed by Gus Bjorklund at Exchange 2011:
OE1111: What's That User Doing? The Progress OpenEdge Client Request Statement Cache
download.psdn.com/.../OE1111.wmv
community.progress.com/.../2255.what-s-going-on-database-client-monitoring

Here is a list of issues:
1. The scratch files (*.cst) are created by default in the user's working directory (or $HOME?). To read the _Connect records, other sessions must have permission to read the files in the directory of a user with statement caching enabled. Obviously, a security department demands the opposite.
Defect OE00237502 / PSC00258643
knowledgebase.progress.com/.../000040544
The workaround is to use promon/R&D/1/18/8 (Specify Directory for Statement Cache Files).

2. Enabling statement cache dynamically allocates space in shared memory: 288 bytes per session. Progress will use the -Mxs memory; we can check _Segment-ByteFree for the last segment to see how much of it is left.

3. Enabling statement cache adds approximately 25% overhead for remote clients.

4. Enabling statement cache creates additional activity on the USR latch: every update of the statement cache generates 2 USR latch locks. This is the main performance killer for self-service connections.

Example:
Statement cache was enabled only for 14 most active users (of 1200 running users).

08/02/16        Activity: Latch Counts
11:20:50        08/02/16 11:20 to 08/02/16 11:20 (4 sec)
                ----- Locks -----    Naps
   Owner        Total        /Sec    /Sec
USR  --        397706       99426       3

14 sessions generated 99,426 USR latch locks per second.

The naps are really low but USR is a queued latch - the main waits are on semaphores:

08/02/16        Activity: Other
11:20:50        08/02/16 11:20 to 08/02/16 11:20 (4 sec)
                      Total   Per Sec   Per Tx
Commit                 1092    273.00     1.00
Wait on semaphore      1060    265.00     0.97
Non-blocking waits        0      0.00     0.00
Semaphore latch waits  1043    260.75     0.96

08/02/16        Activity: Performance Indicators
11:20:50        08/02/16 11:20 to 08/02/16 11:20 (4 sec)
                      Total   Per Sec   Per Tx
Commits                1092    273.00     1.00
Total waits              17      4.25     0.02
Lock waits                0      0.00     0.00
Resource waits           17      4.25     0.02
Latch timeouts          314     78.50     0.29


Shouldn't (Wait on semaphore + Non-blocking waits + Semaphore latch waits) equal (Total waits) on the Activity: Performance Indicators screen? Not in Progress V11.6.1 on AIX: here 1060 + 0 + 1043 = 2103, while Total waits shows only 17.

So after enabling statement cache, the most active sessions were waiting on semaphores. And it could be much worse if we activated the caching for all users.

5. "Radioactive contamination" of the usrctl slots by USR latches. After disabling statement cache the reading of the corresponding _Connect record will always creates 2 USR latch locks - it does not matter if the old session still exists or if the slot in the usrctl table was already re-used by new session or if it's not currently in use. It's a minor issue provided there are no sessions that are constantly reading the _Connect records.


We can easily use the statement cache to check what the sessions are doing /right now/ - a few short sampling intervals (1 sec or so) would be enough. But taking the above issues into account, it would be a bad idea to keep the statement cache active for a long period of time. What would be the best strategy for monitoring the statement cache over a /long/ period of time (for example, 24 hours)? Let's say a session generates high db access during the last 5-minute interval. Should we trigger short-interval monitoring to check what programs are running "right now" by enabling "One Time" (type 3) statement caching (see the sketch below)? Only once per 5-minute interval, or many times using one-second intervals? How can we gather statistics about long-running Progress sessions with the least possible impact on a production environment?
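
To make the "trigger" idea concrete, here is a minimal sketch, assuming _Connect-CachingType can be updated from ABL (it appears to be one of the few writable VST fields) and using _UserIO deltas to spot busy sessions. The sample length and threshold are placeholders, not recommendations:

/* Two samples of _UserIO a few seconds apart, then request a
   "One Time" (type 3) snapshot for every session whose db access
   exceeded the threshold in between. */
DEFINE VARIABLE iSampleSec AS INTEGER NO-UNDO INITIAL 5.     /* placeholder */
DEFINE VARIABLE iThreshold AS INTEGER NO-UNDO INITIAL 10000. /* placeholder */

DEFINE TEMP-TABLE ttIO NO-UNDO
  FIELD uUsr    AS INTEGER
  FIELD nAccess AS INT64
  INDEX uUsr IS UNIQUE uUsr.

FOR EACH DICTDB._UserIO NO-LOCK WHERE _UserIO._UserIO-DbAccess > 0:
  CREATE ttIO.
  ASSIGN ttIO.uUsr    = _UserIO._UserIO-Usr
         ttIO.nAccess = _UserIO._UserIO-DbAccess.
END.

PAUSE iSampleSec NO-MESSAGE.

FOR EACH DICTDB._UserIO NO-LOCK WHERE _UserIO._UserIO-DbAccess > 0,
   FIRST ttIO WHERE ttIO.uUsr = _UserIO._UserIO-Usr:
  IF _UserIO._UserIO-DbAccess - ttIO.nAccess < iThreshold THEN NEXT.
  FIND FIRST DICTDB._Connect EXCLUSIVE-LOCK
       WHERE _Connect._Connect-Usr = ttIO.uUsr NO-ERROR.
  IF AVAILABLE _Connect THEN
    _Connect._Connect-CachingType = 3. /* "One Time" full stack */
END.

A monitoring session would then read the snapshots back (as in the reader sketch above) before the next pass re-arms them.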


Best regards,
George

All Replies

Posted by ChUIMonster on 04-Aug-2016 10:15

I agree.  It needs work.  It's a shame that it doesn't get more attention -- this is one of the best and most powerful diagnostic tools available for an OE db.

None the less -- I do find that enabling "single" (just the top of the stack rather than the whole stack) for a focused period is relatively safe.  Especially if you first direct the temp files to -T (or /tmp or some similar place with wide open perms).

I'd really like for "single" to be enabled by default.  Of course the issues above need to be addressed before that would be a good idea.

(edited to change incorrect "one time" to "single")

Posted by George Potemkin on 04-Aug-2016 10:34

Would it be useful to get a series of "one time" snapshots? Call stacks are fast-changing things.

Posted by Keith Sudbury on 04-Aug-2016 11:53

Steps on soapbox...

This is probably the thing I miss the most when switching between OE and other databases (SQL Server, PostgreSQL, Oracle, etc). The ability to see exactly what is going on in a database either right now or back in history.

Making the statement caching safe and low impact would be a great start but we have a long way to go to catch up.

Right now I am trying to track down a problem with excessive reads on a commonly used table. If it were a longer-running query I could try to get a stack trace and find the problem code (or use statement caching). But this is a small enough table that reading the entire table only takes a few ms. There really isn't an easy/low-impact way in OE to find out what is reading the extra records.
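
For what it's worth, if -tablerangesize covers the table, _UserTableStat can at least narrow the reads down to a user, after which statement caching can be aimed at just that session. A rough sketch - the table name "customer" and the 60-second sample are stand-ins:

/* Sample _UserTableStat twice and report which users read the table
   in between.  Requires the table to fall inside the
   -basetable/-tablerangesize window. */
DEFINE VARIABLE iTable AS INTEGER NO-UNDO.

DEFINE TEMP-TABLE ttRead NO-UNDO
  FIELD uConn AS INTEGER
  FIELD nRead AS INT64
  INDEX uConn IS UNIQUE uConn.

FIND DICTDB._File NO-LOCK WHERE _File._File-Name = "customer".
iTable = _File._File-Number.

FOR EACH DICTDB._UserTableStat NO-LOCK
   WHERE _UserTableStat._UserTableStat-Num = iTable:
  CREATE ttRead.
  ASSIGN ttRead.uConn = _UserTableStat._UserTableStat-Conn
         ttRead.nRead = _UserTableStat._UserTableStat-Read.
END.

PAUSE 60 NO-MESSAGE.

FOR EACH DICTDB._UserTableStat NO-LOCK
   WHERE _UserTableStat._UserTableStat-Num = iTable,
   FIRST ttRead WHERE ttRead.uConn = _UserTableStat._UserTableStat-Conn:
  IF _UserTableStat._UserTableStat-Read > ttRead.nRead THEN
    DISPLAY (ttRead.uConn) LABEL "Usr"
        (_UserTableStat._UserTableStat-Read - ttRead.nRead)
            LABEL "Reads" FORMAT "->>>,>>>,>>9".
END.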

For other databases I just look at the query execution history (and/or turn on extended tracing for a certain table) and find the problem code pretty quickly.

/end soapbox

Posted by Richard Banville on 04-Aug-2016 12:00

With the exception of #3, the rest of these are easily solved.

For #4, a latch is required but I guess it could be a different latch as opposed to the USR latch.

Are there other problems you've run into?

Posted by Thomas Mercer-Hursh on 04-Aug-2016 12:07

Keith, ABL2DB would point you directly to all of the places that table is accessed, making the analysis fairly straightforward unless it is accessed in thousands of places and only badly in one.

Posted by Richard Banville on 04-Aug-2016 12:09

I agree.  But realize that the OpenEdge architecture is different from other databases. Currently, the ABL query tree is created client-side, and the client makes requests for data to the db server, sometimes by passing query requests but not in the form of the original "query statement". The db server side therefore does not have access to the original query.

Statement caching was an attempt to help with this but then again, you only get ABL source code references, not the query statement (unless the request came from OE SQL).

Running with ABL client logging will provide the information, but this is client-side. I realize that from the DBA perspective, access to the client side may not exist.

It is an area we need to improve on.  Keep the suggestions coming...

Posted by ChUIMonster on 04-Aug-2016 12:13

Sorry -- I was inadequately caffeinated -- when I said "one time" I *meant* "single" (as in "just the top of the stack").

None the less -- a lot of times the "bad actor" is doing the same thing over and over and over.  A series of "one time" results would often reveal that pattern (see the sketch below).
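
A crude sketch of that series idea, again assuming _Connect-CachingType is updatable from ABL; the user number, sample count, and one-second pause are all placeholders:

/* Re-arm "One Time" (type 3) caching once per second for a suspect
   session and log whatever each snapshot catches. */
DEFINE VARIABLE iUsr    AS INTEGER NO-UNDO INITIAL 123. /* hypothetical suspect */
DEFINE VARIABLE iSample AS INTEGER NO-UNDO.
DEFINE VARIABLE i       AS INTEGER NO-UNDO.

OUTPUT TO "one-time-series.log" APPEND.
DO iSample = 1 TO 10:
  DO TRANSACTION:
    FIND FIRST DICTDB._Connect EXCLUSIVE-LOCK
         WHERE _Connect._Connect-Usr = iUsr NO-ERROR.
    IF NOT AVAILABLE _Connect THEN LEAVE.
    _Connect._Connect-CachingType = 3.  /* request one-time snapshot */
  END.
  PAUSE 1 NO-MESSAGE.                   /* let the client execute something */
  FIND FIRST DICTDB._Connect NO-LOCK    /* re-read to pick up the snapshot */
       WHERE _Connect._Connect-Usr = iUsr.
  PUT UNFORMATTED STRING(NOW) " sample " iSample SKIP.
  DO i = 1 TO EXTENT(_Connect._Connect-CacheInfo):
    IF _Connect._Connect-CacheInfo[i] = ? OR
       _Connect._Connect-CacheInfo[i] = "" THEN LEAVE.
    PUT UNFORMATTED "  " _Connect._Connect-CacheInfo[i]
        " @ line " _Connect._Connect-CacheLineNumber[i] SKIP.
  END.
END.
OUTPUT CLOSE.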

Posted by Keith Sudbury on 04-Aug-2016 12:20

That is kind of the key issue... finding which programs access the table isn't that big of a deal. Finding out which programs are actually the issue is the main problem.

The recurring suggestion to enhance the XREF to show improper index usage would also go a long way to resolving these issues.

Posted by Rob Fitzpatrick on 04-Aug-2016 12:29

Several people have cautioned about the danger of using option 2 (full call stack) with the CSC (client statement cache).  So that could be added to the issues list.

I haven't encountered this personally but I'm sure Tom or George could elaborate.

Posted by Keith Sudbury on 04-Aug-2016 12:30

Making the statement caching performance-friendly would go a long way, Rich. Being able to just turn it on and leave it on (like Tablestats/Indexstats) would be nice, because you never know when you are going to need access to that level of detail.

Having an option to log access to certain tables/indexes would also be very helpful. Possibly, when the _Connect record is updated, it could write to a defined log file.

Not trying to make OE into Oracle today :-) just a bit of a rant because of this hard-to-find issue.

Posted by George Potemkin on 04-Aug-2016 12:53

Richard,

> For #4, a latch is required but I guess it could be a different latch as opposed to the USR latch.

Spin latch instead of a queued latch?

My question was not about optimization of the current statement cache mechanism.

My point is that enabling type 2 statement caching is like keeping the light and air conditioning on all the time in a room that you are going to visit only once a week. It's just not an efficient use of resources (whether they are expensive or not). "One Time" (type 3) statement caching would be better.

Tom,

> None the less -- a lot of times the "bad actor" is doing the same thing over and over and over.

I guess it depends on the application. I have seen examples where a few snapshots were needed to catch the "bad actor".
