Promon, its options and the resources it locks or waits for

Posted by George Potemkin on 23-Jul-2015 06:18

If you're using a script to gather information from promon, then during performance degradation you may see some screens hang for a few seconds (or even minutes in rare cases): the timestamp in the top left corner of the screen will differ from the one on the previous screen(s) of the same snapshot. For example, if MTX is a bottleneck then the following two screens often hang for a while:
Activity: BI Log
Activity: I/O Operations by File

In other cases I saw these screens hang:
Activity: Summary
Status: Buffer Lock Queue
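
A gathering script can flag such hangs automatically by comparing the timestamps of consecutive screens within one snapshot. Below is a minimal sketch, assuming the script has already extracted (screen title, HH:MM:SS) pairs from the captured output. The extraction step, the function name and the threshold are hypothetical and not part of promon:

```python
from datetime import datetime

def find_delayed_screens(screens, threshold_seconds=1):
    """Return (title, delay_in_seconds) for screens that lag the previous one.

    `screens` is a list of (title, "HH:MM:SS") pairs in the order the
    screens were printed within a single snapshot (assumed input format).
    """
    delayed = []
    previous = None
    for title, stamp in screens:
        current = datetime.strptime(stamp, "%H:%M:%S")
        if previous is not None:
            delay = (current - previous).total_seconds()
            if delay < 0:
                delay += 86400  # the snapshot crossed midnight
            if delay >= threshold_seconds:
                delayed.append((title, int(delay)))
        previous = current
    return delayed

# Example snapshot with made-up timestamps.
snapshot = [
    ("Activity: Summary", "06:18:00"),
    ("Activity: BI Log", "06:18:07"),
    ("Activity: I/O Operations by File", "06:18:07"),
]
print(find_delayed_screens(snapshot))
```

The delay is attributed to the screen whose timestamp jumped, which matches how the hang shows up in the collected output.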

Obviously promon was waiting for some resources.
Promon is able to monitor the resource usage created by different database processes, including promon itself. The information below is based on my tests as well as on inside information provided by Progress developers and by PSTS (I hope they will not mind if I share it). I don't know all the details about promon and I may have misinterpreted my test results, so take the information below with a grain of salt.

Promon's screens can be divided into three categories:
1. Status screens. Each time you enter a status screen promon reads the corresponding information either from shared memory or from database blocks. I did not spend much time investigating these screens because I did not see any delays on them in production environments;
2. Activity screens. Each time you enter an activity screen promon reads information from its own /private/ memory. Promon copies information from database shared memory to its own private memory at startup or when you ask promon to update the activity screens.
3. Mixed screens. They can be titled as "activity" screens but in fact they contain status data as well. An example is the "Activity: Summary" screen, which also reports the current length of the chains.
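
The difference between the status and activity screens can be pictured with a toy model. This is only a sketch of the behaviour described above, not promon's actual implementation: the class, its methods and the counter name are hypothetical, and a plain dict stands in for database shared memory.

```python
class ActivityMonitor:
    """Toy model: activity screens read a private copy of shared counters."""

    def __init__(self, shared_counters):
        self._shared = shared_counters          # stands in for shared memory
        self._private = dict(shared_counters)   # copied once at startup
        self._baseline = dict(shared_counters)

    def update(self):
        """Refresh the private copy; the only activity step touching shared data."""
        self._baseline = self._private
        self._private = dict(self._shared)

    def activity(self, name):
        """Change between the last two copies; reads private memory only."""
        return self._private[name] - self._baseline[name]

    def status(self, name):
        """Status-style read: goes to the shared data every time."""
        return self._shared[name]

shared = {"commits": 100}
monitor = ActivityMonitor(shared)
shared["commits"] = 150                 # the database keeps working
print(monitor.activity("commits"))      # private copy is still the old one
monitor.update()
print(monitor.activity("commits"))      # now reflects the new copy
print(monitor.status("commits"))
```

In this model only update() and status() can be delayed by whatever protects the shared data, while displaying an activity screen cannot.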

What promon does when it updates the activity screens depends on the -NL option that can be used with promon.
Using -NL (No-Lock) with promon utility
http://knowledgebase.progress.com/articles/Article/P123689

While updating the activity screens promon locks the TXQ latch (Transaction End Lock Queue), which is also used for transaction activity. But promon with -NL does NOT lock the TXQ latch.

While updating the activity screens promon reads the "Area Control Object" (ACO) object blocks: one block per data area, including "Control Area" and "Schema Area". These blocks store the various block counts. Each block read will lock BHT+LRU+2*BUF latches.
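
To get a feeling for the numbers, the latch cost of one update can be estimated from the per-block figure above. A small sketch; the function name and the area count of 50 are arbitrary assumptions for illustration:

```python
# Latch locks per block read, as stated in the post: BHT + LRU + 2 * BUF.
LATCHES_PER_BLOCK_READ = {"BHT": 1, "LRU": 1, "BUF": 2}

def latch_locks_for_update(area_count):
    """Estimated latch locks for reading one object block per area."""
    return {name: n * area_count for name, n in LATCHES_PER_BLOCK_READ.items()}

locks = latch_locks_for_update(50)   # 50 areas is an arbitrary example
print(locks, "total:", sum(locks.values()))
```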

Promon with -NL still /tries/ to read these blocks but it will use a "try lock" mechanism: promon will continue on to the next area without reporting statistics for the area when its object block is locked.
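
The "try lock" idea can be sketched in a few lines. This is a generic illustration of the technique using Python's threading.Lock, not promon code; the area names, the statistics and the function name are made up:

```python
import threading

def collect_area_stats(areas):
    """Gather block counts, skipping any area whose lock is currently held.

    `areas` maps an area name to a (lock, stats) pair; both are stand-ins
    for the object block and its buffer lock, not real promon structures.
    """
    report = {}
    for name, (lock, stats) in areas.items():
        if not lock.acquire(blocking=False):   # try lock: never wait
            continue                           # move on, report nothing
        try:
            report[name] = dict(stats)
        finally:
            lock.release()
    return report

areas = {
    "Schema Area": (threading.Lock(), {"total_blocks": 1200}),
    "Data Area": (threading.Lock(), {"total_blocks": 90000}),
}
areas["Data Area"][0].acquire()     # simulate another process holding the lock
print(collect_area_stats(areas))    # "Data Area" is silently skipped
areas["Data Area"][0].release()
```

The price of never waiting is an incomplete report: the statistics for a busy area are simply missing from that sample.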

In my opinion promon should always be run with the -NL option.

Promon can also be started with the -F option. AFAIK, promon -F will not try to get a login semaphore, nor will it lock the USR latch while registering itself in the User Control Table (except in V10.2B-10.2B05; I guess it was a bug). Note that promon -F does use the USR latch during logout. /Never/ use the -F option unless you are 100% sure that the normal connection will hang (because either the login semaphore or the USR latch is locked by someone else). Otherwise you can crash your database.

As an exercise I checked the activity generated by the "Status: BI Log" screen. As expected, it locks the BIB latch. Unexpectedly, it also reads all records from the _AreaExtent [-72] table and as a result it locks n * (BHT+LRU+2*BUF) latches.

So my tests did not answer my original question: why may promon -NL hang for a few seconds on some Activity screens? Wild guess: these screens were waiting for the mysterious object latches. I know it's possible to see the owner of an object latch using the "Restricted Options" in promon. Unfortunately I don't know how to create a test that would reproduce contention for the object latches, and I don't know if it's safe to use the "Restricted Options" in a production environment. Nevertheless I hope that my little investigation can be useful for someone else.

Regards,
George

All Replies

Posted by Richard Banville on 23-Jul-2015 06:58

Interesting post.
 
I’m not sure what you mean by “object latches”.  I assume this is a buffer lock on a particular object block.
 
Suggesting that promon should always be run with the –NL option should carry the caveat that without latching data structures, it is possible that promon could crash while traversing non-latched data structures – but it should not crash the DB along with it. Such a crash should not be reported as a bug.
 
Your statement on use of the –F is spot on.
 
 

Posted by George Potemkin on 23-Jul-2015 07:18

Hello Richard,

> I’m not sure what you mean by “object latches”.

promon/R&D/4. Adjust Latch Options/6. Restricted Options

       BUFFERS

       ObjLatch.holder: %i

          .objLock: %i

         .latch: %i

           .lockCnt: %i

Or the message like:

latObjLatchFree: owner %d of object latch %i is not me (lockCnt: %i, addr: %x) latch stack: %d

Or kbase article: "Database activity drops suddenly"

http://knowledgebase.progress.com/articles/Article/000035040

"Upgrade to OpenEdge 10.2B08, 11.3.0.0 or later. Where the problem of the process not in service when executing a critical session on the object latches has been addressed."

Posted by Richard Banville on 23-Jul-2015 08:38

Oh, those object latches ;)

Without documenting the restricted options, after all they are restricted, this restricted menu prints out information about the held object latches.  In general, these latches protect individual data structure entries and share statistical data in a combined format elsewhere in promon.   Object latches are used to protect several different data structures and were added to improve concurrency in the database.  I had a presentation further describing this a while back – “A new spin on some old latches”.  I can post it if desired – it was delivered before Communities existed.

For example, each buffer in the buffer pool has had its own latch since 10.1C. Prior to that there were 4 latches that protected the entire buffer pool, meaning that every time 1 of those buffer latches was held, it would prevent access to 25% of the buffer pool. The new mechanism with one latch per –B buffer is called an object latch, and the object it protects in this case is the one buffer’s control structure (as opposed to a section of the buffer pool, as was the case prior to that). The latch is generally acquired in order to get a buffer lock of the requested strength and is then released.
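
The concurrency gain can be put in numbers. A small arithmetic sketch, assuming buffers are spread evenly across the latches; the function name and the -B value are arbitrary examples:

```python
def share_of_pool_blocked(buffers, latches):
    """Fraction of the buffer pool made inaccessible while one latch is held,
    assuming buffers are spread evenly across the latches."""
    buffers_per_latch = buffers / latches
    return buffers_per_latch / buffers

B = 1_000_000  # example -B value
print(share_of_pool_blocked(B, 4))   # four latches for the whole pool
print(share_of_pool_blocked(B, B))   # one object latch per buffer
```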

These object latches record their statistics under the guise of the regular latches (in our example here the BUF latches reported in promon and the VSTs).  Otherwise, a –B of 1,000,000 would have 1,000,000 statistics to gather and report as opposed to simply combining them for reporting purposes.
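
The combined reporting can be pictured as many small latches feeding one counter per latch family. This is only an illustrative sketch, not the database's implementation; the class name and the counter layout are hypothetical:

```python
import threading
from collections import Counter

latch_stats = Counter()   # one counter per latch family, e.g. "BUF"

class ObjectLatch:
    """A per-object latch that reports its activity under a family name."""

    def __init__(self, family):
        self.family = family
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        # Single-threaded demo only: a real implementation would need an
        # atomic increment, since different latches share this counter.
        latch_stats[self.family] += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False

buffer_latches = [ObjectLatch("BUF") for _ in range(1000)]
for latch in buffer_latches[:10]:
    with latch:          # lock one buffer's control structure, then release
        pass
print(latch_stats["BUF"])   # one combined figure, not 1000 separate ones
```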

This restricted screen in promon does NO latching when reporting and none of the object latches should be acquired when  running promon with –NL.

(Sending the reply through email didn't seem to post so if a duplicate shows up, just ignore it!)

Posted by George Potemkin on 23-Jul-2015 09:56

> I had a presentation further describing this a while back – “A new spin on some old latches”.

Found it:
http://download.psdn.com/media/exch_audio/2008/AP/B10_Banville.ppt
Content created: 12/28/2007
Date last saved: 09/24/2008

Thanks!
George
