promon/Status: Buffer Lock Queue - Status: LOCKED, Type: FREE

Posted by George Potemkin on 16-Jan-2018 10:07

Curiosity is hungry again, sorry.

A free lock is an oxymoron, isn't it? What is the purpose of this type of buffer lock in Progress?

Progress uses FREE locks for any block type, or at least for data, index and sequence blocks.

For FREE locks, 'Usect' (how many processes are sharing a buffer) is always zero (by comparison, Usect is always 1 for EXCL locks), yet a FREE lock is a real lock that other sessions wait behind:

01/16/18        Status: Buffer Lock Queue by user number for all tenants
Usr:Ten                  DBKEY Area T         Status           Type     Usect
158                        192    6 S         LOCKED           FREE         0
114                        192    6 S        WAITING       EXCLWAIT
146                        192    6 S        WAITING       EXCLWAIT

In the 'seqprobe' test the number of FREE locks is 10-15% of the number of EXCL locks; in other words, they are relatively rare. But in production we sometimes see:

12/27/17        Status: Buffer Lock Queue
  User               DBKEY Area T         Status           Type     Usect
   -1                 1223   33 I         LOCKED           FREE         0
   -1                 2310   31 I         LOCKED           FREE         0
   -1                97880   33 I         LOCKED           FREE         0
   -1               144435   33 I         LOCKED          SHARE         0
   -1            107785728   12 D         LOCKED           FREE         0
   -1           1394877696    8 D         LOCKED            TIO         1
   -1            177276672    7 D         LOCKED            TIO         1

A bit too often for a rare event. Should we worry when we see FREE locks in production?
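
To put a number on "too often", a saved copy of the screen can be tallied by lock type. The following is only a rough Python sketch: the column layout is taken from the output above, while the helper name `parse_queue_rows` and the idea of pasting the screen into a string are assumptions of this example, not something promon provides.

```python
from collections import Counter

# Snapshot copied from the production output above.
SNAPSHOT = """\
12/27/17        Status: Buffer Lock Queue
  User               DBKEY Area T         Status           Type     Usect
   -1                 1223   33 I         LOCKED           FREE         0
   -1                 2310   31 I         LOCKED           FREE         0
   -1                97880   33 I         LOCKED           FREE         0
   -1               144435   33 I         LOCKED          SHARE         0
   -1            107785728   12 D         LOCKED           FREE         0
   -1           1394877696    8 D         LOCKED            TIO         1
   -1            177276672    7 D         LOCKED            TIO         1
"""

def parse_queue_rows(text):
    """Return one dict per data row; title and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: user dbkey area blocktype status locktype [usect]
        if len(fields) not in (6, 7):
            continue
        try:
            user, dbkey, area = (int(x) for x in fields[:3])
        except ValueError:
            continue  # column header line
        rows.append({
            "user": user,
            "dbkey": dbkey,
            "area": area,
            "block_type": fields[3],
            "status": fields[4],
            "lock_type": fields[5],
            # WAITING rows carry no Usect value
            "usect": int(fields[6]) if len(fields) == 7 else None,
        })
    return rows

rows = parse_queue_rows(SNAPSHOT)
by_type = Counter(r["lock_type"] for r in rows)
print(by_type)
```

Run over many snapshots, the same counter would give the share of FREE entries relative to the other lock types.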

All locks reported by "Status: Buffer Lock Queue" are also reported by "Status: Buffer Locks" (the "S" column) - except FREE locks:
S Share Lock
I Share Lock with Intent to modify
X Exclusive Lock
R I/O state
B Online Backup Queue

All Replies

Posted by gus bjorklund on 18-Jan-2018 09:32

without reviewing the code, i can say the following:

0) when a buffer lock is released, not all the relevant fields are cleared. note that user -1 is an indication that no one is holding the buffer lock. so is use count 0.

1) these data are obtained by promon without any synchronization to minimize overhead. that allows for a variety of inconsistencies when the data are rapidly changing in a multiprocessor environment.

2) when i can look at the code i can tell you more.

Posted by George Potemkin on 18-Jan-2018 11:47

Thank you, Gus.

> note that user -1 is an indication that no one is holding the buffer lock. so is use count 0.

From "The Secret Life of Latches" presentation:

The session ran:

REPEAT:
  FIND FIRST Customer NO-LOCK.
END.

I did: kill -SIGSTOP -> promon/ Buffer Lock Queue -> kill -SIGCONT

After a few tries (ten or so) I caught the session with a buffer locked:

10/02/16  Status: Buffer Lock Queue by user number for all tenants
Usr:Ten  DBKEY Area T   Status       Type  Usect
 -1        384    8 D   LOCKED      SHARE      1

But another menu in promon still showed who was locking the buffer:

10/02/16  Status: Buffer Locks by user number for all tenants
    User:Ten  DBKEY Area     Hash T S Usect
 1     6        384    8      221 D S     1

The session was frozen so the data were not changing.
I know that information about buffer locks is stored in two different places:

1) In buffer headers (-B + -B2 + 2) scanned by "Status: Buffer Lock Queue";
2) In Usrctl table (-n + -Mn + 2) scanned by "Status: Buffer Locks".
Is it correct to say that the chains in the buffer headers are the real buffer lock "table" while the data in the Usrctl table are just a replica?
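
The cross-check between the two menus can also be done offline. A minimal Python sketch, assuming the rows have already been transcribed into (user, dbkey, area) tuples: it joins on (area, dbkey) and fills in the owner from "Status: Buffer Locks" whenever "Status: Buffer Lock Queue" shows -1. Reading the second number of the "Buffer Locks" row (6) as the user number is an assumption, and `resolve_owners` is a hypothetical helper name.

```python
# (user, dbkey, area) from "Status: Buffer Lock Queue"; -1 means no owner shown.
queue_rows = [(-1, 384, 8)]

# (user, dbkey, area) from "Status: Buffer Locks".
# Assumption: in the row "1  6  384  8  221 D S 1" the user number is 6.
lock_rows = [(6, 384, 8)]

def resolve_owners(queue_rows, lock_rows):
    """Map each (area, dbkey) seen in the queue menu to a list of user numbers."""
    owners = {}
    for user, dbkey, area in lock_rows:
        owners.setdefault((area, dbkey), []).append(user)

    resolved = {}
    for user, dbkey, area in queue_rows:
        key = (area, dbkey)
        if user != -1:
            resolved[key] = [user]           # the queue menu already names the owner
        else:
            resolved[key] = owners.get(key, [])  # fall back to the other menu
    return resolved

print(resolve_owners(queue_rows, lock_rows))
```

An empty list for a key would mean that neither menu named a holder for that buffer in the snapshot.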

Posted by George Potemkin on 19-Jan-2018 07:16

One more question:
Messages 2523 and 5027 (they are two different messages, btw) refer to the locked buffers in the plural:
User <num> died with <num> buffers locked. (2523) or  (5027)
 
I guess a user can hold buffer locks at the same time only on:
one data block
or
one data block plus object block (when area is extended)
or
a few index blocks that belong to the same index but no more than the number of levels in the index tree (plus object block)
or
a sequence block
 
So the number of slots in the Usrctl table reserved per user for locked buffers is 16 (the maximum number of index b-tree levels).
Is it correct?

Posted by George Potemkin on 24-Jan-2018 02:13

Buffer locks are "terra incognita" in Progress. That is why I think it would be a good thing to share our findings, assumptions and thoughts about how Progress uses the buffer locks.

My assumptions in the previous post about how many buffers can be locked by a session at once turned out to be wrong. I tested using the atm db, with each table and each index residing in its own area, running 150 sessions. My dbmon script was collecting the "Status: Buffer Locks" and "Status: Buffer Lock Queue" menus in promon.

The average number of buffer locks per snapshot reported by "Status: Buffer Locks" was 161; the maximum was 266 and the minimum 10. In other words, sometimes almost all of the 150 sessions were holding two buffer locks at the same time. It was always one data block ("D") in disk I/O state (reported as "R" by "Status: Buffer Locks" and as "TIO" by "Status: Buffer Lock Queue") plus an index block with a SHARE lock. The data block always belonged to the "account" table. The index block sometimes belonged to the index of the "account" table but sometimes (surprise!) to the index of the "teller" table: 65% to the index of "account", 35% to the index of "teller".

It can't be explained by inconsistencies in rapidly changing data. Dbmon collects data from both menus, calling them sequentially, /almost/ at the same time. Both menus showed the same dbkeys locked by the same users (when the "Buffer Lock Queue" menu reports the real user number instead of "-1"). In other words, promon is fast enough to report more or less consistent snapshots of the buffer locks. By the way, the seqprobe test (an analogue of the readprobe test) with the CURRENT-VALUE() function is the best way to estimate how fast promon's menus are compared with the buffer locks. The results, of course, depend on -n and -B.

A second note: we should not trust the value in the "T" column (block type) when a buffer lock is in disk I/O state ("R" or "TIO"). Progress does not yet know the type of the block it is reading from disk into the buffer pool. Promon reads the block type from the buffer header left by the block that previously used the same buffer and that is now being evicted by the new block. In our tests the block types were identified by the area numbers. I /assume/ that is reliable data.

Posted by George Potemkin on 25-Jan-2018 02:56

Question: Which blocks can be locked by a session at the same time?
Answer (based on the results of two atm tests):
1) Exclusive lock of Object block + Exclusive lock of Data block
2) Data block in disk I/O state ("R" in the "S" column) + one or more Share/Exclusive locks of Index blocks. All index blocks belong to the same index. The index can belong to a table other than the table that owns the data block.
3) A few Index blocks that belong to the same index. The lock on the first block can have any status:
S Share Lock
I Share Lock with Intent to modify.
X Exclusive Lock
R I/O state (either being read or written)

The blocks are locked in the order specified (or in the reverse order ;-).
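
Combinations like these can be pulled out of a snapshot by grouping the "Status: Buffer Locks" rows by user. A minimal Python sketch follows; the rows are hypothetical, written only to resemble the patterns listed above and NOT taken from the attached results, and the (user, area, block, state) tuple format and both helper names are assumptions of this example.

```python
from collections import defaultdict

# Hypothetical rows: (user, area, block kind, lock state).
# Illustrative only; not data from the atm tests.
snapshot = [
    (21, 7, "data", "R"),      # data block in disk I/O state ...
    (21, 26, "index", "S"),    # ... plus a share lock on an index block
    (35, 11, "object", "X"),   # object block + data block, both exclusive
    (35, 11, "data", "X"),
    (40, 17, "index", "S"),    # a single lock: not a multi-lock case
]

def multi_lock_sessions(rows):
    """Return {user: [(area, block, state), ...]} for users holding more than one lock."""
    held = defaultdict(list)
    for user, area, block, state in rows:
        held[user].append((area, block, state))
    return {user: locks for user, locks in held.items() if len(locks) > 1}

def single_index(locks):
    """True if all index-block locks sit in one area (one index per area in this db layout)."""
    return len({area for area, block, _ in locks if block == "index"}) <= 1

for user, locks in multi_lock_sessions(snapshot).items():
    print(user, locks, "same index:", single_index(locks))
```

The `single_index` check only works because each index lives in its own area in the test database; with shared areas it would need the index number instead.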

Results of the tests are attached.

Objects per areas:

Area  Table/Index
 7    account
 8    branch
 9    client
10    config
11    history1
12    history2
13    history3
14    history4
15    results
16    teller
--    ---------------
17    account.account
18    branch.branch
19    client.id
20    config.id
21    history1.histid
22    history2.histid
23    history3.histid
24    history4.histid
25    results.id
26    teller.teller
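
Since the "T" column cannot be trusted for buffers in I/O state, the area number is what identifies the object in these tests. A small Python lookup built from the layout above makes that mapping explicit; the function name `describe_area` and the rule "a dot in the name means index" are conventions of this sketch only.

```python
# Area number -> object, copied from the layout above.
AREA_OBJECTS = {
    7: "account", 8: "branch", 9: "client", 10: "config",
    11: "history1", 12: "history2", 13: "history3", 14: "history4",
    15: "results", 16: "teller",
    17: "account.account", 18: "branch.branch", 19: "client.id",
    20: "config.id", 21: "history1.histid", 22: "history2.histid",
    23: "history3.histid", 24: "history4.histid", 25: "results.id",
    26: "teller.teller",
}

def describe_area(area):
    """Return (kind, name); kind is 'table', 'index' or 'unknown'."""
    name = AREA_OBJECTS.get(area)
    if name is None:
        return ("unknown", None)
    # In this layout index names are written as table.index.
    kind = "index" if "." in name else "table"
    return (kind, name)

print(describe_area(7))
print(describe_area(26))
```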

Results:

Attachment: atm4.2018.01.22.promon.Status_Buffer_Locks.MultiLocks.xlsx

Attachment: atm1.2018.01.23.promon.Status_Buffer_Locks.MultiLocks.xlsx

Posted by George Potemkin on 26-Jan-2018 07:44

New questions: now about buffer locks and object blocks.

Sessions that create new records (for example, in the "history" areas) obviously need to update the area's object block, which stores information about the chains and the area's current HWM. Promon/Status: Buffer Locks showed 39 exclusive locks on object blocks out of 60,932 buffer locks caught in the snapshots, and only on the object blocks in the areas holding the 4 history tables and their indexes. These are rare events that happen only when areas are extended. So far so good.

But Promon/Status: Buffer Lock Queue shows queues for the object blocks: 10 processes on average, and 45 at the maximum, waiting for the same object block. Isn't that too many for rare events? Are all of them trying to extend an area? It's SHAREWAIT for the tables' areas and EXCLWAIT for the indexes' areas. Why is there a difference? Why did we not catch even one share lock on the object blocks, only SHAREWAITs?
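
Queue depths like these come from counting the WAITING rows per buffer in each snapshot. A minimal Python sketch of that count is below; because the attached spreadsheets are not reproduced in the thread, it uses the three rows from the first post as sample input, and the helper name `waiters_per_buffer` is hypothetical.

```python
from collections import Counter

# Rows copied from the first post in this thread.
SNAPSHOT = """\
Usr:Ten                  DBKEY Area T         Status           Type     Usect
158                        192    6 S         LOCKED           FREE         0
114                        192    6 S        WAITING       EXCLWAIT
146                        192    6 S        WAITING       EXCLWAIT
"""

def waiters_per_buffer(text):
    """Count WAITING rows per (area, dbkey, wait type)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: user dbkey area blocktype status locktype [usect]
        if len(fields) >= 6 and fields[4] == "WAITING":
            counts[(int(fields[2]), int(fields[1]), fields[5])] += 1
    return counts

print(waiters_per_buffer(SNAPSHOT))
```

Taking the mean and the maximum of these counts over all snapshots gives the average and peak queue length per buffer, split by SHAREWAIT and EXCLWAIT.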

Results:

Attachment: atm4.2018.01.22.promon.Status_Buffer_Lock_Queue.ObjBlk.xlsx

Attachment: atm4.2018.01.22.promon.Status_Buffer_Locks.ObjBlk.xlsx
