Buffer locked (or not locked?) on target database

Posted by George Potemkin on 28-Jun-2019 13:21

Target database is used by processes (appsevers) to provide a service. At some point of time we got a lot of blocked clients (‘Status: Blocked Clients’). Almost all connected processes were waited for the same dbkey – the root block of the most active index. The clients stayed blocked during 8 minutes. BUT ‘Status: Buffer Locks’ screen in promon checked many times during the same interval did not report any buffer locks.

06/25/19        Status: Blocked Clients by user number for all tenants
Sample Date     Time       Usr:Ten   Name      Domain     Type      Wait            Wait Info  Trans id   Num Txns   BI RRead  BI RWries  Login time     Schema Timestamp
  5671 06/25/19 19:52:59  5595       root          -4      REMC/APSV BKSH       27375479:8            0          0          0          0 06/25/19 19:41        1526776349   0
  5671 06/25/19 19:52:59  5597       root          -4      REMC/APSV BKSH       27375479:8            0          0          0          0 06/25/19 19:36        1526776349   0
...
  5712 06/25/19 19:59:55  5631       root          -4      REMC/APSV BKSH       27375479:8            0          0          0          0 06/25/19 19:29        1526776349   0
  5712 06/25/19 19:59:55  5632       root          -4      REMC/APSV BKSH       27375479:8            0          0          0          0 06/25/19 19:43        1526776349   0

Approximately at the same time replication agent started new transaction and there was no BI write activity (BI RWrites did not incremented) during the same interval:

06/25/19        Status: Active Transactions by user number for all tenants
Sample Date     Time       Usr:Ten   Name      Domain     Type       Login time     Tx start time  Trans id  BI RReads BI RWrites Trans State
  5669 06/25/19 19:52:39   513       root           0      RPLA      06/16/19 04:26 -              1124543754          0  343759567 None    FWD
  5670 06/25/19 19:52:49   513       root           0      RPLA      06/16/19 04:26 -              1124543757          0  343759581 None    FWD
  5671 06/25/19 19:52:59   513       root           0      RPLA      06/16/19 04:26 06/25/19 19:52 1124543758          0  343759755 Active  FWD
...
  5712 06/25/19 19:59:55   513       root           0      RPLA      06/16/19 04:26 06/25/19 19:52 1124543758          0  343759755 Active  FWD
  5713 06/25/19 20:00:05   513       root           0      RPLA      06/16/19 04:26 06/25/19 19:52 1124543758          0  343759881 Active  FWD

The questions:

1. When a process is waiting for buffer lock does it mean that the buffer is indeed locked rather than other database resources (for example, the lathes) are locked?

2. Can ‘Status: Buffer Locks’ screen miss some buffer locks? It scans Usrctrl Table rather than buffer pool (Buffer Headers) scanned, for example, by ‘Status: Buffer Lock Queue’.

Unfortunately, promon is broken since 11.7 (OCTA-13662) and we are forced to limit the volume of data gathered through promon - we are not yet totally blind but we do have the “vision problems”. For example, I don’t know if any latches were locked during this interval. The customer complained about significant degradation of the service response time.

All Replies

Posted by gus bjorklund on 28-Jun-2019 14:02

1. Yes.

2. It is possible that buffer states can change during the scan. For instance a buffer lock can be released and then acquired by a user that has already been examined. In such a case, the buffer lock would not show.

3. Same could happen when scanning the chain of buffer headers.

Posted by George Potemkin on 28-Jun-2019 14:28

Thanks, Gus.

Everything seems to point out that the processes hung for 8 minutes but we don't see the buffer lock that stopped the processes. How can it be explained?

Posted by Dapeng Wu on 28-Jun-2019 14:55

One suspect of this long pause is the replication agent, although I can't explain why Promon didn't show this.

If that is indeed the case, say for some reason, "buffer locks" for agent is missed by Promon, what could happen is there was an index split operation, and only part of the operation was replicated to the target while the clients were reading the index. These clients will have to wait till the whole index split is completed before they can get the root block. This is done to protect the reads because we don't want that happen in the middle of a split. In the meantime, for the index split operation on the source, even though it can finish pretty quick, the AI notes for this operation may take a while to fully replicate to the target (network latency, for example). And only after that, the index blocks will be released by the agent.

Again, this theory is based on the assumption that the agent was holding a X lock on the root block, which for some reason, we didn't see in this case.

Posted by gus bjorklund on 28-Jun-2019 16:12

I missed the fact that this was on the target. My explanation is only valid for changes that happen during a scan. The next scan would show the right status if a lock was held for a long time.

Dapeng, you are probably right, but there is still the question of why the buffer lock not visible for 8 minutes (and multiple scans). I can't explain that.

Posted by George Potemkin on 28-Jun-2019 16:15

Replication agent that use "the stealth technologies" can explain the case. Record's creates/deletes/updates are always zero on target database though a replication agent does create such kinds of activity.

By the way, what is Trans id that is reported for RPLA? It applies the notes that can belong to the different transactions on source database. But 'Status: Active Transactions' shows the same Trans id for long periods.

Posted by George Potemkin on 28-Jun-2019 16:47

I checked statistics for 15 different target databases: 'promon/Status: Buffer Locks' never reports any locks for RPLA.

Posted by George Potemkin on 30-Jun-2019 10:46

> By the way, what is Trans id that is reported for RPLA?

Empirical observations:

'Trans id' seems to be a Trid of the last processed note. In other words it's an id of transaction that was in Active state on source database. By the way, RL_TMSAVE notes are sent to a target db but they are ignored by replication agent. It's likely that these notes do not affect 'Trans id' on target db.

'Trans State' on target db is either Active or None (a.k.a. Dead). 'Status: Active Transactions' screen in promon seems always to contain one and only one row. The number of active transactions is always 1 ('Status: Shared Resources' and 'Activity: Summary' screens).

'BI Rwrites' reported for "transaction" is in fact the total number of recovery notes written by replication agent since its startup (so it's not per transaction).

I don't have enough data to say if 'Tx start time' is a transaction start time on source or on target db. Both cases would mean that target database is using Transaction Table but this conradictes to the fact that the number of active transactions is always 1.

Posted by gus bjorklund on 01-Jul-2019 14:56

While the counters dont get updates, a kind of buffer locks /are/ instantiated on the replication target. But they are different from the ones on the source because the target is open in single-user mode so simpler locking primitives can be used and there is no waiting.

New transaction identifiers are not generated during normal roll-forward processing and not during replication processing on the target(s). Instead, the orgional identifier that was generated at the source must be used.

Correct. transaction state is only active or none. Transaction state change from begin to active is not replicated. But two-phase active to ready-to-commit changes should be.

Tx start tinme should be time on source.

> On Jun 30, 2019, at 6:48 AM, George Potemkin wrote:

>

> Update from Progress Community

>

> George Potemkin

>

> > By the way, what is Trans id that is reported for RPLA?

>

> Empirical observations:

>

> 'Trans id' seems to be a Trid of the last processed note. In other words it's an id of transaction that was in Active state on source database. By the way, RL_TMSAVE notes do sent to a target db but they are ignored by replication agent. It's likely that these notes do not affect 'Trans id' on target db.

>

> 'Trans State' on target db is either Active or None (a.k.a. Dead). 'Status: Active Transactions' screen in promon seems always to contain one and only one row.

>

> 'BI Rwrites' reported for "transaction" is in fact the total number of recovery notes written by replication agent since its startup (so it's not per transaction).

>

> I don't have enough data to say if 'Tx start time' is a transaction start time on source or on target db.

>

>

> View online

>

>

> You received this notification because you subscribed to the forum. To unsubscribe from only this thread, go here.

>

> Flag this post as spam/abuse.

>

Posted by George Potemkin on 01-Jul-2019 17:26

> Correct. transaction state is only active or none. Transaction state change from begin to active is not replicated. But two-phase active to ready-to-commit changes should be.

I guess the "none" state has a bit different meaning on target database. On souce database the transactions stay in the "none" state for a very short period of time. If duration is not short then something very wrong is happening in the database. On target database we will see the "none" state until a replication agent starts to process the notes from new transaction. Long period of this state is a feature rather than a problem on target database.

PREPARING and PREPARED states should be short on source and target, should not them?

Posted by gus bjorklund on 01-Jul-2019 18:02

> On Jul 1, 2019, at 1:28 PM, George Potemkin wrote:

>

> PREPARING and PREPARED states should be short on source and target, should not them?

>

>

>

yes, unless a preparing note arrives and the next data transmission from the source does not happen right away.=

This thread is closed