The -omsize value was set higher than the number of objects in the database (-omsize 2000 vs 1190 objects).
The OM latch controls the Object Manager cache: “This is the Object to Area mapping information used to show what Object resides in what Area. The Area also anchors the Index Delete Chain”.
Richard T. Banville wrote:
It's not just schema changes that can page something into the secondary cache. Any update where something is not found in the primary cache will cause a latch on the secondary cache. It's just that creating a new table forces it to be in the secondary cache.
More about OM latch:
community.progress.com/.../49024
Problem: after enabling CDC policies we got high numbers of OM latch locks and naps: ~500,000 locks per second, ~4,000-6,000 naps per second. The OM latch was almost 100% busy (most snapshots in promon showed a latch owner). In fact, the OM latch locks were only 2-6 times fewer than the BHT latch locks over any time interval. In other words, there is a correlation between the activities of these two latches. IMHO, it’s very strange: the OM latch deals with db objects while the BHT latch deals with access to db blocks.
Any explanation?
BHT/OM latch lock ratio gives a hint of index tree levels.
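A back-of-the-envelope way to read that hint (a toy Python model, not Progress internals; the per-fetch latch counts are assumptions for illustration):

```python
# Assumption: each record fetched via an index touches one block per index
# level plus the record's data block, and each block access takes one BHT
# latch lock. If every fetch also takes roughly one OM latch lock (the
# abnormal case in this thread), BHT/OM approximates tree depth + 1.

def bht_om_ratio(index_levels: int, data_blocks_per_fetch: int = 1) -> float:
    """Approximate BHT locks per OM lock for one keyed record fetch."""
    bht_locks = index_levels + data_blocks_per_fetch
    om_locks = 1  # one object-cache lookup per fetch (abnormal case)
    return bht_locks / om_locks

# The thread reports BHT locks 2-6x the OM locks; under this model that
# would correspond to index trees of roughly 1 to 5 levels.
for levels in (1, 3, 5):
    print(levels, bht_om_ratio(levels))
```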
No explanation, but a few questions:
Were there any objects in the secondary cache, according to promon R&D 1 13?
Was -omsize set to 2000 at DB start or increased online?
> after enabling CDC policies we got the high numbers of the OM latch locks and its naps
Would the online addition of CDC policies cause the creation of a large number of change tables, whose OM entries would be in the secondary cache (and result in latching) until DB restart?
My theory is that the changes for CDC either check the object information to see if CDC is required, and that causes an OM latch, or that something (maybe CDC policies) is not pre-loaded into the primary OM cache. Either way, if the OM cache is sized properly and you are seeing OM latch contention, then it is a bug. I'll have someone file a bug for this so that additional investigation can be done on it.
> Would the online addition of CDC policies cause the creation of a large number of change tables, whose OM entries would be in the secondary cache (and result in latching) until DB restart?
Yes.
> Yes.
Does that suggest this could be expected behaviour rather than a bug?
Yes, in that case it would be expected behavior. I did not read that George had added them online.
If that is the case, then they go into the secondary om cache and the secondary cache is always latched when accessed.
It does show the value of not latching the primary cache though.
This could be an area of improvement for us when adding new policies.
> Were there any objects in the secondary cache, according to promon R&D 1 13?
09/13/18 Status: Shared Resources
10:36:49
Active transactions: 481
Lock table entries in use: 35235 of 1500000
Lock table high water mark: 776635
Primary object cache (-omsize): Total: 2000 Used: 1176 Pinned: 0
Secondary object cache (-omsize): Total: 2000 Used: 164 Pinned: 0
Primary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Secondary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
5 min later
09/13/18 Status: Shared Resources
10:42:15
Active transactions: 496
Lock table entries in use: 33006 of 1500000
Lock table high water mark: 776635
Primary object cache (-omsize): Total: 2000 Used: 1176 Pinned: 0
Secondary object cache (-omsize): Total: 5120 Used: 164 Pinned: 0
Primary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Secondary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
> Was -omsize set to 2000 at DB start or increased online?
-omsize 2000 at startup and it was increased online to 5000
CDC policies were activated online. That caused the OM latch naps. Deactivating the policies did not resolve the issue.
Quick test results:
1. If we just create a CDC policy for a table, then /reading/ its records will cause OM latch activity, but only if you don't restart the database.
2. The number of OM latch locks increases with the number of record reads. For example, 1 read -> 2 OM latch locks, 10 reads -> 600+ OM latch locks.
3. OM latch activity stays the same even if you delete the CDC policy, unless you restart the database.
4. A 'FOR EACH _Cdc-Table-Policy NO-LOCK' query caused OM latch locks in production. In fact, a session running CDCAdminProcess.p crashed while holding the OM latch:
ABL 1507: (49) SYSTEM ERROR: Memory violation.
WDOG 1507: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 3
But the tests show no OM latch locks when the query reads one record in _Cdc-Table-Policy.
1. That is expected behavior
2. What records are you reading? The user data or the CDC data? The ratio of 10 reads : 600 OM latches seems wrong.
3. Probably something that could be optimized on our end.
4. Should be reported and investigated by Progress.
> What records are you reading? The user data or the CDC data?
We read the history1 table in the atm database (in other words, the user data).
Hi Richard,
Today I created a case for this issue:
00458901 - High OM latches and abnormal shutdown of the database after CDCBatch start
Thanks. That will help get the right resources to investigate the issue and determine if there is a bug here and if so how to fix it.
If I am understanding this properly, reads of the existing history1 table (where all object references should be in the primary OM cache if it was set large enough at startup) are incurring OM latch activity after the secondary OM cache contains entries for the CDC change capture tables.
Am I properly interpreting what you are encountering?
Yes, you understood correctly.
I again checked this on the ATM database and here's the result.
1. I deleted all CDC policies and restarted the database.
2. After starting the database, I added one CDC policy for the history1 table, but did not activate it. I did this through the Pro2 Administration Utility.
3. Now in the promon menu R&D 1/13 Status: Shared Resources I see that I have 1 object in the Secondary object cache:
09/14/18 Status: Shared Resources
16:59:34
Active transactions: 0
Lock table entries in use: 0 of 10016
Lock table high water mark: 10
Primary object cache (-omsize): Total: 1024 Used: 265 Pinned: 0
Secondary object cache (-omsize): Total: 1024 Used: 1 Pinned: 1
Primary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Secondary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Number of servers: 2 (41 allocated)
Total users: 2 (200 allocated)
Self-service: 0
Remote: 2
Batch: 0
Watchdog status: Executing
BIW status: Executing
Number of page writers: 1
Number of monitors: 2
Number of semaphores allocated: 249
Shared memory allocated: 852384K (1 segments. The last segment was not locked in memory)
Enter <return>, R, P, T, or X (? for help):
4. Now reset the promon statistics in the promon menu R&D debghb 6/11 Activity: Latch Counts, and run this simple code:
MESSAGE "Zero Activity: Latch Counts" VIEW-AS ALERT-BOX. /* manually reset the promon statistics, then click OK */
DEFINE VARIABLE i AS INTEGER NO-UNDO.

FOR EACH atm.history1 NO-LOCK:
    DISPLAY history1. PAUSE 0.
    i = i + 1.
    IF i > 10 THEN LEAVE.
END.
MESSAGE "Update Activity: Latch Counts" VIEW-AS ALERT-BOX.
5. Update promon statistics in Activity: Latch Counts
09/14/18 Activity: Latch Counts
17:00:55 09/14/18 16:59 to 09/14/18 17:00 (1 min 26 sec)
----- Locks ----- ------ Busy ------ Naps ---------- Spins ----------- ----- Nap Max -----
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM
MTX -- 0 0 0 0.0 0 0 0 0 0 0
USR -- 6 0 0 0.0 0 0 0 0 0 0
OM -- 762 8 0 0.0 0 0 0 0 0 0
BIB -- 86 1 0 0.0 0 0 0 0 0 0
SCH -- 0 0 0 0.0 0 0 0 0 0 0
LKP -- 36 0 0 0.0 0 0 0 0 0 0
GST -- 0 0 0 0.0 0 0 0 0 0 0
TXT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
SEQ -- 0 0 0 0.0 0 0 0 0 0 0
AIB -- 0 0 0 0.0 0 0 0 0 0 0
TXQ -- 2 0 0 0.0 0 0 0 0 0 0
EC -- 0 0 0 0.0 0 0 0 0 0 0
LKF -- 54 0 0 0.0 0 0 0 0 0 0
BFP -- 0 0 0 0.0 0 0 0 0 0 0
BHT -- 1583 18 0 0.0 0 0 0 0 0 0
PWQ -- 0 0 0 0.0 0 0 0 0 0 0
CPQ -- 858 9 0 0.0 0 0 0 0 0 0
LRU -- 8 0 0 0.0 0 0 0 0 0 0
LRU -- 0 0 0 0.0 0 0 0 0 0 0
BUF -- 1954 22 0 0.0 0 0 0 0 0 0
BUF -- 434 5 0 0.0 0 0 0 0 0 0
BUF -- 348 4 0 0.0 0 0 0 0 0 0
BUF -- 430 5 0 0.0 0 0 0 0 0 0
INC -- 0 0 0 0.0 0 0 0 0 0 0
CDC -- 0 0 0 0.0 0 0 0 0 0 0
SEC -- 0 0 0 0.0 0 0 0 0 0 0
LG -- 0 0 0 0.0 0 0 0 0 0 0
Enter <return>, A, L, R, S, U, Z, P, T, or X (? for help):
As we can see, the OM latch is now used very actively even though only 10 records are read.
6. Now I delete this CDC policy online via Pro2.
09/14/18 Status: Shared Resources
17:03:27
Active transactions: 0
Lock table entries in use: 0 of 10016
Lock table high water mark: 10
Primary object cache (-omsize): Total: 1024 Used: 265 Pinned: 0
Secondary object cache (-omsize): Total: 1024 Used: 1 Pinned: 0
Primary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
The Secondary object cache is still in use.
7. I execute my code again and look into Activity: Latch Counts
09/14/18 Activity: Latch Counts
17:04:14 09/14/18 17:03 to 09/14/18 17:04 (14 sec)
----- Locks ----- ------ Busy ------ Naps ---------- Spins ----------- ----- Nap Max -----
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM
MTX -- 0 0 0 0.0 0 0 0 0 0 0
USR -- 2 0 0 0.0 0 0 0 0 0 0
OM -- 762 54 0 0.0 0 0 0 0 0 0
BIB -- 14 1 0 0.0 0 0 0 0 0 0
SCH -- 0 0 0 0.0 0 0 0 0 0 0
LKP -- 8 0 0 0.0 0 0 0 0 0 0
GST -- 0 0 0 0.0 0 0 0 0 0 0
TXT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
SEQ -- 0 0 0 0.0 0 0 0 0 0 0
AIB -- 0 0 0 0.0 0 0 0 0 0 0
TXQ -- 2 0 0 0.0 0 0 0 0 0 0
EC -- 0 0 0 0.0 0 0 0 0 0 0
LKF -- 12 0 0 0.0 0 0 0 0 0 0
BFP -- 0 0 0 0.0 0 0 0 0 0 0
BHT -- 1569 112 0 0.0 0 0 0 0 0 0
PWQ -- 0 0 0 0.0 0 0 0 0 0 0
The OM latch is still in use.
8. Create the CDC policy again and restart the database.
09/14/18 Status: Shared Resources
17:07:37
Active transactions: 0
Lock table entries in use: 0 of 10016
Lock table high water mark: 0
Primary object cache (-omsize): Total: 1024 Used: 265 Pinned: 0
Secondary object cache (-omsize): Total: 1024 Used: 0 Pinned: 0
Primary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Secondary MT cache (-mtpmsize): Total: 1024 Used: 0 Pinned: 0
Now the Secondary object cache is not used.
9. I execute my code again and look in Activity: Latch Counts
09/14/18 Activity: Latch Counts
17:09:35 09/14/18 17:09 to 09/14/18 17:09 (5 sec)
----- Locks ----- ------ Busy ------ Naps ---------- Spins ----------- ----- Nap Max -----
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM
MTX -- 0 0 0 0.0 0 0 0 0 0 0
USR -- 0 0 0 0.0 0 0 0 0 0 0
OM -- 0 0 0 0.0 0 0 0 0 0 0
BIB -- 0 0 0 0.0 0 0 0 0 0 0
SCH -- 0 0 0 0.0 0 0 0 0 0 0
LKP -- 4 0 0 0.0 0 0 0 0 0 0
GST -- 0 0 0 0.0 0 0 0 0 0 0
TXT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
LKT -- 0 0 0 0.0 0 0 0 0 0 0
SEQ -- 0 0 0 0.0 0 0 0 0 0 0
AIB -- 0 0 0 0.0 0 0 0 0 0 0
TXQ -- 2 0 0 0.0 0 0 0 0 0 0
EC -- 0 0 0 0.0 0 0 0 0 0 0
LKF -- 6 1 0 0.0 0 0 0 0 0 0
BFP -- 0 0 0 0.0 0 0 0 0 0 0
BHT -- 1641 328 0 0.0 0 0 0 0 0 0
PWQ -- 0 0 0 0.0 0 0 0 0 0 0
CPQ -- 52 10 0 0.0 0 0 0 0 0 0
Now the OM latch is not used.
P.S.
Pay attention to BHT. Why is it so high if we only read 10 records?
By the way, what are the "Pinned" objects in an object cache (and in the MT cache)?
This is to provide isolation from uncommitted changes to object entries.
When making changes to an object the "old" committed version of the object is pinned in the cache while the new uncommitted version resides on disk as it is being changed.
This is how we ensure all other OLTP activity sees only the committed version of the object entry as opposed to the changed-but-not-yet-committed version. If the entry were not pinned in the cache, it could be paged out, and when read on demand from disk NO-LOCK, the uncommitted version would show up in the cache.
After the transaction which changed the cache entry ends, committed or rolled back, the entry in the cache will be updated with the current on-disk version the next time it is referenced.
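The pinning behaviour described above can be sketched as a toy Python model (a conceptual illustration only, not the actual OpenEdge implementation; class and method names are invented, and rollback handling is omitted for brevity):

```python
# Toy model of a pinned object-cache entry: while an update is in flight,
# the old committed version stays pinned in the cache and the new
# uncommitted version lives "on disk", so NO-LOCK readers only ever see
# the committed version.

class ObjectCache:
    def __init__(self, disk):
        self.disk = disk    # object id -> on-disk version
        self.cache = {}     # object id -> (version, pinned flag)

    def read(self, obj_id):
        """NO-LOCK read: a cached (possibly pinned) entry wins over disk."""
        if obj_id in self.cache:
            version, _pinned = self.cache[obj_id]
            return version
        # Cache miss: page the entry in from disk on demand.
        self.cache[obj_id] = (self.disk[obj_id], False)
        return self.disk[obj_id]

    def begin_update(self, obj_id, new_version):
        """Pin the old committed version; the uncommitted one goes to disk."""
        self.cache[obj_id] = (self.read(obj_id), True)
        self.disk[obj_id] = new_version

    def end_transaction(self, obj_id):
        """On commit, unpin: the next reference re-reads the disk version."""
        del self.cache[obj_id]

cache = ObjectCache({"history1": "v1"})
cache.begin_update("history1", "v2")
print(cache.read("history1"))   # other readers still see "v1"
cache.end_transaction("history1")
print(cache.read("history1"))   # after commit they see "v2"
```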
Thanks, Richard!
I see there is a fresh article about this:
knowledgebase.progress.com/.../High-OM-latches-after-online-activation-of-CDC-policies
>>Cause
>>This is expected behavior.
But I do not agree that this should be the expected behavior. Performance degradation should not result from simply adding CDC policies, and it is certainly not what end customers expect.
George, I'm re-reading your earlier post:
> If we just create a CDC policy for a table then the /reading/ its records will cause OM latch activity. But only if you don't restart a database.
I think I misunderstood what you said. I originally read this as "reads of the change table will cause OM latch activity". That would be expected behaviour.
But if you meant "reads of the application table for which the CDC policy was created will cause OM latch activity", then I wouldn't expect that, given that the OM cache was initially sized to have enough entries for all of the application objects.
Can you clarify please?
For example, 'FOR FIRST atm.history1 NO-LOCK: END.' creates OM latch locks if a CDC policy is enabled for history1.
Reading _Cdc-Change-Tracking or _Cdc-Table-Policy does not create OM latch locks.