Are there any caveats against the -lruskips 2147483647?

Posted by George Potemkin on 08-May-2016 08:57

I would like to temporarily (let's say just for 1 sec) increase the -lruskips to its maximum value (2147483647). Are there any negative effects I might cause?

Or maybe it would be better to use a value a bit less than 2147483647? The -lruskips cuts the number of LRU latch locks by a factor of (-lruskips + 1). I guess Progress reads the number of block accesses ("Usect"?) stored in the pool of buffer headers and takes it modulo (-lruskips + 1). If the result is zero, then the block is moved on the LRU chain. With the maximum -lruskips the divisor would be 2147483648, which is too large for a signed 4-byte integer. But the tests did not reveal any problems with this -lruskips value.
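
To make the arithmetic concrete, here is a small shell sketch of the reduction factor. It only assumes that a buffer takes the LRU latch once per (-lruskips + 1) accesses, as described above; it says nothing about how Progress implements it. The function name lru_lock_count is made up for illustration, and the shell is assumed to do 64-bit arithmetic (bash and dash do on 64-bit systems).

```shell
#!/bin/sh
# lru_lock_count ACCESSES SKIPS
# Hypothetical helper: number of LRU latch locks expected for ACCESSES reads
# of one buffer, assuming one lock per (SKIPS + 1) accesses (rounded up).
lru_lock_count() {
  echo $(( ($1 + $2) / ($2 + 1) ))
}

lru_lock_count 1000000 0            # prints 1000000
lru_lock_count 1000000 100          # prints 9901
lru_lock_count 1000000 2147483647   # prints 1

# Largest signed 4-byte integer; (-lruskips + 1) at the maximum is one above it.
int32_max=$(( (1 << 31) - 1 ))
echo "$int32_max"                   # prints 2147483647
echo $(( int32_max + 1 ))           # prints 2147483648 (needs 64-bit arithmetic)
```

So at the maximum value a buffer would have to be read more than 2 billion times before it touches the LRU latch a second time.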

BTW, the kbase incorrectly states that the maximum is 2147483648:

Article: Why is a maximum value of 2GB stated in the OpenEdge documentation for the LRU force skips (-lruskips) parameter?

http://knowledgebase.progress.com/articles/Article/Why-is-a-maximum-value-of-2GB-stated-in-the-OpenEdge-documentation-for-the-LRU-force-skips-lruskips-parameter


Thanks in advance,
George

All Replies

Posted by ChUIMonster on 08-May-2016 10:02

I'd be interested in hearing what happens :)

Posted by George Potemkin on 08-May-2016 10:23

And new tests did find an inconsistency. If I set a high value of the -lruskips online using promon and then return it back to zero, the behaviour of Progress sessions will not be the same as just after db startup with -lruskips 0. I used promon itself to access the blocks. Any action in promon ("U"pdate, "S"ample, "Z"ero and even "R"epeat?!) reads the ACO object blocks: one block per data area, including the "Control Area". And, by the way, queries of any VSTs do the same. It's VERY unfortunate that these blocks are on the LRU chain. So in the sports db each promon action will read 7 blocks (and will create 7 BHT and 7 LRU latch locks). We can create db accesses and check their latch locks without leaving the "Activity: Latch Counts" screen. After changing the -lruskips forth and back these actions will not lock the LRU latch anymore. Really, it's excellent news! But I would like to understand whether it's a real thing or just a "mirage". ;-)

Posted by ChUIMonster on 08-May-2016 10:34

What is "ACO"?

Posted by George Potemkin on 08-May-2016 10:41

"Area Control Object" object block = bk_type 12 + objectId 0, block 3 in each area:

OBJBLK:
0040 totalBlocksOld:               0x%016x %d
     hiWaterBlockOld:              0x%016x %d
     chainFirst[FREECHN]:          0x%016I64x %I64u
0050 chainFirst[RMCHN]:            0x%016I64x %I64u
     chainFirst[LOCKCHN]:          0x%016I64x %I64u
0060 numBlocksOnChainOld[FREECHN]: 0x%016x %d
     numBlocksOnChainOld[RMCHN]:   0x%016x %d
     numBlocksOnChainOld[LOCKCHN]: 0x%016x %d
0070 chainLast[FREECHN]:           0x%016I64x %I64u
     chainLast[RMCHN]:             0x%016I64x %I64u
0080 chainLast[LOCKCHN]:           0x%016I64x %I64u
     objectId:                     0x%04hx             %d
     objectType:                   0x%04hx             %d
0090 serialNumber:                 0x%016I64x %I64u
     firstFreeCluster:             0x%016I64x %I64u
00A0 lastFreeCluster:              0x%016I64x %I64u
     totalBlocks:                  0x%016I64x %I64u
00B0 hiWaterBlock:                 0x%016I64x %I64u
     numBlocksOnChain[FREECHN]:    0x%016I64x %I64u
00C0 numBlocksOnChain[RMCHN]:      0x%016I64x %I64u
     numBlocksOnChain[LOCKCHN]:    0x%016I64x %I64u
00D0 partitionId:                  0x%04hx             %d

Posted by ChUIMonster on 08-May-2016 10:53

Got it.  I couldn't figure out the abbreviation -- but that makes sense.

Posted by George Potemkin on 08-May-2016 13:24

The explanation is found. A new value of the -lruskips will be used for a block only after the block has been accessed N times, where N is the previous value of the -lruskips. So if we increase the -lruskips to 2 billion and then change it back to 0, it will take a lot of time ("eternity") before the previous value "expires". The blocks that were not accessed while the -lruskips was set to 2 billion will use the current -lruskips value immediately. Excellent! It's what I need. We can start the db with -lruskips 2147483647 and then immediately change it to the "working" value. The -lruskips 2147483647 will stay in use for the blocks accessed at db startup, including the ACO blocks. Or increase the -lruskips at any time, take a Z, U, L or R action on a promon screen and return the previous value of the -lruskips. It can be done almost instantly. The ACO blocks (and most likely only these blocks) will be "infected" by the maximum -lruskips value.

Tom, the trick can be useful for scripts (like my dbmon) that use promon to gather db statistics, as well as for 4GL programs that use VSTs (like ProTop). It makes them insensitive to contention on the LRU latch. I have seen promon hang during the seconds when the LRU latch was a bottleneck, and in such cases a sampling interval missed the period with the highest activity. I'm sure the same is true for VSTs.

Posted by George Potemkin on 09-May-2016 00:04

The tests were re-done with a fresh mind. The trick is a bit harder than I thought yesterday: the -lruskips sets a new "countdown" value in a buffer header only when the buffer is accessed (obviously) and only when its current "countdown" counter is zero (which I missed yesterday).

proserve sports -lruskips 0
promon sports

Step 1: promon reads ACO blocks at its startup:

promon/R&D/debghb/6/1. Cache Entries

05/09/16        Status: Cache Entries  

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru   Skips
                                                                      
   33      64    1    139 O       0 L            4     0      0    0       0
   34      64    6    824 O       0 L         1385     0      0    0       0
   35      64    7     74 O       0 L           43     0      0    0       0
   36      64    8    211 O       0 L           10     0      0    0       0
   37       2    9    348 O       0 L            6     0      0    0       0
   38       2   10    485 O       0 L            6     0      0    0       0
   39      64   11    622 O       0 L            6     0      0    0       0

Step 2: Increase the -lruskips and access the ACO blocks.

4. Administrative Functions ...
4. Adjust Latch Options
8. Adjust LRU force skips: 100

U - Update activity counters (any "Activity" screen) in promon or read any VST table.

Look at the "Skips" column.

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru   Skips
   33      64    1    139 O       0 L            4     0      0    0     100
   34      64    6    824 O       0 L         1385     0      0    0     100
   35      64    7     74 O       0 L           43     0      0    0     100
   36      64    8    211 O       0 L           10     0      0    0     100
   37       2    9    348 O       0 L            6     0      0    0     100
   38       2   10    485 O       0 L            6     0      0    0     100
   39      64   11    622 O       0 L            6     0      0    0     100

Step 3: Access the ACO blocks again.

U - Update activity counters

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru   Skips
   33      64    1    139 O       0 L            4     0      0    0      99
   34      64    6    824 O       0 L         1385     0      0    0      99
   35      64    7     74 O       0 L           43     0      0    0      99
   36      64    8    211 O       0 L           10     0      0    0      99
   37       2    9    348 O       0 L            6     0      0    0      99
   38       2   10    485 O       0 L            6     0      0    0      99
   39      64   11    622 O       0 L            6     0      0    0      99

Step 4: Increase the -lruskips to its maximum value and access the ACO blocks.

8. Adjust LRU force skips: 2147483647
U - Update activity counters

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru   Skips
   33      64    1    139 O       0 L            4     0      0    0      98
   34      64    6    824 O       0 L         1385     0      0    0      98
   35      64    7     74 O       0 L           43     0      0    0      98
   36      64    8    211 O       0 L           10     0      0    0      98
   37       2    9    348 O       0 L            6     0      0    0      98
   38       2   10    485 O       0 L            6     0      0    0      98
   39      64   11    622 O       0 L            6     0      0    0      98

Step 5: Access the ACO blocks another 98 times

UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU... ;-)

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru   Skips
   33      64    1    139 O       0 L            4     0      0    0       0
   34      64    6    824 O       0 L         1385     0      0    0       0
   35      64    7     74 O       0 L           43     0      0    0       0
   36      64    8    211 O       0 L           10     0      0    0       0
   37       2    9    348 O       0 L            6     0      0    0       0
   38       2   10    485 O       0 L            6     0      0    0       0
   39      64   11    622 O       0 L            6     0      0    0       0

Step 6: Access the ACO blocks just one more time:

  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru      Skips
   33      64    1    139 O       0 L            4     0      0    0 2147483647
   34      64    6    824 O       0 L         1385     0      0    0 2147483647
   35      64    7     74 O       0 L           43     0      0    0 2147483647
   36      64    8    211 O       0 L           10     0      0    0 2147483647
   37       2    9    348 O       0 L            6     0      0    0 2147483647
   38       2   10    485 O       0 L            6     0      0    0 2147483647
   39      64   11    622 O       0 L            6     0      0    0 2147483647
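
The six steps above can be replayed with a toy model of one buffer header. This is only a sketch of the behaviour seen in the "Skips" column, not Progress code: the variable and function names are invented, and the assumption that the LRU latch is taken exactly when the counter is reloaded is mine.

```shell
#!/bin/sh
# Toy model of the "Skips" countdown in one buffer header (hypothetical names).
lruskips=0     # current value of the -lruskips parameter
skips=0        # "Skips" counter shown in the Cache Entries screen
lru_locks=0    # LRU latch locks taken by this buffer (assumed, see above)

access_block() {
  if [ "$skips" -eq 0 ]; then
    # Countdown expired: lock the LRU latch and reload the counter
    # from the CURRENT -lruskips value.
    lru_locks=$((lru_locks + 1))
    skips=$lruskips
  else
    # Countdown still running: a changed -lruskips is ignored for now.
    skips=$((skips - 1))
  fi
}

access_times() {
  n=$1
  while [ "$n" -gt 0 ]; do
    access_block
    n=$((n - 1))
  done
}

access_block;                      echo "Step 1: Skips=$skips"   # 0
lruskips=100;        access_block; echo "Step 2: Skips=$skips"   # 100
access_block;                      echo "Step 3: Skips=$skips"   # 99
lruskips=2147483647; access_block; echo "Step 4: Skips=$skips"   # 98
access_times 98;                   echo "Step 5: Skips=$skips"   # 0
access_block;                      echo "Step 6: Skips=$skips"   # 2147483647
echo "LRU latch locks in the model: $lru_locks"                  # 3
```

In this model the new -lruskips is picked up only on the access that finds the counter at zero, which is why Step 4 still shows 98 and not 2147483647.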

Posted by George Potemkin on 09-May-2016 05:51

Yesterday I was wrong about VSTs: they do not read the ACO blocks like promon does. Sorry for misleading you.
 
Why the ACO blocks should not be on the LRU chain:
For example, if you're planning to kill an "annoying" self-service session, then it's recommended to stop the session first (kill -SIGSTOP), to check whether the session is holding any regular latches, and then to use kill -9, hoping that the session does not hold a multiplexed latch. If the session is actively reading data from the database buffer pool (like readalot.p does in the readprobe test) and the database was started with -lruskips 0, then the chance of stopping the self-service process while it holds the LRU latch is approximately 3-10%. If that really happens, then the database will hang: nobody will be able to connect to it. If you had a promon session that was already connected to the database, you will be unable to update any of its screens: promon will try to read the ACO blocks and will wait for the LRU latch. The "Activity: Latch Counts" screen in promon should not show an owner of the LRU latch even though it's a regular latch. In fact it sometimes happens, but only because a process got the latch when promon had already started reading the shared memory with the latch information. It's just a 4K area (32*128 bytes) and promon reads it instantly, but not fast enough compared with the latch operations. If a process is holding the LRU latch persistently, then you will even be unable to initiate an emergency shutdown. And there will be no footprints in the logs that could explain what is going on in your database. It's an ideal situation for malicious minds. ;-)
 
The solution is a "flu shot" given while the database is still working fine, or at least before you use SIGSTOP. Here is the script that does the trick:
#------------------------------------------------------------------------------

LruShot()
{
# "Flu Shot": make ACO ("Area Control Object") Object Blocks insensitive to
# the contention on LRU latch. Promon will work even if LRU latch is locked.
# Script sets "Skips" value in Cache Entries (promon/R&D/debghb/6/1)
# to 2147483647 (the maximum value of the -lruskips). Access to these blocks
# will not acquire the LRU latch for the next 2 billion accesses.
#
  Db=$1

  MaxSkips=2147483647
  MinSkips=2000

# Do nothing if the current -lruskips is higher than MinSkips.
# Otherwise set it to MaxSkips for a short period of time.
# The higher the current lruskips the longer the script will work:
# Approximately 1 sec per 1000 skips.

  PROSHUT=${PROSHUT-$DLC/bin/_mprshut}

# Get the current value of the -lruskips:
  LruSkips=`
   (echo "R&D"     # Advanced options
    echo "4"       # 4. Administrative Functions ...
    echo "4"       # 4. Adjust Latch Options
                   # 4. Adjust LRU force skips: 0
   ) | \
    $PROSHUT $Db -0 -NL 2>/dev/null | tr -d "\f" | \
    awk '/Adjust LRU force skips:/ {print $NF}'
  ` # LruSkips

  echo The current lruskips: $LruSkips

  test $LruSkips -le $MinSkips && \
  echo Reading ACO blocks in loop... && \
  time \
 (
# Set the -lruskips to MaxSkips:
  echo "R&D"       # Advanced options
  echo "4"         # 4. Administrative Functions ...
  echo "4"         # 4. Adjust Latch Options
  echo "4"         # 4. Adjust LRU force skips:
  echo "$MaxSkips" # Enter new LRU force skips value

# Read ACO blocks = Update activity counters:
  echo "T"         # Return to the top level (main) menu.
  echo "2"         # 2. Activity Displays ...
  echo "9"         # 9. I/O Operations by File

  MinSkips=$LruSkips
  while [ $MinSkips -ge 0 ]
  do
    MinSkips=`expr $MinSkips - 1`
    echo "U"       # Update activity counters.
  done

# Reset the -lruskips to its initial value:
  echo "T"         # Return to the top level (main) menu.
  echo "4"         # 4. Administrative Functions ...
  echo "4"         # 4. Adjust Latch Options
  echo "4"         # 4. Adjust LRU force skips:
  echo $LruSkips   # Enter new LRU force skips value
  echo "X"         # Exit from the OpenEdge Monitor utility.
 ) | \
 $PROSHUT $Db -0 -NL 2>/dev/null 1>&2

} # LruShot

#------------------------------------------------------------------------------

LruShot sports
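
To check whether the shot "took", one could parse a saved copy of the Cache Entries screen (promon/R&D/debghb/6/1) and look at the last column. The sketch below only does the parsing; the function names list_skips and all_skips_equal are invented here, the sample rows are copied from the test above, and it assumes you feed it only the rows you care about (the ACO blocks).

```shell
#!/bin/sh
# list_skips: read a Cache Entries listing on stdin and print
# "area dbkey skips" for every buffer row (rows that start with a number).
list_skips() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 12 { print $3, $2, $NF }'
}

# all_skips_equal VALUE: succeed only if the listing on stdin has at least
# one buffer row and every row shows VALUE in the "Skips" column.
all_skips_equal() {
  expected=$1
  list_skips | awk -v want="$expected" '
    { rows++; if ($3 != want) bad++ }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Sample rows taken from the Step 6 output earlier in the thread.
sample='  Num   DBKEY Area   Hash T S Usect Flags   Updctr   Lsn Chkpnt  Lru      Skips
   33      64    1    139 O       0 L            4     0      0    0 2147483647
   37       2    9    348 O       0 L            6     0      0    0 2147483647'

printf '%s\n' "$sample" | list_skips

if printf '%s\n' "$sample" | all_skips_equal 2147483647; then
  echo "All listed blocks carry the maximum Skips value"
else
  echo "Some listed blocks still use a lower Skips value"
fi
```

The "S" column is empty in the buffer rows, so the sketch counts from the left only for Num, DBKEY and Area and takes Skips as the last field.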
