BI Threshold Sizing

Posted by Joey Samoy on 13-May-2019 14:49

Hi,

For a 200GB db, we have 2 fixed BI extents at 1GB each. Threshold is set at 2.4GB.

A couple of times we've received the threshold warning, which eventually crashed the db. bistall is not enabled, and the bi grew so fast that I didn't have time to quiet it, or to at least find out who the user was and what they were doing.

The parameters have been there for years, and we'll most likely hit this scenario again. The usual reaction has been to (gracefully) shut down the db and truncate the bi :-(.

I thought the variable BI extent would have handled the bi growth. So I'm thinking of increasing the threshold, but to what value?

TIA!

on OE 10.2B Enterprise

All Replies

Posted by Rob Fitzpatrick on 13-May-2019 15:36

First, with an Enterprise license you should have large file support enabled; I'll assume that's true.  Otherwise the variable extent can't grow beyond 2 GB.
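
For reference, checking and enabling large files looks roughly like this (the db name is a placeholder, and as far as I recall enabling large files needs the database to be offline):

proutil mydb -C describe

which should list Large Files among the enabled features, and if it isn't there:

proutil mydb -C enablelargefiles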

> With the BI variable extent, I thought that should have handled the bi growing. So I'm thinking of increasing the threshold, but at what value?

The variable extent will allow the BI area to grow, but not much in your case.  You have two 1 GB fixed extents and a 2.4 GB BI threshold.  Therefore the variable can only grow to 400 MB before forward processing stops.  Because you are not using -bistall, that stop means a database shutdown.  If you did specify -bistall, that would give you an opportunity to increase the threshold online at the time when you stall, so you can collect valuable information in promon about what is happening.  If you shut down, that opportunity is lost.
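
To make that concrete (values are only examples, and I may be misremembering the exact proquiet syntax, so check the docs): starting the broker with something like

proserve mydb -bithold 2400 -bistall

means hitting the threshold stalls forward processing and writes a message to the .lg file instead of shutting the database down, and you can then raise the threshold online with something along the lines of

proquiet mydb -C bithreshold 3200

(the value being in MB, as far as I recall) once you have collected what you need from promon.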

> The parameters have been there for years, we'll most likely hit this scenario again.

Does this type of shutdown happen often?  If you want a different result in this situation, you should change your BI configuration.  I suggest you increase your -bithold so your variable extent can grow larger.  The value that is appropriate for you depends on your available space.  Ensure that the BI file system has plenty of free space and is not likely to be filled with other data (e.g. other DBs' BI extents, application or DB logs, client temp files, etc.).  Don't use more than a quarter of the available space.
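
As a purely illustrative example: if the BI file system has 40 GB free and nothing else competes for it, a quarter of that is about 10 GB, so a -bithold of around 8192 (MB) would let the variable extent grow roughly 6 GB beyond your two 1 GB fixed extents before stalling, while still leaving the file system plenty of headroom.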

It is important to monitor not just the BI usage and size but also what is happening with clients.  BI growth will be caused by a combination of (a) one or more users with long-running transactions in ACTIVE status and (b) all users' transaction activity, i.e. the amount of BI notes written while those long-running transactions are active.  Use promon (R&D 1, 4, 3) or ProTop and closely monitor long-running transactions.  Ideally database transactions should last a few seconds or less, so if you see them get into the range of five to ten minutes then you should be prepared to take action.  Likely the root cause of the BI growth is an application issue.
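
If you want to script that monitoring rather than sitting in promon, a minimal ABL sketch along these lines is a starting point (the VST field names are from memory and differ slightly between releases, so verify them against your version first):

/* list active transactions with the connected user, start time and trx number */
FOR EACH _Trans WHERE _Trans._Trans-State = "Active" NO-LOCK,
    FIRST _Connect WHERE _Connect._Connect-Usr = _Trans._Trans-Usrnum NO-LOCK:
    DISPLAY _Trans._Trans-Usrnum
            _Connect._Connect-Name
            _Trans._Trans-txtime FORMAT "x(25)"
            _Trans._Trans-Num.
END.

Run that from a scheduled job and alert when a start time is more than a few minutes old.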

You can use proGetStack on the client side or Client Database Request Statement Caching (in promon or ProTop) on the server side to determine what code the client in the long transaction is running.  Alternatively, if it's a human user you can just ask them or look over their shoulder. :)  Once you determine what the offending code is, have the user exit out of what they are doing.  If that is not possible, you will want to disconnect the user.  That's not a problem if the user is remote, but be aware that there is a chance that disconnecting a shared-memory 4GL client could cause a database shutdown if that user dies while holding a latch.
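
For example (12345 is the client's process id):

proGetStack 12345

which, if memory serves, writes a protrace.12345 file in that process's working directory containing the 4GL stack, so you can see which procedure and line the client is sitting in.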

Posted by Joey Samoy on 13-May-2019 16:35

Hi Rob,

It doesn't happen often. The noticeable pattern, though, is that it happens around noon. It looks like people run something that starts a transaction and then walk away for lunch.

What would you do after the threshold is increased? So the transaction completes, no crash, and you use that value in the db startup params? So if the same user scenario happens, you don't get the same warning/error?

The BI would have reached a high water mark as well, would you change the structure? Add more extents?

Thanks!

Posted by Rob Fitzpatrick on 13-May-2019 20:50

> What would you do after the threshold is increased? So the transaction completes, no crash, and you use that value in the db startup params? So if the same user scenario happens, you don't get the same warning/error?

If I understand you correctly, you are concerned that if you increase the -bithold enough that this activity doesn't shut down the database then that will just mask the problem?

The problem isn't the database shutdown (though of course it is problematic).  The problem is the combination of user behaviour and application code that causes BI growth, and the shutdown is a symptom of that problem.  If you monitor long-running transactions and alert when they happen, you won't miss these events and you should be able to retrieve some actionable information that tells you what to do next.  I don't know if this is a vendor application or in-house, nor what is involved for you in making code changes.  Maybe you can train users in what not to do, in the short term, until the code fix is available.

> The BI would have reached a high water mark as well, would you change the structure? Add more extents?

You could look at adding more BI extents.  That could reduce the impact of BI growth, as your application freezes temporarily when a new BI cluster is formatted and inserted into the cluster ring.  Having that happen within the fixed extents should be less time-consuming than also having to get more file space from the OS to grow the variable extent.  But note that there is an impact to having a large BI file.  For example it will make your online backups larger.  
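
As a sketch, adding a couple of fixed BI extents might look like this (paths and sizes are placeholders; .st sizes are in KB, so 1048576 = 1 GB; also note that adding extents after the existing variable extent may convert it to fixed at its current size, so check the result with prostrct list and test on a copy first).  Put the new extents in an add.st file:

b /db/bi/mydb.b3 f 1048576
b /db/bi/mydb.b4 f 1048576

then, with the database offline:

prostrct add mydb add.st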

Ultimately you need to fix the root cause to prevent the BI growth.  Once you do that you can take a brief outage and truncate your BI file and then grow it back to whatever was its typical size before this issue hit you.
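
In command terms that is roughly (cluster size and count are only examples, and both commands need the database offline):

proutil mydb -C truncate bi -bi 16384

proutil mydb -C bigrow 8

where truncate bi also lets you set the cluster size (-bi, in KB) and bigrow pre-formats additional clusters so the next busy period doesn't pay the formatting cost online.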

Posted by Joey Samoy on 13-May-2019 21:23

Thank you for your insights Rob.

This is a 3rd-party app, but we have many in-house extensions. It would be interesting to see which one causes the bi to grow.

Thanks again!

Posted by Rob Fitzpatrick on 13-May-2019 21:33

Happy to help.  Here is some more info on client statement caching from the Progress Revolution 2011 conference:

"What's That User Doing? The Progress OpenEdge Client Request Statement Cache"

http://download.psdn.com/media/revolution_2011/oe11/OE1111.wmv

Posted by gus bjorklund on 13-May-2019 22:32

> On May 13, 2019, at 12:36 PM, Joey Samoy wrote:

>

> What would you do after the threshold is increased? So the transaction completes, no crash, and you use that value in the db startup params? So if the same user scenario happens, you don't get the same warning/error?

There is zero correlation between database size and before-image log volume. The rate of before-image log data that is generated is strictly a function of the rate of database changes that are made by the application. The before-image log size is affected by that and by transaction duration (which affects log space reuse).

Set the threshold at maybe 6 GB and if you hit that, raise it by 25%.

Make sure you have large files enabled and use one fixed bi extent of 4 GB plus one variable extent, or just a single variable extent.
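
In .st terms the first of those layouts would look something like this (the path is a placeholder; sizes in a .st are in KB, so 4194304 = 4 GB):

b /db/bi/mydb.b1 f 4194304
b /db/bi/mydb.b2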

Posted by frank.meulblok on 14-May-2019 10:06

> Happy to help.  Here is some more info on client statement caching from the Progress Revolution 2011 conference:
>
> "What's That User Doing? The Progress OpenEdge Client Request Statement Cache"
>
> http://download.psdn.com/media/revolution_2011/oe11/OE1111.wmv

Statement caching isn't a great tool for tracking transaction scoping issues. You'll only see the last database statement executed as of the time you query it, which may or may not have anything to do with the statement that initiated or activated the transaction. Consider this fragment:
DO TRANSACTION:
    /* the exclusive lock and update make the transaction active */
    FIND FIRST customer EXCLUSIVE-LOCK.
    customer.comments = customer.comments + " blah blah blah".
    PAUSE 15.
    /* an unrelated read after the update */
    FOR EACH salesrep NO-LOCK:
    END.
    /* the statement cache now points at the FOR EACH, not the update */
    PAUSE MESSAGE "check statement cache".
END.


If you check the cache when you're prompted to, you get the reference to the "FOR EACH salesrep".
Also consider switching on client logging with the 4GLTrans entry type active.
Keep the logging level at 2 (basic); that will log only the start and end of transactions, not subtransactions, and keeps the logs as small as possible while still giving you what you need.
If you spot a client with a long-running transaction, check their log for the last "BEGIN TRANS [code reference]" entry and tell the dev. team to start investigating from there.
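
As a sketch, the client startup parameters for that would be something like (the log path is a placeholder):

-clientlog /tmp/myclient.4gltrans.lg -logentrytypes 4GLTrans -logginglevel 2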

Posted by Rob Fitzpatrick on 14-May-2019 11:55

Good advice about 4GLTrans.  

Though I think CSC has some merit too, and the advantage that it can be enabled server-side for clients that are already running.  That means that if you don't yet know which user is guilty, you may not have to wait for the problem to happen again in order to track it down.  If the client communicates with the database again, that activity will show up in the cache.  Even if it isn't at the exact line that is the problem, the program name is a starting point for the programmer to investigate with static analysis or a compile listing.  
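
For what it's worth, monitoring tools typically switch CSC on by poking the _Connect VST.  A rough sketch, with the caveat that the caching-type values and their meanings vary by release, so verify against the VST documentation before using it:

/* enable statement caching for user number 42 (the number is just an example) */
FIND _Connect WHERE _Connect._Connect-Usr = 42 NO-ERROR.
IF AVAILABLE _Connect THEN
    ASSIGN _Connect._Connect-CachingType = 3.  /* check which caching type you want on your release */

/* later, read back what the client has reported */
IF AVAILABLE _Connect THEN
    DISPLAY _Connect._Connect-CacheInfo[1] FORMAT "x(60)".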

Anyway Joey... follow Frank's advice.  :)

Posted by ChUIMonster on 14-May-2019 12:24

Client logging with "4gl Trans" enabled would be great, but it is very rare to find that it has been enabled for all clients in production.   As Rob says, CSC isn't perfect, but it does often at least give you an idea of where to start looking, and you can enable it from the server side without the user needing to do anything.

I might have missed it, but proGetStack is also a useful tool if you have access to the command line on the machine that the client connects from.

At a minimum just knowing the name of the guilty user is frequently a pretty good clue -- many times I have pointed out a long running transaction to an admin at a customer site and they immediately know what that user is running.  And they often know what was changed that is probably causing the problem.

Random thought:  it would be really cool if the program name and statement line# that start a TRX were recorded automatically and made available in _Trans ;)

Posted by George Potemkin on 14-May-2019 12:50

An AI scan can explain why the bi file was growing. And sometimes an AI scan can describe what a guilty transaction was doing, which might be enough to identify the code.
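
Assuming after-imaging is enabled, the command is along these lines (the extent path is a placeholder):

rfutil mydb -C aimage scan verbose -a /db/ai/mydb.a1

The verbose output lists the individual notes (transaction id, note type, area and dbkey), which is usually enough to work out what the big transaction was touching.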

Posted by Patrice Perrot on 20-May-2019 08:14

Hi,

If you are using AppServer, you can have a look at TRANS-INIT-PROCEDURE.

documentation.progress.com/.../index.html
