At what moment do BI blocks get preformatted?

Posted by davidkerkhofs3 on 28-Nov-2008 03:01

I'm not quite sure about something regarding the BI file:

1. Suppose a BI file with a 1 GB fixed extent and a variable extent.

2. Suppose an average BI activity of 512 MB.

3. Suppose a BI cluster size of 4 MB.

4. Suppose I need to minimize the space needed to transfer a backup (so I truncate my BI).

Do I gain by adding more clusters than the initial 4 after I've truncated the BI?

In other words, what is the overhead in making a BI cluster?

My fixed extent is large enough to hold the BI notes, so formatting the blocks in the clusters should already be taken care of. Or does defining a fixed extent not preformat the blocks? Is it only at cluster creation that the blocks get preformatted?

Perhaps, to extend the question: what is the precise difference between preformatting and formatting the blocks? And is there a difference between how the blocks of a BI cluster are formatted and how the blocks of a Type II data cluster are formatted?
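For reference, a rough sketch of the setup I have in mind (the database name "mydb", the paths and the sizes are just placeholders):

  # structure (.st) file: a 1 GB fixed BI extent plus a variable BI extent
  b /db/mydb.b1 f 1048576
  b /db/mydb.b2

  # truncate the BI file and set a 4 MB (4096 KB) cluster size
  proutil mydb -C truncate bi -bi 4096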

All Replies

Posted by ChUIMonster on 28-Nov-2008 09:47

Real life isn't usually a steady state. Usage of the BI file is subject to a lot of ups and downs in demand as well as certain application behaviors. In other words, "an average BI activity of 512 MB" doesn't really say much.

Routinely truncating your BI file is generally considered to be a "worst practice". You should not be doing it. The amount of space on your backups that you are saving is probably pretty trivial, and your users are going to incur the headache of having to wait for clusters to be formatted (unless you guess right and preformat enough clusters with "proutil bigrow").

When you create your database, 4 clusters are pre-formatted. After that, clusters get formatted either by way of "proutil bigrow" or as users need them because there are no free clusters available. Defining a fixed extent of a certain size does not imply that all clusters within that extent are pre-formatted. Only the first 4 clusters of the first BI extent get pre-formatted. So if your "steady state" BI size is 512 MB, 16 MB will be pre-formatted and 496 MB (124 4 MB clusters) will need to be formatted on demand.
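For example, if you did want to pre-format the whole 512 MB up front, it would look roughly like this (database offline; "mydb" and the numbers are just placeholders, and if I recall correctly bigrow adds clusters on top of the 4 that the truncate leaves behind):

  # truncate the BI file and (re)set the cluster size to 4 MB (4096 KB)
  proutil mydb -C truncate bi -bi 4096

  # pre-format 124 more clusters: 4 existing + 124 = 128 x 4 MB = 512 MB
  proutil mydb -C bigrow 124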

There are many reasons why a free cluster might not be available when needed. Chief among them are high activity levels and long-lasting transactions.

High activity levels can be handled by using a large cluster size so that "spikes" in the activity are smoothed out. Personally, I like to use a cluster size large enough that during the busiest periods of use there are 10 to 15 minutes between checkpoints. This will result in your BI file reaching a "steady state" in terms of the number of BI clusters formatted and used.
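As a rough example (the 16 MB figure is just an illustration, not a recommendation for your system), changing the cluster size means truncating the BI file while the database is offline:

  # -bi is specified in KB, so 16384 KB = 16 MB clusters
  proutil mydb -C truncate bi -bi 16384

Then watch how far apart the checkpoints land in promon during your busiest periods and adjust from there.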

More troublesome are situations where no cluster is available because of long-lasting transactions. A BI cluster cannot be reused until all transactions that started in that cluster have completed. So if, for instance, a user enters a screen, opens a transaction and then goes to lunch without completing that work, the BI file will grow and grow and grow until the user returns and completes the transaction. The user doesn't have to just go to lunch -- they could also leave for a three-week vacation.

(Not directly related to BI file growth, but another frequent effect of this sort of programming is a lot of record lock conflicts.)

Certain vendor applications are infamous for this behavior.

If you have your own application and can influence the coding standards, this is why the rule of "no transactions that span user interaction" is so important. You get this for free when you properly separate your application into UI, business logic and data-access layers.

Lastly, there can also be situations where coders get confused about the difference between a "business transaction" and a "database transaction" and try to use the database to enforce inappropriate or impractical business rules. A major warning sign that you might have this sort of problem is having a need for very large lock tables (the -L parameter).

As an example: it might be tempting to think that an archive and purge process should be "all or nothing" and that, therefore, it should be executed as a single large db transaction. Coding like this probably works very well in development and might even make a really cool demo. But it is a major disaster when deployed to a large customer who creates a database with terabytes of data and billions of records and tries to run the process 3 years down the road. (There are lots of other domain-specific examples; this one is just easy for everyone to relate to.) These processes use up huge amounts of BI space and vast numbers of record locks. There are always other ways to code them -- you usually need to think about restartability and "reversing transactions".

Posted by davidkerkhofs3 on 28-Nov-2008 16:40

So may I conclude that the BI blocks are formatted when a cluster is created?

And that creating a fixed extent only prevents the OS from claiming the blocks, while the block layout is determined later: in the case of the BI file, when a cluster is created, and in the case of a Type II data area, when records need to be written or updated?

Posted by ChUIMonster on 29-Nov-2008 07:30

That is essentially correct. If you dig around in presentations from past Exchanges you will find some PPTs from Rich and/or Gus that go into the really gory details of what happens internally.
