Hi,
We are currently optimising our db in terms of storage areas. After reading through the documentation, and especially all the information here in this forum, we were able to define the areas we need and to specify a RecordsPerBlock setting for every area. But I wasn't able to find any best practice for setting the cluster size for an area.
Are there any suggestions on this? Can the cluster size be calculated from dbanalys output? Any "magic formulas"...? :-)
Thanks in advance for any help!
Mike
My advice for choosing the allocation cluster size value is:
Always use 8.
I say this because there is very little advantage to using a larger allocation cluster size. I have not seen any actual performance advantages for larger cluster sizes even though in theory there could be.
The disadvantage to larger cluster sizes is that they consume and waste more disk space than the smallest cluster size does. Each table or index will consume at least one cluster. So if you put 1,000 tables into an area with a cluster size of 512 and a block size of 8k, the minimum disk space required will be 4 gigabytes (512 blocks x 8k = 4 megabytes per cluster, one cluster per object) even if the tables are all empty.
On average, the last cluster for a table or index will be half full and the remainder of the cluster will be unused space. For 1,000 tables with a cluster size of 512 and block size of 8k that will be 2 gigabytes of unused space, on average.
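For comparison, the same 1,000 empty objects at the minimum cluster size claim far less (same 8k block size, numbers just follow from the figures above):

    1,000 objects x 512 blocks x 8k = ~4 GB minimum footprint (cluster size 512)
    1,000 objects x   8 blocks x 8k = ~64 MB minimum footprint (cluster size 8)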
As for the maximum-rows-per-block setting, I see no advantage in choosing so-called "optimal" values. Before version 10.1B, rowids were 32-bit values and it made sense to care about the maximum rows per block value because it affected how many blocks could be in an area and therefore how many row fragments could be stored in an area.
Before 10.1B, the maximum possible number of row fragments in an area was limited to about 2 billion. If you used a maximum rows per block setting that was too small, then row data blocks would not be filled adequately. If you used a maximum rows per block setting that was too large, then you couldn't actually fit that many row fragments into a block and rowids would be wasted, which reduced the number of row fragments that could be stored before you ran out of rowids. For example, if you set the maximum rows per block to 256 but the row size was such that you could only fit 10 row fragments in a block, then 246 of the 256 possible rowids in each block would go unused. Thus you would be able to store only about 80 million row fragments in the area before you ran out of rowids. If the area contained indexes too, then each index block took away another 256 rowids, further reducing the number of rowids that could be used for data.
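As a back-of-the-envelope check on that example:

    ~2 billion rowids / 256 rowids consumed per block = ~8.4 million blocks in the area
    ~8.4 million blocks x 10 fragments that actually fit = ~84 million row fragments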
In 10.1B and later, with 64-bit rowids, none of that matters anymore. So just use 128.
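Putting the two settings together, the structure file (.st) entries for such areas might look something like this (the area names, numbers, paths, and extent sizes below are just placeholders):

    # d "AreaName":<area#>,<records per block>;<blocks per cluster> <extent path> [f <size in KB>]
    d "Data Area":7,128;8 /db/prod/mydb_7.d1 f 1048576
    d "Data Area":7,128;8 /db/prod/mydb_7.d2
    d "Index Area":8,128;8 /db/prod/mydb_8.d1 f 524288
    d "Index Area":8,128;8 /db/prod/mydb_8.d2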
-gus
So, bottom line, we have all these knobs to twirl that used to mean something, but now they don't.
Might one actually say that in the manual?
Well, perhaps so.
But:
- Not everyone agrees with me about this. I should have said that, but forgot.
- I am quite sure that there will be exceptional circumstances where one might want to do otherwise. I have been surprised many times by various systems and workloads behaving differently than what one might expect or what /I/ have experienced.
- I expect that someone will eventually chime in and explain why I am wrong.
- "In theory, there is no difference between theory and practice, but in practice, there is." - Jan L.A. van de Snepscheut
Having lots of knobs makes life difficult because making the right choices can be hard and requires work. Extra work should always be avoided. Default values do not always give the right behaviour, so having the knobs also allows one to accommodate different circumstances /when necessary/. If the defaults were always right, we wouldn't need the knobs.
Let’s say you have many users creating records for the same customer table. Your application creates a record, assigns a customer number to the record, and fills in the customer name field. Later in the application, or at some later point in the record's lifetime, it is updated to populate an address or description field of the record.
One might note that, in a well-written ABL application, this should be unlikely unless the secondary fill is part of an entirely different process, since the application *should* be writing all fields in a single tight transaction.
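Something along these lines, for instance (the table, field, and sequence names here are made up):

    /* create the record and write every field once, in a single tight transaction,
       so the row never has to grow or be rewritten later */
    DO FOR customer TRANSACTION:
        CREATE customer.
        ASSIGN customer.cust-num    = NEXT-VALUE(next-cust-num)
               customer.name        = "ACME Corp"
               customer.address     = "1 Main Street"
               customer.description = "Preferred account".
    END.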
I absolutely agree. However, some tables contain "open ended" fields such as "description" or "comments" which are not normalized and which will make existing records grow as the information becomes available and is populated into them.
Understood ... although, as you say, such things are often not properly normalized and it would be better DB design and application design if they were. But, I don't want to hijack the thread into a discussion of bad design practices!
That really depends on your UI style. If you're doing "screen at a time" submissions or something of that ilk then you would see lots of fields being updated all at once. You would also see that sort of behavior from many batch loads and electronic interface sorts of processes.
But if you are collecting data from a user in a manner that doesn't involve the use of an "ok" or "submit" button but which instead commits every field when it is entered (wysiwyg) then you are going to have lots of small single field commits.
To be sure, there are specialized cases where that is dictated by business rules, but in most cases one doesn't want to commit to the database until a coherent body of information is collected. Committing field by field makes sense for session state in temporary files dealing with unreliable connections, but in general one wants to wait to commit these to the "real" database. Naturally, there are exceptions, but in general the assumption should be that this is bad programming practice unless there is a compelling business reason.
I disagree.
Batching stuff up and waiting for an "ok" or "submit" is unnatural and counter-intuitive. It is a legacy of the IBM 3270 terminal interface. If it weren't for the unfortunate fact that the paradigm was established in mainframe systems a long time ago, and for reasons that have nothing to do with "a coherent body of information", hardly anyone would be doing it.
It has nothing to do with "programming practice" good or bad. If anything the "batch it up" approach is the bad practice because it spawned the "transactions that span UI" problem (at least in the Progress world) which one then works around with all sorts of ridiculous shenanigans.
In any event the Progress DB and 4GL are quite capable of dealing with either approach.
ridiculous shenanigans
???
Maybe a new thread needs to be opened?
At the risk of lowering the tone, it's a great name for a band!
Or the title of a conference talk