About the bi and recoverability

Posted by Jimmer on 27-May-2015 03:10

Hi,

Is there a situation where, with regard to recoverability/fault tolerance, there would be a gain in placing the bi on different physical disks than the data?

I know that if you want fault tolerance, you have to use ai/replication, so personally I do not care where the bi is: I guess that if one data area is corrupted for some reason, having the bi "safe" won't necessarily help. AI, on the other hand, would help when copied to a different system, as many of you pointed out in previous posts :)

So, again, my question: what are the cases, if any, that would justify placing the bi files on different disks than the data, from a recoverability perspective?

Thanks,

sorry for the trouble :)

All Replies

Posted by George Potemkin on 27-May-2015 03:41

It's not necessary to place the bi file on a separate disk, either for fault tolerance or for performance (in most cases).

Posted by Jimmer on 27-May-2015 07:04

With regard to fault tolerance:

"not necessary" or "irrelevant" ? :)

Because I kinda had a quick argument about it and I'm clearly missing any point as to the need of separating it from other areas, so for the sake of my sanity/self-esteem :))), are there any cases that benefit from a separation?

Posted by Paul Koufalis on 27-May-2015 07:10

With regard to fault tolerance there is no value in having the bi on a separate disk. With regard to performance, in a VERY high write environment, dedicated disk IO to the bi may improve performance.

Posted by TheMadDBA on 27-May-2015 09:59

No value at all for fault tolerance. If your BI file disappears then you are restoring from backups or you hate your data. Same thing if all you have is the BI file and the DB extents are gone.

In the olden days you could get actual performance increases by splitting out the BI, DB and AI files.

With modern RAID disks it is almost impossible to see a repeatable improvement unless you are talking about a very large and very write active DB like Paul says.

Posted by George Potemkin on 27-May-2015 10:25

I'd reword my statement. Putting the bi file on a separate disk does improve performance, but the percentage of improvement is rather low except in the extreme cases when the bi writer can't supply empty buffers in -bibufs (I have not seen such cases). If we are close to the limits of disk throughput then we should look for more radical solutions, so that there are reserves for load peaks. Customers with a very high write environment most likely use million-dollar disk arrays to store their databases. A separate disk array for the bi file would be too expensive a solution. ;-)

In theory a separate disk for the bi file would improve crash recovery time for very large databases with scattered updates, but I don't have any data that would confirm this.

Posted by Jimmer on 28-May-2015 03:59

Thank you all/each for the replies and explanation. I feel much better/sane already :)

With regard to performance, our current general recommendation for our clients is a Raid 1 for the bi, and if the budget allows, 4 disks in Raid 10. For the data areas, we recommend at least 6 disks in Raid 10. So it should be fine.

Thanks again and sorry for the trouble.

Posted by Paul Koufalis on 28-May-2015 06:55

Raid 10 does not benefit the bi unless you have multiple databases, as bi writes are single threaded.

Posted by Jimmer on 28-May-2015 07:12

Ok, I have to ask:

Isn't the Raid controller in charge of writing to the disk(s) in a transparent manner for the active thread, so whenever there is a write in a Raid 10 config, it is always spread between the disks (which makes it faster than Raid 1)?

JM

Posted by Paul Koufalis on 28-May-2015 07:47

Yes when aggregated across multiple writing threads. But when it's one thread writing sequentially, as in the case of bi writes, the physical location of the next block to write is adjacent to the previous block and the switch to the other disk(s) won't occur until the raid stripe is filled.

For example, if the stripe size is 1 MB then continuous sequential writes to the bi will write 1 MB to disk 0, then 1 MB to disk 1, then disk 0...
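
A minimal sketch of that stripe behaviour, assuming a 1 MB stripe across two stripe members (for example the two mirrored pairs of a 4-disk RAID 10) and an illustrative 16 KB write size. The function names and the block size are made up for the example; they are not taken from any RAID controller or from Progress.

```python
def disk_for_offset(offset_bytes, stripe_bytes=1024 * 1024, stripe_members=2):
    """Return the index of the stripe member that holds the given byte offset."""
    return (offset_bytes // stripe_bytes) % stripe_members


def members_touched(start, length, stripe_bytes=1024 * 1024, stripe_members=2):
    """Return the set of stripe members touched by one write of `length` bytes at `start`."""
    first = start // stripe_bytes
    last = (start + length - 1) // stripe_bytes
    return {chunk % stripe_members for chunk in range(first, last + 1)}


# One sequential writer appending 16 KB blocks (illustrative size).
block = 16 * 1024
targets = [disk_for_offset(i * block) for i in range(128)]

# The first 64 blocks (1 MB) all land on member 0, the next 64 on member 1.
print(targets[:64].count(0), targets[64:].count(1))
```

In this model each individual write touches only one stripe member unless it happens to straddle a stripe boundary, so a single sequential writer keeps one member busy at a time instead of spreading the load.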

Posted by ChUIMonster on 28-May-2015 07:50


On 5/28/15 8:13 AM, Jimmer wrote:

> Ok, I have to ask:
>
> Isn't the Raid controller in charge of writing to the disk(s) in a transparent manner

Yes. If you have a dedicated RAID controller. You might also be doing RAID "in software" (it is always software at some level...) at the OS level or you might have a RAID implementation built into your SAN or NAS.

> for the active thread,

No. The RAID controller is many layers away from having any awareness of what particular thread is active.

> so whenever there is a write in a Raid 10 config, it is always spread between the disks

More or less.

> (which makes it faster than Raid 1)?

No.

On traditional disks (rotating rust) sequential IO operations are considerably faster than random IO ops. Which is why disk manufacturers always quote those numbers in large print and talk about "megabytes per second" rather than random IO seek times (if those are disclosed it is usually in very small print in a footnote somewhere...)

Suppose that your RAID 10 is 4 disks striped -- in that case it can do approximately 4x as many random IO ops as a single disk. But *sequential* IO might be 10x faster. And BI IO, in isolation, is primarily sequential.

*IF* you have a single database *and* very, very high transaction activity then keeping the BI file on a dedicated disk can improve performance. Especially if you are running a benchmark, loading data or maybe doing an index rebuild.

But many (possibly even most) systems consist of more than one database and more than one bi file. And few systems actually have the level of activity needed to gain any advantage. Dedicating a disk to each bi file is probably not feasible and unlikely to be very useful these days.

Posted by TheMadDBA on 28-May-2015 08:29

You would most likely be better off just putting all of the disks into RAID 10.  The more disks you get involved in database IO the better.

If you are writing that often to the BI file you need to adjust your BI cluster/block sizes and probably look into the code a bit.

Posted by Jimmer on 29-May-2015 01:23

Does the same apply to bi reads: are they also done sequentially, and so don't benefit from Raid 10? Though I'm guessing that a bi read implies the need to roll back a transaction, which normally should happen much less often than a bi write.

And is the ai write also sequential?

Thanks

Posted by George Potemkin on 29-May-2015 03:49

> Does the same apply to bi reads, is it done also sequentially [snip]?

No in most cases. Yes under rare conditions.
Progress uses jump notes to find the bi notes that need to be read to undo a given transaction ID.

http://knowledgebase.progress.com/articles/Article/000042774
When the first note in a bi block for a particular transaction being rolled back (UNDO) happens to be a purely physical note (which will be skipped - not undone) the jump information is ignored but this then causes a reverse sequential scan through the bi file looking for additional bi notes for this transaction.
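
As a toy illustration of the difference between following jump information and falling back to a reverse sequential scan, here is a sketch under loud assumptions: the `Note` structure, its `prev` back pointer and all function names are hypothetical and do not describe the real bi file format. It only shows that both approaches find the same notes while the scan has to examine every note in between.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    txn: int             # transaction that wrote this note
    prev: Optional[int]  # index of the same transaction's previous note (hypothetical back pointer)


def build_log(txn_ids: List[int]) -> List[Note]:
    """Append notes in order, linking each one to the previous note of its transaction."""
    last_seen = {}
    log = []
    for i, txn in enumerate(txn_ids):
        log.append(Note(txn, last_seen.get(txn)))
        last_seen[txn] = i
    return log


def undo_by_pointers(log, last_index):
    """Follow back pointers from the transaction's last note; return the indexes visited."""
    visited = []
    i = last_index
    while i is not None:
        visited.append(i)
        i = log[i].prev
    return visited


def undo_by_reverse_scan(log, last_index):
    """Examine every note backwards to the start of the toy log; return (examined, matching)."""
    txn = log[last_index].txn
    examined = list(range(last_index, -1, -1))
    return examined, [i for i in examined if log[i].txn == txn]


# Notes from three interleaved transactions; transaction 1 wrote its last note at index 6.
log = build_log([1, 2, 2, 1, 3, 2, 1, 3])
jumps = undo_by_pointers(log, 6)
examined, matching = undo_by_reverse_scan(log, 6)
print(len(jumps), len(examined))
```

Both paths reach the same notes for the transaction, but the pointer path reads only those notes, while the scan reads everything written in between by other transactions.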

> And Is the ai write also sequential?

Yes

Regards,
George

Posted by gus on 03-Jun-2015 15:42

Disks and filesystems have a finite throughput capacity. With regard to performance, distributing the I/O workload over more drives is always a good idea. When you have the before-image log on the same filesystem as the data extents (and/or any other stuff), they compete with each other for disk bandwidth. If you have enough for both, then fine.

Transaction rollbacks /usually/ are not a concern since they are relatively rare. Or should be. Like everything, there will be exceptions, and there have been some improvements in recent times to increase concurrency when there are simultaneous rollbacks from different users. Still, rollbacks are NOT sequential in the usual sense. They read the before-image log in reverse. If several transactions are being rolled back at the same time, the average is reverse-sequential but there will be competition amongst the users.

This thread is closed