Upgrading to an all-flash NVMe storage array, RAID recommendations

Posted by c3rberus on 05-Dec-2018 21:20

We're upgrading our spinning disk storage array to an IBM Storwize V7000 Gen3 that will be configured with 1.92TB 2.5in NVMe flash drives.

Traditionally I have stuck with the Progress recommendation of RAID-10 on the storage array, which gave the best performance yield out of my 15K disks.

Now that we're moving from a spinning disk array onto NVMe SSDs, does the recommendation of RAID-10 still apply?

I ask because IBM is pushing DRAID-6 as the next best thing. This is their distributed RAID: instead of a dedicated spare disk, spare capacity is allocated as blocks within each disk in the RAID group, so operations like a RAID rebuild complete much faster.

I do like the features gained by going with IBM's recommendation of DRAID-6. Based on this Progress KB it sounds like RAID-10 is still the recommended option even if the underlying disks are SSD (and that article was updated in 2018).

Just wondering what RAID configuration you guys are deploying out there when the storage system is an all-flash NVMe array?

20 x 1.92TB NVMe SSDs:

  • RAID-10 = 20x read and 10x write speed gain (tolerates at least a 1-drive failure)
  • RAID-6 = 18x read speed gain, no write speed gain (tolerates a 2-drive failure)

If I went the RAID-6 route, I could reduce the number of drives needed for the same amount of space, thus reducing cost... 
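For reference, a back-of-the-envelope sketch in Python of the usable capacity and the theoretical read/write multipliers quoted above. It assumes idealized striping and ignores controller overhead, the RAID-6 write penalty, and DRAID rebuild areas:

```python
# Rough numbers for 20 x 1.92TB drives in the two layouts discussed above.
# Idealized: ignores controller overhead, the RAID-6 write penalty, and
# DRAID distributed rebuild areas.
DRIVES = 20
DRIVE_TB = 1.92

def raid10(drives, size_tb):
    usable = drives // 2 * size_tb          # half the drives hold mirror copies
    read_x, write_x = drives, drives // 2   # reads hit every drive, writes hit half
    return usable, read_x, write_x

def raid6(drives, size_tb):
    usable = (drives - 2) * size_tb         # two drives' worth of parity
    read_x = drives - 2                     # the 18x figure quoted above
    return usable, read_x

print(raid10(DRIVES, DRIVE_TB))   # (19.2, 20, 10)
print(raid6(DRIVES, DRIVE_TB))    # (34.56, 18)
```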

All Replies

Posted by ChUIMonster on 05-Dec-2018 21:37

Do you actually have a 40TB database?

I do not object to non-RAID10 SSD.  The impact in an SSD environment is much more palatable.  But if the absolute best in performance is what you are after, then IMHO your bigger concern is that this is a shared external device.  The storage is at the wrong end of a cable, so performance will be substantially less impressive than you might imagine.  If you really want the best possible performance you want to eliminate that cable (and the switches and interfaces and adapters that go with it) and put the storage as close to the CPUs as you can.  And you want to avoid sharing it with other people's problems.

Your storage admins will hate that idea.  They might have to do extra work or something.  But if you want it to go as fast as possible, internal SSD is what you want.  If other considerations are higher priority then you might not go that route.

Posted by c3rberus on 05-Dec-2018 21:43

The Storwize will be directly connected via redundant 16Gb FC links to an IBM Power9 system, not shared with anything else, and with no FC switches along the way.

20 x 1.92TB in RAID-10 only gives me roughly 19TB; split across production, development, training, and testing, that is about 5TB per environment (4 duplicate environments, only one of which is I/O heavy).

If I went "D-RAID6" I can greatly reduce the number of disks required to get to 19TB.

But I would then fall outside the RAID-10 recommendation Progress gives; then again, given it is NVMe SSD, maybe RAID-6 is a viable option.
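A quick sketch of that drive-count reduction, assuming a RAID-6-style layout that loses two drives' worth of capacity to parity (real DRAID-6 also reserves distributed rebuild space, so treat the result as a lower bound):

```python
import math

TARGET_TB = 19.2   # roughly what 20 x 1.92TB gives in RAID-10
DRIVE_TB = 1.92

# Two drives' worth of parity overhead; the distributed rebuild area
# that DRAID-6 reserves would add another drive or so on top of this.
drives_needed = math.ceil(TARGET_TB / DRIVE_TB) + 2
print(drives_needed)   # 12 drives instead of 20
```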

Posted by ChUIMonster on 05-Dec-2018 21:52

"direct connect" <> "internal" but not being shared is a big plus.

"I/O heavy" is usually time dependent.  Most of the time a well tuned database is not going to be very I/O heavy because you will have leveraged RAM for the buffer pool etc to avoid IO ops.  IO issues tend to come up during unusual processing, periodic maintenance or in high-pressure recovery scenarios.  

Posted by cjbrandt on 06-Dec-2018 02:44

Do you have the option to run a test with DRAID-6?  Restore a database backup and then apply some large AI files.  Run an index rebuild (idxbuild) across multiple areas.  If the performance meets the requirements, then go with DRAID-6.  If more performance is needed, then spend the $$ and get RAID-10.

Run the test at 3am and create a batch file to send an email to your inbox every 3 minutes requesting an update on how much longer it will take...
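A minimal sketch of that kind of timing harness in Python, assuming the standard OpenEdge utilities (prorest, rfutil, proutil) are on the PATH; the database, backup, and AI file paths are placeholders:

```python
# Times each step of the suggested test: restore, roll forward AI, index rebuild.
# All paths below are placeholders; substitute your own before running.
import subprocess
import time

DB = "/db/test/proddb"                         # hypothetical target database
BACKUP = "/backup/proddb.pbk"                  # hypothetical full backup
AI_FILES = ["/ai/proddb.a1", "/ai/proddb.a2"]  # hypothetical AI extents

def timed(label, cmd):
    start = time.time()
    subprocess.run(cmd, check=True)
    print(f"{label}: {time.time() - start:.1f}s")

timed("restore", ["prorest", DB, BACKUP])
for ai in AI_FILES:
    timed(f"roll forward {ai}", ["rfutil", DB, "-C", "roll", "forward", "-a", ai])
timed("idxbuild", ["proutil", DB, "-C", "idxbuild", "all"])
```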

Posted by c3rberus on 06-Dec-2018 15:39

That is the issue: I don't have a system on which to test DRAID-6. I am basically building the spec for the new server, and IBM planted DRAID-6 in my head as the next best thing for an SSD array. If it weren't for the "distributed" part of it I wouldn't even be posting here and would go with RAID-10, but the benefits of distributed RAID got me thinking. I am leaning towards RAID-10; yes, it costs more, but over the last decade it has not let me down on rotating disks.

I was hoping someone here has done some flash-only array deployments in RAID-6 vs. RAID-10 and could share comments from the field.

Posted by ChUIMonster on 06-Dec-2018 16:31

I have gone along with the deployment of several RAID-5 SSD systems, but not DRAID-6.  Having said that, DRAID-6 sounds more robust than RAID-5 to me.

Posted by gus bjorklund on 07-Dec-2018 15:36

RAID 5 can tolerate one drive failure.

RAID 6 can tolerate 2.

In either case, rebuilding a failed drive will require reading all the other drives, which causes performance degradation that can be quite severe.

RAID 6 should be able to do a drive rebuild better (with less impact), but I don't know, since it is very vendor-dependent and I've never tested it.
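To put a rough number on that, here is a sketch of the data volume a rebuild has to read on the 20-drive array discussed above; the sustained read rate is an assumed placeholder, not a measured figure for this hardware:

```python
# How much data a rebuild reads, and roughly how long that takes at an
# assumed aggregate read rate left over for the rebuild (placeholder value).
DRIVES = 20
DRIVE_TB = 1.92
REBUILD_READ_GBPS = 2.0   # assumption, not a Storwize specification

read_tb = (DRIVES - 1) * DRIVE_TB   # every surviving drive gets read
hours = read_tb * 1000 / REBUILD_READ_GBPS / 3600
print(f"{read_tb:.2f} TB read, ~{hours:.1f} hours at {REBUILD_READ_GBPS} GB/s")
```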

Posted by Dmitri Levin on 02-Jan-2019 23:55

I believe most modern storage systems have a spare drive, so when a drive goes bad it will be automatically replaced, and a 1- or 2-drive failure is not that critical, to a certain extent. We have often had to rebuild a failed drive. The only time it was noticeable through crippled performance was when, by accident, the spare was an old SCSI drive instead of an SSD. The performance was terrible then. We had to pull the SCSI drive out, put an SSD in, and rebuild a second time. RAID 5 here.

I would also suggest to c3rberus splitting Dev and Training from the Production system, if possible.
