NetApp

Posted by ctoman on 24-Jan-2018 11:01

  OE 11.4

  OS HP-UX

  STORAGE 95K

  USER COUNT 1500 concurrent

   We are at the very beginning of the process of upgrading our hardware and our storage solution.  Our storage and SA guy/gal say that the NetApp AFF 300 all-flash array is the way to go.  What do others think?

All Replies

Posted by James Palmer on 24-Jan-2018 11:33

This thread is probably worth a read: community.progress.com/.../30063

Posted by ChUIMonster on 24-Jan-2018 12:26

From a db performance point of view putting flash in a shared storage device is putting it in the least useful place that it can go.

That's a great way to drain your wallet and get poor performance while financing a new yacht for the NetApp sales team.

Just say no.

If you want your IO subsystem to be fast:

 1) Do not put it at the other end of a cable.

 2) Do not share it with other applications.

Instead spend a small fraction of the same money on internal SSD and get great performance.  If your storage & admin team complains that their life is somehow made more difficult, take a portion of your savings and hire some new storage & admin people.  You will still come out ahead.

Database IO is random IO.  Somewhat perversely, the better tuned your database is, the more random your IO becomes.  The latency of each and every IO operation is your enemy.  In a SAN (or NAS) the main contributor to latency is the cable along with the various adapter cards along the way.  Not the device holding the data way out at the wrong end of that cable.
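
To put rough numbers on that: a process that issues one random read at a time and waits for each one to complete can never do more than 1/latency reads per second, however wide the pipe is. A minimal Python sketch of the arithmetic follows. The latency values are assumptions picked for illustration, not measurements of any product.

```python
# Illustrative arithmetic only: the latency figures below are assumptions,
# not measurements of any particular device or array.

def max_sync_iops(latency_seconds: float) -> float:
    """Upper bound on IOs per second for one thread issuing one IO at a time."""
    return 1.0 / latency_seconds


def time_for_random_reads(reads: int, latency_seconds: float) -> float:
    """Seconds needed for `reads` random reads performed one after another."""
    return reads * latency_seconds


assumed_latencies = {
    "0.1 ms per IO": 0.0001,
    "1 ms per IO": 0.001,
    "5 ms per IO": 0.005,
}

for label, latency in assumed_latencies.items():
    print(
        f"{label}: at most {max_sync_iops(latency):,.0f} IO/s per thread, "
        f"{time_for_random_reads(100_000, latency):,.1f} s for 100,000 reads"
    )
```

Nothing in that calculation depends on bandwidth. Only the per-IO latency appears in it.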

"Oh but it's a zigabit per second cable!" shows that whoever says such a thing has totally missed the point.  Zigabits per second is a useful metric for sequentially streaming data -- it says *nothing* about the latency of random access.

From a database centric performance perspective internal SSD is:

   1) Faster.  *Much* faster.  Literally 100x faster.

   2) Cheaper.  *Much* cheaper.  You can buy enough SSD for most databases for the price of a good steak dinner.

It really isn't even close.  But people will still insist on poking themselves in the eye with sharp sticks rather than doing the sensible thing.  Because, after all, if it is the right solution for file sharing then it must  be even better for a database application.

Posted by gus bjorklund on 25-Jan-2018 09:40

when you are looking for storage for a database server, listen to what database people have to say, NOT to know-nothing storage vendor sales droids.

everything tom bascom said is correct.

Posted by Thomas Mercer-Hursh on 28-Jan-2018 10:30

Do any of you folks have any real-world experience with the products from Pure Storage?  I am seeing claims of <1ms latency and yet this seems like another flash drive at the end of a cable.

Posted by ChUIMonster on 29-Jan-2018 07:37

When dealing with storage vendors you are always safe to assume snake oil.

One of the favorite deceptions of storage vendors when addressing latency is to use numbers from their *internal* monitoring tools.  They report the latency from the disk to the controller inside their cabinet.  Conveniently ignoring all the necessary bits of a real workload that make their product look bad.

Posted by Thomas Mercer-Hursh on 29-Jan-2018 09:12

Seems like that would have to be the case ... but I am not finding citations.

Posted by ChUIMonster on 29-Jan-2018 09:34

They aren't going to make a point out of a public confession...

Instead of trusting the vendor look for independent end to end benchmarks.  If you can find something there that substantiates the claim then *that* would have weight.  But be careful -- not everything that claims to be independent really is.

Posted by gus bjorklund on 29-Jan-2018 11:22

here are some SPC-1 benchmark results.

it will take some effort to understand what is being tested. still, better than nothing at all.

www.storageperformance.org/.../

Posted by gus bjorklund on 29-Jan-2018 11:26

forgot to mention: “there are three kinds of lies: lies, damned lies, and benchmarks.”

(paraphrasing benjamin disraeli)

Posted by Thomas Mercer-Hursh on 29-Jan-2018 11:34

Pure Storage is conspicuously missing from those benchmarks.

Posted by gus bjorklund on 29-Jan-2018 11:46

they are missing because they claim they aren’t allowed to participate. cuz the benchmark rules do not allow for dedup, compression, etc.

they could turn those off if they wanted.

pure storage marketing is very creative.

Posted by Thomas Mercer-Hursh on 15-Feb-2018 11:53

One of the posters on the thread about Pure which stimulated me to ask about it here has just said:

> In addressing network latency, the networks that support storage are either high-bandwidth Ethernet (10GbE and above), or FibreChannel (designed as a very low latency protocol specifically for connecting storage arrays to servers). Network latency, as a rule, is very, very low, measured in nanoseconds (whereas storage latencies are microseconds or milliseconds). Usually, network latency is far less of a performance detractor than the storage media or the application itself.

http://boards.fool.com/tamhas-latency-and-random-access-performance-32985762.aspx

This seems counter to what I have been hearing from the DB experts here.  What do you think?

Posted by Tim Kuehn on 15-Feb-2018 12:02

I think your source got it backwards. :) 

Posted by gus bjorklund on 15-Feb-2018 12:44

you have to measure end-to-end latency of the system, not just devices by themselves. latency of individual devices is only part of the story and does not take into account things like the device drivers and other things in the data path.

and: netapp relies on the NFS protocol for block-level (virtual) device access. i don’t know much about pure since nobody has any test data.
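
A minimal sketch of what measuring from the application side might look like, in Python. It times random block reads as the calling process sees them, so everything in the data path is included. The 8 KB block size, the function names, and the scratch file are assumptions for illustration. A scratch file this small will be served from the OS page cache, so the numbers it prints only demonstrate the method; a real test would need a file far larger than memory, placed on the storage being evaluated.

```python
import os
import random
import statistics
import tempfile
import time

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def measure_random_reads(path, reads=1000, block_size=BLOCK_SIZE):
    """Time `reads` random block reads from `path`; return latencies in seconds."""
    blocks = os.path.getsize(path) // block_size
    if blocks < 1:
        raise ValueError("file is smaller than one block")
    latencies = []
    # buffering=0 bypasses Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            offset = random.randrange(blocks) * block_size
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return (median, 99th percentile, max) in milliseconds."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return (statistics.median(ordered) * 1000, p99 * 1000, ordered[-1] * 1000)


# Demo against a small scratch file (a stand-in, not a real database extent).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(BLOCK_SIZE * 256))
    scratch = tmp.name
try:
    median_ms, p99_ms, max_ms = summarize(measure_random_reads(scratch, reads=500))
    print(f"median {median_ms:.4f} ms, p99 {p99_ms:.4f} ms, max {max_ms:.4f} ms")
finally:
    os.remove(scratch)
```

The tail matters as much as the median here, since a busy shared device tends to show up in the p99 and max figures first.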

Posted by Thomas Mercer-Hursh on 15-Feb-2018 13:08

Tim, I agree ... I was looking for some ammunition! :)

Gus, good point.

Posted by Tim Kuehn on 15-Feb-2018 13:12

Speed of light would be the big constraining factor -  light travels 0.3 meters in one nanosecond, so for the author to assert network assets are measured in nanoseconds contradicts the laws of physics! :) 
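
For anyone who wants to check the arithmetic, here is a small Python sketch of the propagation-delay floor. It covers only the time the signal spends in the cable. The cable lengths and the `velocity_factor` parameter are assumptions for illustration (signals in fibre or copper travel noticeably slower than light in a vacuum), and adapters, switches, and software are not included at all.

```python
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # metres per nanosecond, in a vacuum


def min_round_trip_ns(cable_metres, velocity_factor=1.0):
    """Lower bound on round-trip propagation delay, in nanoseconds.

    velocity_factor is the fraction of c at which the signal travels
    (1.0 = vacuum; real cables are slower, so the true floor is higher).
    """
    return 2 * cable_metres / (SPEED_OF_LIGHT_M_PER_NS * velocity_factor)


# Hypothetical cable lengths, chosen only to show the scale.
for metres in (1, 10, 100):
    print(f"{metres} m: at least {min_round_trip_ns(metres):.1f} ns round trip")
```

This is a floor, not an estimate: it is the delay that would remain if every device between the server and the storage took zero time.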

Posted by dbeavon on 15-Feb-2018 13:35

The other day I was trying to get the fastest possible response out of a SQL broker for OE.  My plan is to use this as an indication of whether the database is online or not.

In addition to making a SQL92 database connection, I wanted a query response to be returned from the database broker.  The fastest end-to-end response I could get out of OE was unimpressive (~150 ms) and I only attained this when the client code was running in a very tight loop.   For my query I used "SELECT * FROM sysprogress.syscalctable".  The vast majority of that time was consumed by a *CPU* bottleneck (in "_sqlsrv2" on HP-UX).

I think this could be a reasonable example of how the application software can become a massive bottleneck in an operation that might otherwise be a matter of sending a few network packets back and forth.  I suspect that it was less than 1 ms that was actually spent waiting on network resources, and all the remaining delay was because of a "storage" engine bottleneck. (Granted no actual disk was ever used - that would have added even *more* overhead that was not network related.)

Posted by dbeavon on 15-Feb-2018 14:23

After my previous post, I tried a faster way to get that SELECT statement response back from the database and now my responses come in under 10 ms, which seems much more reasonable.  It kept bothering me that my 150 ms duration was so extremely long -  that was taking even longer than round-trips that use classic state-free appserver.  

I discovered the reason for the delay: the ADO.Net method I was using ("FillSchema" on a data adapter) was doing quite a lot more work on the OE server than when I used another simple method ("ExecuteScalar").

Either way, the network latency appears to be a small proportion of the overhead.  It is obvious that a lot of CPU is being used for these round-trips -- both on the server and on the client.

Posted by ChUIMonster on 15-Feb-2018 14:35

Even without a network to traverse a SQL-92 query is going to go through an awful lot of overhead and through multiple context switches to get executed.  

Did you also connect and disconnect in your measurement?

Either way I'm kind of surprised it was only 150ms.  (HPUX isn't exactly a fast platform these days.)

Software is, indeed, a huge part of the latency.  When someone has a storage device external to their server there are many, many layers of software that handle the data.  If there were not then one nanosecond per foot would probably be a decent measure of latency.  But instead the data moves a few inches, gets processed within the drive, moves a few more inches, gets processed by an adapter inside the storage array, moves a few more inches, gets processed by the fancy logic within the array that supposedly magically makes everything fast, then gets passed to a network adapter... travels for a few feet and runs into a switch which then processes it some more, repeat until we get to the server, then we go through a network adapter and on through to the CPU that asked for that data an eon or two ago.  Oh, most of those layers also probably involve a queue and if your system is busy you probably get to wait in line a lot. And if you are virtualized you can add a few more layers.

Posted by gus bjorklund on 15-Feb-2018 14:52

> On Feb 15, 2018, at 2:14 PM, Tim Kuehn wrote:

>

> for the author to assert network assets are measured in nanoseconds contradicts the laws of physics

not at all. he did not say how many nanoseconds ! obviously a marketroid.

Posted by gus bjorklund on 15-Feb-2018 14:54

dbeavon,

if you want to tell if the sql server is online, send it the following:

select * from table_that_does_not_exist;

that’s the most efficient way.
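
A sketch of that check in Python, using the standard DB-API shape. sqlite3 stands in for the database here purely so the example runs anywhere; it is not how you would reach an OpenEdge SQL broker, and the function name and its parameters are hypothetical. The idea is that an error about the missing table still proves the server parsed the statement and answered.

```python
import sqlite3
import time

PROBE_SQL = "select * from table_that_does_not_exist"


def broker_is_online(connect, error_type, probe_sql=PROBE_SQL):
    """Return (online, elapsed_seconds).

    connect    -- zero-argument callable returning a DB-API connection
                  (hypothetical hook; supply your own driver's connect call)
    error_type -- exception class the driver raises for a rejected statement
    """
    start = time.perf_counter()
    try:
        conn = connect()
    except Exception:
        # Could not even connect, so treat the server as offline.
        return False, time.perf_counter() - start
    try:
        conn.cursor().execute(probe_sql)
        online = True   # unexpected success, but the server did reply
    except error_type:
        online = True   # expected: statement rejected, so the server is alive
    except Exception:
        online = False  # anything else is treated as a failed probe
    finally:
        conn.close()
    return online, time.perf_counter() - start


# Demo with an in-memory sqlite3 database as a stand-in server.
online, seconds = broker_is_online(
    lambda: sqlite3.connect(":memory:"), sqlite3.OperationalError
)
print(f"online={online}, round trip {seconds * 1000:.3f} ms")
```

With a real driver, a dropped connection and a rejected statement may raise the same exception class, in which case the check would need to tell them apart some other way, for example by inspecting the error code the driver reports.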

Posted by Tim Kuehn on 15-Feb-2018 14:58

> not at all. he did not say how many nanoseconds

thanks for my chuckle of the day! :) 
