Performance degradation between / /Power8

Posted by njal@99x.no on 09-Dec-2014 15:00

We are in the process of migrating from a POWER7 LPAR (AIX 7.1 SP1, OpenEdge 11.3 SP2) to a POWER8 LPAR (AIX 7.1 SP3, OpenEdge 11.3 SP3).

We are struggling very much with overall performance. We have a nightly batch job that takes 4 hours on the old system and somewhere between 12 and 16 hours on the new one.

The new system has a disk layout optimized for concurrency and has done 100k IOPS in multithreaded ndisk64 testing, blowing the old system out of the water. The main difference is the number of LUNs, paths, and filesystems: the new system has more of all three.

Single-threaded ndisk64 testing (a single in-flight I/O) shows similar latency and IOPS on both platforms.

The new system does not seem to be starving on CPU or memory. (It has more of both than the old)

Are there any hidden OpenEdge best practices for AIX somewhere?

Tips and tricks greatly appreciated!

All Replies

Posted by Paul Koufalis on 09-Dec-2014 15:08

Smells like NUMA.

How many CPUs in the physical box?  LPAR?  What exact model of p8?

What are your DB startup params?

Posted by TheMadDBA on 09-Dec-2014 15:21

It is very easy to get misleading results with ndisk (and other tools like that). For example: reading small files over and over again will most likely read from the AIX or SAN buffer cache instead of actually hitting the disk. This makes the hardware seem much faster than it really is.

There aren't really any magic switches to flip between AIX7 and AIX8.

A few things to check....

0) Does your new LPAR span NUMA zones? This can be a huge problem.

1) Check your Progress startup parameters and DB settings to make sure they are the same.

2) Compare the output of the following commands on both AIX boxes.. looking for differences

ioo -a            (io related parameters)

vmo -a          (memory related parameters)

lsps -a          (paging space)

3)  use nmon and iostat to compare what is going on with memory,cpu and disk during your tests (on both systems)

4) use promon to compare what is happening at a database level (on both systems)

5) check the queue_depth settings on all of your disks using lsattr -El <hdiskname>
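A minimal sketch of step 2, assuming the output of `ioo -a` or `vmo -a` has been captured to text on each box and that each line has the form `name = value` (an assumption about the output layout). The function names and the sample values are hypothetical:

```python
def parse_tunables(text):
    """Parse 'name = value' lines (assumed layout of `ioo -a` / `vmo -a` output)."""
    tunables = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines or anything that is not a setting
        name, _, value = line.partition("=")
        tunables[name.strip()] = value.strip()
    return tunables


def diff_tunables(old_text, new_text):
    """Return {name: (old_value, new_value)} for settings that differ or are missing."""
    old, new = parse_tunables(old_text), parse_tunables(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical captures; real ones would come from e.g. `ioo -a > ioo.txt` on each box.
old_box = """
    j2_maxPageReadAhead = 128
    maxpgahead = 8
"""
new_box = """
    j2_maxPageReadAhead = 128
    maxpgahead = 16
    some_new_tunable = 1
"""

for name, (old_value, new_value) in diff_tunables(old_box, new_box).items():
    print(f"{name}: old={old_value} new={new_value}")
```

A value of `None` on one side means the setting only appears on the other box, which is worth a look after an OS service pack change.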

How big is your database and each AIX LPAR?

Is this batch job a single Progress session or a collection of processes?

Posted by ChUIMonster on 09-Dec-2014 16:23

Don't forget the all-time classic false performance problem when comparing "old" to "new".  Especially if the apparent problem is confined to a special something.  Like a particular batch job.  Sometimes what is happening is just that the new system has a cold cache.  The numbers that you have from the old system are almost certainly with a hot cache.  It seems silly but I've seen people ready to toss out some very expensive and quite capable hardware because they overlooked that.

Of course it might be other issues (NUMA is a glaring possibility) but it never hurts to double check the obvious.


-- 
Tom Bascom
603 396 4886
tom@greenfieldtech.com

Posted by RussellAdams on 09-Dec-2014 16:42

> Smells like NUMA.

In what way? Progress ran fine on the prior system. There are more hardware threads on the POWER8 (like HyperThreading on Intel) but otherwise they are similar. Also our memory size and system size aren't very large.

> How many CPUs in the physical box?  LPAR?  What exact model of p8?

10 cores in the system. The LPAR in question has 4 dedicated cores. It's an 8284-22A.

> What are your DB startup params?

That I'd have to get from the DBA.

BTW, unrelated to this post. A moderator should place a notice somewhere that to reply or post you have to join a group first. It took a half hour of messing with adblock and noscript to realize it was actually a forum issue that I saw no reply buttons.

Posted by RussellAdams on 09-Dec-2014 16:50

> ndisk misleading results

I agree completely. We understand that benchmarking tools aren't perfect, however with a 50G file and no filesystem cache we get excellent performance using direct IO and CIO. I would expect Progress to accomplish the same.

> There aren't really any magic switches to flip between AIX7 and AIX8.

I agree there aren't any large changes between POWER7 and POWER8, with AIX7. None of my other customers with competing database software have any issues.

> Does your new LPAR span NUMA zones? This can be a huge problem.

I've worked with AIX since RS/6000 and I've never had to investigate NUMA. Where do you see this, and is it documented for the POWER platform?

> Check your Progress startup parameters and DB settings to make sure they are the same.

I understood them to be identical. We also tried with -direct and additional APWs. Still slow.

> Compare the output of the following commands on both AIX boxes.. looking for differences

I'll go one better. Not only did we manually tune, and then try defaults, but we had IBM do a system trace down to the point where the IO is dispatched by the physical HBA. We were sub-millisecond in latency in every layer and IBM's layer 3 kernel team found no problematic configuration in the IO stack.

The conclusion is the application just isn't trying.

> use nmon and iostat to compare what is going on with memory,cpu and disk during your tests (on both systems)

Absolutely. We did. Existing production performs over 3000 IOPS during the job run, new systems do under 1000 IOPS. CPU is modest on existing production, and little to none on the new. We initially thought it was waiting on IO, now I'm not sure it wasn't just sitting there idling inside Progress.

> use promon to compare what is happening at a database level (on both systems)

I'd love to. I'll ask the DBA.

>  check the queue_depth settings on all of your disks using lsattr -El <hdiskname>

Queue depths are not used. In fact, I'd argue we had a QD of 1 the entire time on the new systems. Does Progress do any concurrent read IO at all?

> How big is your database and each AIX LPAR?

DB is approximately 500GB, 4 cores and 100GB RAM in AIX.

SAN is high end EMC with 10 LUNs with interdisk policy striping. Each DB on a dedicated filesystem for DB and logs. Similar configurations at other customers with different DB software excel at IO.

> Is this batch job a single Progress session or a collection of processes?

I believe this job is a serial series of batch jobs, one at a time. I think it's a nightly close and reporting run that executes sequentially.

Posted by RussellAdams on 09-Dec-2014 16:52

> Sometimes what is happening is just that the new system has a cold cache

For 4-12 hours on a system with 4 x 16Gb HBAs? I agree the cache can be cold, but I could read the whole 500GB into RAM in a fraction of that time.

> NUMA is a glaring possibility

I'd love to know more about NUMA issues in relation to Progress, especially as it applies to the POWER platform. Can you elaborate?

Posted by James Palmer on 09-Dec-2014 17:17

NUMA and Progress are like oil and water. My understanding of the architecture and why is hazy so I'll leave that to someone else to explain, but I've seen evidence in practise.
I believe one thing you can try is to disable some of the CPUs, rerun your benchmark, and see if it improves. Disabling the processors forces NUMA to deactivate.

James Palmer | Application Developer
Tel: 01253 785103


Posted by RussellAdams on 09-Dec-2014 18:34

NUMA is a good thought, but I don't believe that applies.

IBM's description of process affinity and memory location (local, near, far) are here:

www.ibm.com/.../local_near_far_memory_part_2_virtual_machine_cpu_memory_lay_out3

Our system shows only one memory and CPU domain, like so:

lssrad

REF1   SRAD        MEM      CPU
0
          0   58409.62      0-15

I think I can say NUMA isn't an issue.

Posted by ChUIMonster on 09-Dec-2014 18:46

If there are truly only 4 cores (out of a total of 10) and they are
truly dedicated then it probably isn't a NUMA problem. Although it
could still be a virtualization issue. What is the CPU entitlement?

But 10 is a very strange total number to have.

Usually these things come in powers of 2.

LPAR configuration can make a big difference. The defaults for AIX are
not friendly to databases. By default AIX makes everything dynamic and
spreads your CPU over as many cores as it can. You have to go way out
of your way to change that. Databases use large shared memory caches
that must be coordinated. That is generally done with mutex locks (aka
"latches") and the process of doing that requires CPU caches to be
synchronized. That is *much* more efficient when the cores are
dedicated and share the same silicon. Which is the exact opposite of
the defaults on NUMA servers (and almost all large servers are NUMA
these days) and in virtualized environments.

A couple of simple commands that might shed some light: "lparstat -i"
and "lssrad -va"

Posted by Paul Koufalis on 09-Dec-2014 18:51

- We know it's not NUMA (single-processor with 10 cores per www-01.ibm.com/.../ssialias).

- We know it's not disk I/O (your disks are doing nothing)

- If it was a memory/swapping issue I'm certain you would have seen it, so let's rule that out.

- You say the server is not CPU starved but is the batch process single threaded?  Even at that, the new cores should be faster than the old cores.

- What's left?  Kernel calls?  Weird nice levels?  New Progress issue with p8?

I had a very similar issue going from p6 to p7 and it turned out to be a UNIX SILENT chmod that was run half a million times.  I'm not saying this is your issue, but it's definitely time to think outside the box.

1. You said you think it's a whole string of jobs run one-after-the-other.  Find out how long EACH one takes on the old box vs. the new box.  That way we can see if it's a generalized issue or one particular job that is misbehaving.

2. Get some DB stats.  Download protop (dbappraise.com/protop.html) and use it to see what's going on.  ProTop is much more information-dense than promon. Are these read-intensive or write-intensive batch jobs?

3. Triple-check the DB startup parameters.  This could very well be an "oops!" moment.  Don't forget BI block size and cluster size.

4. Truss the processes and see if they are doing anything interesting at the kernel level.  IBM has a post-truss cruncher that chews up the output and spits out a nice report.  That's how we saw the UNIX SILENT issue: abnormally high fork()'s .

5. Are you running OpenEdge Replication too?

6. Did you dump and load going from the old box to the new?  Or make any changes to the DB like storage area stuff?

Posted by Rob Fitzpatrick on 09-Dec-2014 19:04

I'd also run this batch client with -yx.  At the end of the batch run, look at proc.mon to see which procedures register the most execution time.  If there is an outlying value that could lead you to a code-related issue like the one Paul described with the repeated chmods.

Also, double-check the other client startup parameters and see if something obvious is missing.

  • Where does the batch client run?  Is it self-service or shared-memory?
  • Where does the code reside relative to the client?
  • Is it r-code or procedure libraries?  Or compile on-the-fly?
  • Is the client using -q?
  • Where do the client's temp files reside (-T), and what is their size and I/O during execution?
  • How many databases does the client connect to?

It would be helpful to know all of the client and broker startup parameters, including those in parameter files.

Posted by RussellAdams on 09-Dec-2014 19:06

[quote user="ChUIMonster"]If there are truly only 4 cores (out of a total of 10) and they are
truly dedicated then it probably isn't a NUMA problem. Although it
could still be a virtualization issue. What is the CPU entitlement?

But 10 is a very strange total number to have.

Usually these things come in powers of 2.

[/quote]

10 CPU, 256GB RAM.


[quote user="ChUIMonster"]LPAR configuration can make a big difference. The defaults for AIX are
not friendly to databases. By default AIX makes everything dynamic and
spreads your CPU over as many cores as it can. You have to go way out
of your way to change that. Databases use large shared memory caches
that must be coordinated. That is generally done with mutex locks (aka
"latches") and the process of doing that requires CPU caches to be
synchronized. That is *much* more efficient when the cores are
dedicated and share the same silicon. Which is the exact opposite of
the defaults on NUMA servers (and almost all large servers are NUMA
these days) and in virtualized environments.

A couple of simple commands that might shed some light: "lparstat -i"
and "lssrad -va"

[/quote]

I already posted lssrad, and it looks like one piece. The LPAR is a 4 core dedicated LPAR, we aren't using shared procs (yes, i know it's still shared under the hood).

Processor folding was introduced in late AIX 5.3 to compensate for POWER's VCPU dilution problem. We have all cores folded until the first core exceeds a busy threshold, and that's set appropriately.

We are seeing really very low CPU utilization (ie: <10%), and reading the spin lock documentation I take it if we were waiting in spin I would see higher CPU?

Posted by RussellAdams on 09-Dec-2014 19:17

[quote user="Paul Koufalis"]
I had a very similar issue going from p6 to p7 and it turned out to be
a UNIX SILENT chmod that was run half a million times.  I'm not saying
this is your issue, but it's definitely time to think outside the box.
[/quote]

Was this a cron job making changes? That would show up as high disk
busy due to the IO in the inode table.

[quote user="Paul Koufalis"]
1. You said you think it's a whole string of jobs run
   one-after-the-other.  Find out how long EACH one takes on the old
   box vs. the new box.  That way we can see if it's a generalized
   issue or one particular job that is misbehaving.

2. Get some DB stats.  Download protop (dbappraise.com/protop.html)
   and use it to see what's going on.  ProTop is much more
   information-dense than promon. Are these read-intensive or
   write-intensive batch jobs?

3. Triple-check the DB startup parameters.  This could very well be an
   "oops!" moment.  Don't forget BI block size and cluster size.
[/quote]

I'll send that to our DBA to check.

[quote user="Paul Koufalis"]
4. Truss the processes and see if they are doing anything interesting
   at the kernel level.  IBM has a post-truss cruncher that chews up
   the output and spits out a nice report.  That's how we saw the UNIX
   SILENT issue: abnormally high fork()'s .
[/quote]

I intend to. The IBM team has trussed our ndisk, but not the main app.

[quote user="Paul Koufalis"]
5. Are you running OpenEdge Replication too?
[/quote]

No.

[quote user="Paul Koufalis"]
6. Did you dump and load going from the old box to the new?  Or make
   any changes to the DB like storage area stuff?
[/quote]

We reorganized to add many more disks. We have much more IO capacity
now, and dedicated filesystems per DB where they were shared before.

Posted by Paul Koufalis on 09-Dec-2014 19:32

Yes it was a cron job running MRP via some _progres $DB -p batch.p etc etc... We trussed the _progres and saw the high fork()'s .  We  did NOT see abnormally high disk I/O in nmon or iostat.  Or if you prefer, the disk I/O we saw seemed consistent with the job.

The next step is really to get some DB and application stats as suggested by ChUI and Rob.

Last point: you're likely one of the first to migrate to the P8 (it's only been out for a few months) so this could be a real Progress issue.  I'm having a hard time believing this even as I write it, but it's possible.

Posted by Rob Fitzpatrick on 09-Dec-2014 19:41

It would be a good idea at this point to open a case with Progress TS (or have your vendor do it, if you're an indirect customer).  They'll likely ask much the same questions we have, but they may be able to offer some good insight as well.  

And if this is an emerging platform-compatibility or behaviour-change issue, the techs may already be aware of the issue and a workaround or fix.

BTW you said you're in the process of migrating.  Does that mean the old box is still prod and the new one is test?  Or is the new one now prod?

Posted by RussellAdams on 09-Dec-2014 19:46

I just ran a truss, and I'm seeing very high calls to statx looking for files that don't exist. I'll have to ask them what that's about. Topas has high Namei calls, but low disk throughput.

Thanks for the truss suggestion.
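A rough sketch of how a captured truss log could be tallied to spot this pattern. It assumes a failed call is logged as a line like `statx("/path", ...) Err#2 ENOENT`; the exact truss output format, the sample lines, and the function name are assumptions, not output from this system:

```python
import re
from collections import Counter

# Assumed line shape: optional "pid:", syscall name, "(", first argument in quotes.
LINE_RE = re.compile(r'^\s*(?:\d+:\s*)?(?P<call>\w+)\("(?P<path>[^"]*)"')


def tally_failed_lookups(lines, call="statx"):
    """Count ENOENT results for `call`, grouped by the file extension looked up."""
    misses = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("call") != call or "ENOENT" not in line:
            continue
        path = match.group("path")
        basename = path.rsplit("/", 1)[-1]
        ext = basename.rsplit(".", 1)[-1] if "." in basename else "(none)"
        misses[ext] += 1
    return misses


# Hypothetical sample lines in the assumed format.
sample = [
    'statx("/app/src/order.r", 0x2FF22A58, 76, 0)  Err#2  ENOENT',
    'statx("/app/src/order.p", 0x2FF22A58, 76, 0)  Err#2  ENOENT',
    'statx("/app/lib/order.r", 0x2FF22A58, 76, 0)  = 0',
    'kread(5, "...", 8192)  = 8192',
]
print(tally_failed_lookups(sample))
```

Running the same tally against a trace from the old box would show whether the miss rate is new or has always been there.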

Posted by RussellAdams on 09-Dec-2014 19:47

We will be opening a case with OpenEdge.

These new systems are still in testing.

Posted by Rob Fitzpatrick on 09-Dec-2014 19:53

The high number of stat calls for non-existent files sounds like propath searching.  It could be normal for your application.  It would be helpful to compare those numbers against prod, or a prod-like environment.

Can you see the file names in your trace?  If so, is it a lot of files with .p or .r extensions?

Posted by RussellAdams on 09-Dec-2014 19:54

Yes, most .p and .r's. 160000 calls in approximately a minute.

Posted by Rob Fitzpatrick on 09-Dec-2014 19:55

Are they all local paths?

Posted by RussellAdams on 09-Dec-2014 19:58

Yes, all local disk.

Posted by Rob Fitzpatrick on 09-Dec-2014 20:01

Compare the propaths between the batch clients on the old and new systems.  If they are different the client may be spending more time on the new box searching code directories instead of doing useful work.
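To illustrate why PROPATH differences matter, here is a toy model of the lookup cost. It assumes the client probes each PROPATH directory in order, trying the `.r` file and then the `.p` file, until it finds a match; the probe order, directory names, and function name are illustrative assumptions rather than documented client behaviour:

```python
def probes_until_found(propath, found_in, extensions=(".r", ".p")):
    """Count file-existence probes made before a procedure is located.

    propath  -- ordered list of directories, as in the PROPATH setting
    found_in -- (directory, extension) where the file actually lives
    """
    probes = 0
    for directory in propath:
        for ext in extensions:
            probes += 1
            if (directory, ext) == found_in:
                return probes
    return probes  # not found anywhere: every probe was a miss


# Hypothetical PROPATHs for the two boxes.
old_propath = ["/app/rcode", "/app/src"]
new_propath = ["/app/custom", "/app/patches", "/app/src", "/app/rcode"]
location = ("/app/rcode", ".r")

print(probes_until_found(old_propath, location))  # 1
print(probes_until_found(new_propath, location))  # 7
```

In this model every probe before the last one is a failed lookup, the kind of call that shows up as ENOENT in a trace. Multiplied across every procedure call in a long batch run, a reordered or longer PROPATH adds up quickly.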

Posted by Rob Fitzpatrick on 09-Dec-2014 20:24

In this test environment are there other clients connected apart from the batch client you mentioned?  

One possibility that is consistent with symptoms of a slow application and no apparent system-level bottlenecks is record lock contention.

Example:

Client A obtains an exclusive lock on record 1 in table X.  Client B (your batch client) attempts to obtain a lock on that same record and can't; its request is queued.  Depending on how the batch client's code was written (e.g. whether it specifies NO-WAIT on the query), client B may block and do nothing until one of two things happens.  Either client A releases the lock and client B obtains it and continues processing, or client A retains the lock until client B's lock wait timeout expires (30 minutes by default).

I think this is a pretty unlikely scenario.  If this was your issue you would expect to see similar contention in prod.  If anything, this problem would be worse in prod than in test due to (probably) greater user count and activity.  But it's a possibility.  A client in that state would show up under "blocked clients" in promon or ProTop.  You would also see record waits for that client in promon R&D 3 3 (lock requests by user).  If there is a lock wait timeout you would see an (8812) error in the client's client log and in the database log.

Another possibility is that the client is blocking on a network I/O.  I have seen ABL client performance nosedive when it is attempting reads or writes on an unresponsive or unreliable NFS share (or disk).

Posted by RussellAdams on 09-Dec-2014 21:01

Great idea to test.

Posted by Libor Laubacher on 10-Dec-2014 03:34

Hi Russell,
 
I see you are still chasing this ‘old chestnut’.

> Yes, most .p and .r's. 160000 calls in approximately a minute.

Make sure you have the -q parameter for the client(s).

Something to consider would be to put .r into .pl files, possibly even memory mapped .pl files.

/LL

Posted by Richard Banville on 10-Dec-2014 08:17

Is it possible that there is a difference between startup parameters for the 2 runs, most notably the -q parameter?  Or that the PROPATH is different between the 2 runs?

With so many statx() system calls, I would think that the ABL's PVM is validating .p or .r location more often.
 
 Oops - didn't see Libor's or Rob's posts.  I agree with them!
 
 

Posted by Peter Judge on 10-Dec-2014 08:37

>BTW, unrelated to this post. A moderator should place a notice somewhere that to reply or post you have to join a group first. It took a half hour of messing with adblock and noscript to realize it was actually a forum issue that I saw no reply buttons.

Will look into it.

-- peter

 
 

Posted by TheMadDBA on 10-Dec-2014 09:55

>>  check the queue_depth settings on all of your disks using lsattr -El <hdiskname>

>Queue depths are not used. In fact, I'd argue we had a QD of 1 the entire time on the new systems. Does Progress do any concurrent read IO at all?

When you say queue depths are not used do you mean iostat doesn't show any queue waits? queue_depth controls how many IOs can be requested concurrently for that logical disk.

Sounds like you have checked all of the obvious stuff though, which wasn't really apparent from the OP :)

Posted by RussellAdams on 10-Dec-2014 10:10

There is no stress, unusual latency, high disk busy, or queue saturation on the IO subsystem at all.

The high statx calls are likely to have been there all along, so that's not new but a potential refinement.

I'm looking forward to what OpenEdge says.

Posted by RussellAdams on 15-Dec-2014 11:13

Regarding IO, I see frequent discussion of write behavior in the documentation (direct IO, sync, buffer flush timing, etc). What about reads? Does Progress support any kind of concurrent or asynchronous reads?

Posted by George Potemkin on 16-Dec-2014 03:37

> Does Progress support any kind of concurrent or asynchronous reads?

No

Posted by RussellAdams on 02-Jan-2015 11:48

Another truss has found significant time in an OS "__semop" function. Sounds like maybe it's waiting on semaphores?

Posted by TheMadDBA on 02-Jan-2015 12:02

Did you ever get any information from the DBA side? Progress does use semaphores for shared memory functions.

The really interesting things are going to be in promon. Did you open a call with PSC?

Posted by RussellAdams on 02-Jan-2015 12:16

We have an open call and the DBAs are working with them. Slow for the holidays. Just researching further.

Posted by George Potemkin on 05-Jan-2015 02:48

Did you notice that promon hangs for some time while connecting to the database or while updating statistics on the /Activity/ screens?

Processes that are sleeping on __semop() call should be reported in promon/R&D/1/4/2. Status: Blocked Clients.

I would also check promon/R&D/debghb/6/15. Status: Buffer Lock Queue

Posted by RussellAdams on 16-Mar-2015 08:57

Is there any kind of readahead at all? It looks like it requests an 8k block from disk and waits until it is returned, for every read.

Posted by Robert Lee on 18-Mar-2015 11:15

OpenEdge does nothing to specifically take advantage of “readahead” or “concurrent I/O”.

Posted by RussellAdams on 14-Apr-2015 16:43

To give some feedback if anyone else has similar issues, I can confirm that the issues we are experiencing were not related to NUMA, the IBM AIX platform, or our IO layer. We are troubleshooting a long duration single threaded job and we are read latency bound.

The lack of any kind of asynchronous reads and the single threaded nature of the job meant we only ever had one I/O outstanding at a time. Though we are throwing SSD at the problem to reduce the read latency, I believe this does not solve the underlying issue of the IO model. SSD only provides some relief for the symptoms. Splitting up the job may also be an option.
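The effect of a single outstanding I/O can be estimated with Little's law: achievable IOPS is roughly the number of I/Os in flight divided by the per-I/O latency. A small sketch, with hypothetical read counts, latencies, and queue depths chosen only for illustration:

```python
def max_iops(outstanding_ios, latency_seconds):
    """Upper bound on IOPS from Little's law: concurrency / latency."""
    return outstanding_ios / latency_seconds


def hours_to_read(num_reads, outstanding_ios, latency_seconds):
    """Time to complete num_reads at that bound, in hours."""
    return num_reads / max_iops(outstanding_ios, latency_seconds) / 3600


# Hypothetical workload: 10 million random block reads.
reads = 10_000_000
for latency_ms in (5.0, 1.0, 0.2):   # illustrative latencies, slow disk to SSD
    for depth in (1, 16):            # I/Os in flight
        hours = hours_to_read(reads, depth, latency_ms / 1000)
        print(f"latency={latency_ms}ms depth={depth:>2} -> {hours:.2f} h")
```

With one I/O in flight, only lower latency (SSD) shortens the run. Raising concurrency, for example by splitting the job into parallel pieces, attacks the other term.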

I'm surprised the topic of NUMA came up before the single threading of IO.

I specialize in AIX not Progress, so please interpret this as constructive criticism. I've never seen a database on the AIX platform that did not support some form of async I/O, whether provided by an OS library or internal to the application.  We ultimately have too large of a system and SAN for this application, because the database cannot effectively utilize the hardware.

A proper analogy would be that I am providing a fleet of hundreds of long haul semi trucks, but I'm shipping a single can of soup at a time in one truck while the rest idle at the dock. That is a very poor ROI on the rest of the fleet.

I find Progress' programming language and Text UI (hurrah!) interface appealing, but I now doubt the scalability of the backend database. Please correct me if I'm wrong, but I don't expect this to start a discussion. I wanted to ensure that I could post the issue was solved for future reference.

Posted by Paul Koufalis on 14-Apr-2015 19:35

If I understand your last post, you have an _progres process that is doing massive database reads and none (or few) of those blocks are cached in the DB buffer pool. Therefore you are read latency bound: like a backup, you are reading a block from disk and never using it again in its lifetime in the buffer pool. Am I interpreting this correctly?

What are your DB start-up params? How much I/O (logical and physical) are you doing? If you watch the batch job(s) in real time with ProTop then you'll see fine grained stats right at the individual _progres process level including logical (table and index usage) and physical. Something smells bad here.

If my understanding is correct then there should be some tuning opportunities at the database or query level. I re-read the entire thread and what struck me is that the task took much less time on the old hardware (4h vs. 12-16h). You did not answer if a dump and load of the DB was done during the migration. Was it? You also said that it was a string of batch jobs and I had asked if all these jobs were 3-4X slower or if maybe it was just a subset of jobs that were adding all the extra time. It was also not clear if the DBA confirmed that all the DB start-up parameters are the same as before. What is the DB block size?

I guess I'm saying that you shouldn't need asynchronous reads and that, like SSDs, AIO would just mask the symptoms rather than address the core issue(s).

Posted by Tim Kuehn on 14-Apr-2015 19:42

If he went from running the jobs in sequence to running the jobs all at once, that could be a big enough change to cause -B thrashing and explain the run-time difference.

Also, there have been changes in parameter behavior between versions such that using the same parameter value in both versions can result in a negative performance impact.

Posted by Rob Fitzpatrick on 14-Apr-2015 20:03

I wouldn't write off OpenEdge as lacking scalability just yet; that isn't the only reasonable conclusion based on the posts in this thread.  You may be running into an OE architectural limit; but you may have a configuration issue.  

We have no idea how your client application or DB are configured/structured.  We never did get your client or broker startup parameters from prod and test.

Posted by RussellAdams on 14-Apr-2015 21:59

@Paul: I entirely agree there are tuning opportunities at the database level. The old hardware was benefiting from SAN technology which moved disk hot spots to SSD, and the new system though on the same SAN did not yet have the "data temperature" to move those same hot spots up.

As the systems engineer, I can't speak to all the database portions. Let me just say that our team had IBM and EMC review these systems at length for a problem that was at the application layer.

We have since used a Progress consultant to help review the Progress side for tuning and after putting all SSD storage into service we hope to see significantly better performance.

This thread was begun with the problem being a potential issue with the new platform, and I just wanted to close that with some feedback.

I will respectfully disagree regarding AIO. The SAN can perform over 40,000 IOPS without any significant increase in latency, we had to benchmark it extensively during our troubleshooting. I primarily work with Oracle and Intersystems Cache customers on AIX, and I've never had a database that couldn't make use of the storage bandwidth provided.

There are tuning opportunities at the application, job, and database level, yet I don't believe further tuning will be able to leverage all the storage bandwidth this platform provides.

Posted by RussellAdams on 14-Apr-2015 22:00

@Tim (sorry, quoting is different in every forum):  the differences were minimized during the testing. We restored the DB and ran the same jobs between the two systems. Only one service pack level different in the Progress software as I understand it.

Posted by RussellAdams on 14-Apr-2015 22:06

@Rob: I certainly didn't say it's written off. I said I have concerns about the scalability of the DB level without asynchronous IO. My goal is to provide only some productive feedback and my opinions based on our troubleshooting of the issue. I am optimistic we will see performance improvements with the new tuning parameters and SSDs.

From my perspective at the system layer, we just threw hardware at a software issue (SSD vs IO patterns). I have very large customers on other AIX systems with databases that dwarf this one, and they get excellent IO performance on similarly configured high-class storage. It was unusual for such a small (<1TB) database to have such difficulty on a new system, and isolating the cause was a significant struggle for everyone involved.

I appreciate everyone's feedback while we were troubleshooting; this forum was a good place to get ideas.

Posted by ChUIMonster on 15-Apr-2015 07:31

I don't recall your ever posting what the actual throughput is.

How many record reads per second does this single threaded process get
before it tops out?

On modern AIX a single threaded job /should/ be able to read around
200,000 records per second on a well configured system that is not IO
bound. Obviously an IO bound process will not do nearly that well. But
if you have SSDs that can deliver tens of thousands of IO Ops per
second you shouldn't be crazily off the mark.

Posted by RussellAdams on 15-Apr-2015 08:48

@Tom: The throughput was very low (<30 MB/s) with a 3 ms response time between 8k reads. I don't have the database statistics.

My point is that we have 8 database data LUNs that are striped with a queue depth of 128 each, so we can support 1024 queued IOs simultaneously. There are four 16Gbps HBA ports to the SAN for a total real bandwidth of 6.4 GB/s and a command depth of 1024 per port for 4096 simultaneous IOs. Yet only a single IO is ever dispatched by the process at a time.
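To put rough numbers on that point, here is a minimal Python sketch using Little's law (sustained throughput = requests in flight / latency). The figures are the ones quoted in this thread; the assumption that latency stays at 3 ms as concurrency rises is a simplification added here and will only be approximately true on real storage.

```python
# Figures quoted in the thread; the constant-latency model is an assumption.
DATA_LUNS = 8
LUN_QUEUE_DEPTH = 128
HBA_PORTS = 4
PORT_COMMAND_DEPTH = 1024
READ_LATENCY_S = 0.003        # 3 ms per 8k read
SAN_BENCHMARK_IOPS = 40_000   # "over 40,000 IOPS" from the SAN benchmark


def max_iops(in_flight: int, latency_s: float) -> float:
    """Little's law: sustained throughput = concurrency / latency."""
    return in_flight / latency_s


def in_flight_needed(target_iops: float, latency_s: float) -> float:
    """Concurrency required to sustain target_iops at a fixed latency."""
    return target_iops * latency_s


lun_slots = DATA_LUNS * LUN_QUEUE_DEPTH        # queue slots across the data LUNs
port_slots = HBA_PORTS * PORT_COMMAND_DEPTH    # command slots across the HBA ports
serial_ceiling = max_iops(1, READ_LATENCY_S)   # only one IO dispatched at a time
needed = in_flight_needed(SAN_BENCHMARK_IOPS, READ_LATENCY_S)

print(f"LUN queue slots: {lun_slots}")
print(f"HBA port slots:  {port_slots}")
print(f"Serial ceiling:  {serial_ceiling:.0f} IOPS")
print(f"In flight needed for {SAN_BENCHMARK_IOPS} IOPS: {needed:.0f}")
```

Under these assumptions, a process with one read outstanding tops out at roughly 333 reads per second however deep the queues are, and reaching the benchmarked 40,000 IOPS at the same latency would take on the order of 120 reads in flight.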

Posted by Tim Kuehn on 15-Apr-2015 09:11

[quote user="RussellAdams"]

@Tom: The throughput was very low (<30 MB/s) with a 3 ms response time between 8k reads. I don't have the database statistics.

[/quote]

30MB/s? Either the DB is seriously mis-configured or there's a problem with the storage system configuration. 

Before you continue on about async I/O, you need to post some specifics here: the DB version along with the DB server and client startup parameters.

If you don't know where those are, look in the db.lg file and get the list of entries from the last time the server was started. 
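A minimal Python sketch of pulling that block out of the log, for anyone who would rather not scroll. Both the marker text ("Multi-user session begin") and the number of lines kept after it are assumptions made for illustration; check them against your own db.lg before trusting the output.

```python
import sys
from pathlib import Path

# Assumed marker for a server start; verify against your own .lg file.
SESSION_MARKER = "Multi-user session begin"


def last_startup_block(lines, marker=SESSION_MARKER, limit=80):
    """Return up to `limit` lines starting at the last line containing `marker`."""
    start = None
    for index, line in enumerate(lines):
        if marker in line:
            start = index          # keep overwriting so the last match wins
    if start is None:
        return []
    return lines[start:start + limit]


def read_log(path):
    """Read a log file as a list of lines, tolerating odd bytes."""
    return Path(path).read_text(errors="replace").splitlines()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python startup_params.py /path/to/db.lg   (path is hypothetical)
    for entry in last_startup_block(read_log(sys.argv[1])):
        print(entry)
```

The startup parameters should appear in the lines printed after the marker; raise `limit` if the list is cut short.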

Posted by ChUIMonster on 15-Apr-2015 09:59

I think that I got your point. And, yes, that is true of Progress. It
asks for IO one block at a time when it needs data from disk. Progress
doesn't ask for data when it doesn't need any. And it doesn't try to
guess what it will need next.

I was under the impression that those fancy SANs had read-ahead features
that will detect a sequential access pattern automatically (many of your
posts seem very focused on the sequential access capabilities of the
system). Of course if the IO is random that won't help at all. And
might even hurt. Did the IBM and EMC engineers look at how random the
data requests are? I've been involved in some engagements where they
took the time to look at that and were quite surprised.

The open question, in my mind, is -- is the Progress DB blocked and
waiting on IO? Or is it asking for IO at a leisurely rate because the
application is "doing stuff" (executing business logic)? If the
Progress DB is blocked on IO there should be some evidence of that.
Reading through the thread there are a lot of reasons to think that it
is more a case of "the application is doing stuff".

It just isn't clear (to me) if the bottleneck is actually read IO or
not. It sounds like when an IO is requested it happens quickly. I am
hearing that it just isn't requested as fast as the system /could/ (in
theory anyway) provide it. Which suggests to me that read IO is not the
bottleneck.


The application profiler is a really good way to find out where the
application is spending time. If it is spending time doing things other
than IO ops then it really doesn't matter very much how fast your SAN
is. Has anyone given the profiler a try?



But assuming the application is actually only trying to read data as
fast as it possibly can (in a single thread) and is not actually doing
much of anything with that data there are a few more nuggets that can
maybe be teased out:

30MB/sec with 3ms between reads sounds like about 375 IO ops per
second. That isn't very impressive. I'm sure that your SAN can do
better than that. Unless it happens to be just a single spindle of
rotating rust -- in which case it is doing about as many IO ops as it
can be reasonably expected to do.

If the 8K blocks have 100 records each (I'm making that up -- fewer is
more likely) then, at best, you are getting 37,500 records per second.
Which is not very good so I can understand being disappointed with
performance.
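The arithmetic can be sketched in a few lines of Python. The two quoted figures give different rates if taken literally, so the sketch derives a read rate from each: the "<30 MB/s" number is only an upper bound, while 3 ms between serial reads gives a rate in the same range as the 375 used above. The records-per-block values are made-up illustrations, as in the post.

```python
BLOCK_BYTES = 8 * 1024  # 8K database block


def iops_from_throughput(mb_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Reads per second implied by a throughput figure (decimal MB)."""
    return mb_per_s * 1_000_000 / block_bytes


def iops_from_latency(latency_ms: float) -> float:
    """Reads per second when each read waits for the previous one."""
    return 1000.0 / latency_ms


def records_per_second(iops: float, records_per_block: int) -> float:
    """Best case: every record in every block read is actually wanted."""
    return iops * records_per_block


print(f"from 30 MB/s: {iops_from_throughput(30):.0f} reads/s (upper bound)")
print(f"from 3 ms:    {iops_from_latency(3):.0f} reads/s")

# Hypothetical records-per-block values; 100 is the figure used in the post.
for per_block in (100, 50, 20):
    rate = records_per_second(375, per_block)
    print(f"{per_block:>3} records/block at 375 reads/s: {rate:,.0f} records/s")
```

With fewer records per block, or with records that are not in the order the application wants, the useful rate drops proportionally.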

If the data in those records is NOT in the logical order that the
application needs then the effectiveness of each IO operation isn't
going to be very good and the useful throughput will be lower. Possibly
much lower. (This is where a dump & load might help if it reorders the
records to better fit the usage -- conversely a dump & load that ordered
them badly will *hurt* performance for this purpose...)

That all assumes that index blocks are staying in memory. Unless this
is a fairly new OpenEdge release with cutting-edge code, every record
find requires both index block and data block reads. If index blocks
are also being read from disk you lose even more of your throughput.


But I really think profiling the app would be a lot more likely to
identify the source of the problem.

Posted by Thomas Mercer-Hursh on 15-Apr-2015 10:27

You say that you have had a Progress consultant.  Is this someone from Professional Services or one of the independent consultants?

Posted by S33 on 15-Apr-2015 13:01

[quote user="ChUIMonster"]

This is where a dump & load might help if it reorders the
records to better fit the usage -- conversely a dump & load that ordered
them badly will *hurt* performance for this purpose...

[/quote]


I love that train of thought. Just sayin'

Posted by DLC1984 on 16-Apr-2015 18:48

The consultant was Dan Foreman. Unfortunately I don't have time at the moment to consume the entire contents of this thread, but I can say that I think we found the problem (after 50 hours of travel for 32 hours of onsite consulting). The app in question generates dunning letters into a directory (on a NAS) that already has 250k+ letters sitting there. Simple unix commands like find, grep and ls are, at times, very, very slow when "looking" into this directory. The DB itself has zero issues (that I can find) with latch contention, semaphore contention, locking contention, checkpoints, buffer cache hit ratio, etc. Having said that, the DB has not been D&L'd in more than 10 years, so fragmentation is really bad. But simple tests like bigrow show a really fast piece of kit, as the Brits say.
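One way to check whether a large directory is the slow part is to time a bare enumeration of it, without the sorting and per-file stat calls that ls typically adds. A minimal Python sketch; the directory path passed on the command line is whatever you want to test, and nothing here is specific to the application discussed in this thread.

```python
import os
import sys
import time


def count_entries(directory):
    """Count directory entries without sorting or stat()-ing each one."""
    count = 0
    with os.scandir(directory) as entries:
        for _ in entries:
            count += 1
    return count


def timed_count(directory):
    """Return (entry count, elapsed seconds) for one pass over the directory."""
    start = time.perf_counter()
    count = count_entries(directory)
    return count, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dirscan.py /path/to/letters   (path is hypothetical)
    entries, seconds = timed_count(sys.argv[1])
    print(f"{entries} entries in {seconds:.3f} s")
```

Running it against the NAS directory and against a local directory of similar size gives a quick comparison; a second run right after the first may be faster because of caching, so the first timing is the more telling one.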

Posted by ChUIMonster on 17-Apr-2015 10:39

So it boils down to this: the db isn't busy, and IO is being requested at a leisurely rate because the app was "doing stuff".

So there is both a NAS and a SAN in this setup?  That NAS wouldn't happen to be a "filer" would it?

This thread is closed