PASOE Performance Issue with NEWING class

Posted by Roger Blanchard on 28-Sep-2019 16:02

We just ran into an issue with a customer that just migrated from 11.4 Classic Appserver to 11.7.5 PASOE. The system was just running slow......

We narrowed it down to backend logic.

We then narrowed it down to our logging mechanism.

We then narrowed it down to NEW ClassName ().

It appears that NEWING a class in PASOE is 22x slower than doing the same in a GUI client (WebClient)

I just threw together a quick repro. The WriteMessage METHOD has no code in it so nothing is being done.

On GUI Client;

The following takes 1312ms when just passing a CHAR

DO iCnt = 1 TO 1000000:

LogManager:WriteMessage("Test1",

"ORS_APP" // we only want this to be logged if ORS_APP is enabled which is old logging

END.

The following takes 5268ms so about 4x slower

DO iCnt = 1 TO 1000000 ON ERROR UNDO, THROW:

LogManager:WriteMessage("Test1",

NEW MessageSubSystem ("ORS_APP") // we only want this to be logged if ORS_APP is enabled which is old logging

END.

On PASOE server;

The following takes 1531ms when just passing a CHAR.

DO iCnt = 1 TO 1000000:

LogManager:WriteMessage("Test1",

"ORS_APP" // we only want this to be logged if ORS_APP is enabled which is old logging

END.

The following takes 120324ms so 22x slower than the same code on the GUI client.

DO iCnt = 1 TO 1000000 ON ERROR UNDO, THROW:

LogManager:WriteMessage("Test1",

NEW MessageSubSystem ("ORS_APP") // we only want this to be logged if ORS_APP is enabled which is old logging

END.

Is this expected? I would hope not.

OpenEdge Development - Forum

All Replies

Posted by Mike Fechner on 28-Sep-2019 18:06

What values for -D and -reusableObjects are you using?

Sent from Nine

Von: Roger Blanchard <bounce-rblanchard@community.progress.com>
Gesendet: Samstag, 28. September 2019 18:03
An: TU.OE.Development@community.progress.com
Betreff: [Technical Users - OE Development] PASOE Performance Issue with NEWING class

Update from Progress Community

Roger Blanchard

We just ran into an issue with a customer that just migrated from 11.4 Classic Appserver to 11.7.5 PASOE. The system was just running slow......

We narrowed it down to backend logic.

We then narrowed it down to our logging mechanism.

We then narrowed it down to NEW ClassName ().

It appears that NEWING a class in PASOE is 22x slower than doing the same in a GUI client (WebClient)

I just threw together a quick repro. The WriteMessage METHOD has no code in it so nothing is being done.

On GUI Client;

The following takes 1312ms when just passing a CHAR

DO iCnt = 1 TO 1000000:

LogManager:WriteMessage("Test1",

"ORS_APP" // we only want this to be logged if ORS_APP is enabled which is old logging

END.

The following takes 5268ms so about 4x slower

DO iCnt = 1 TO 1000000 ON ERROR UNDO, THROW:

LogManager:WriteMessage("Test1",

NEW MessageSubSystem ("ORS_APP") // we only want this to be logged if ORS_APP is enabled which is old logging

END.

On PASOE server;

The following takes 1531ms when just passing a CHAR.

DO iCnt = 1 TO 1000000:

LogManager:WriteMessage("Test1",

"ORS_APP" // we only want this to be logged if ORS_APP is enabled which is old logging

END.

The following takes 120324ms so 22x slower than the same code on the GUI client.

DO iCnt = 1 TO 1000000 ON ERROR UNDO, THROW:

LogManager:WriteMessage("Test1",

NEW MessageSubSystem ("ORS_APP") // we only want this to be logged if ORS_APP is enabled which is old logging

END.

Is this expected? I would hope not.

View online

You received this notification because you subscribed to the forum. To stop receiving updates from only this thread, go here.

Flag this post as spam/abuse.

Posted by Roger Blanchard on 28-Sep-2019 19:41

We are using ;

-D 100000

-reusableObjects 100000

Posted by rblanchard@ospreyretail.com on 28-Sep-2019 19:42

I am using

-D 100000

-reusableObjects 100000

Roger Blanchard

Osprey Retail Systems

Posted by dbeavon on 28-Sep-2019 22:07

Can you test the timing *without* the "LogManager" operation? Ie. just test the NEW'ing.

Nevermind, I didn't see this part:

...The WriteMessage METHOD has no code in it so nothing is being done....

Posted by Roger Blanchard on 28-Sep-2019 23:02

Yep, no code at all in that method. I just ran the code with just the NEW and had the same results.

Posted by tim on 29-Sep-2019 03:24

No, this is not expected.

We've seen some poor numbers comparing PASOE AVM to the single-threaded Client AVM when running non-db code). We expect some slight degradation, but nothing like this.

Did you run your test with pre-compiled code and running with -q?

Could you email me (tim@progress.com) your exact repro?

Posted by dbeavon on 29-Sep-2019 21:07

>> Did you run your test with pre-compiled code and running with -q?

Exactly what I was thinking. It sounds like Roger had already spent lots of time on this, but it is worth going back to this question just in case.

The fact that the repro has NOT eliminated the method call (LogManager:WriteMessage) yet is especially suspicious to me ... given that we are pointing the finger at the NEW'ing of a class. We should also be looking at the performance numbers without that method call.

At this many iterations (1000000) the performance issues in the AVM can really add up, especially if you don't have precompiled code and aren't caching it in memory with -q.

Without -q you are at the mercy of the file system resource where the code is hosted. If it is a spinning disk or a NAS file system then you will see a large performance degradation because the files are constantly polled for changes.

Posted by rblanchard@ospreyretail.com on 29-Sep-2019 22:34

It is precompiled and I am using -q.

We are also using SSD in my lab as well as at customers site.

Roger Blanchard

(O) 508-992-1097

(M) 508-415-9982

Sent from my mobile device

Posted by rblanchard@ospreyretail.com on 29-Sep-2019 22:52

I did also eliminate method call with same results.

Roger Blanchard

(O) 508-992-1097

(M) 508-415-9982

Sent from my mobile device

Posted by dbeavon on 29-Sep-2019 23:34

I'm stumped. It seems like you've narrowed down the scope of the problem to something that is entirely a CPU bottleneck.

If you watch your system resources, can you verify that this is running with 100% CPU utilization (or at least it uses the single-threaded equivalent of 100% CPU)? If not, then there are other unknown factors and resources that may be participating in your repro.

Also is there only *one* PASOE session that is active in the agent while you are running your repro? If there are any other sessions that are active within the agent process then there might be concurrency-related bottlenecks that are slowing things down, (eg I'm thinking of things like the artificial bottleneck that prevents more than one thread in a process from using XML data at a given moment of time ... see

https://community.progress.com/community_groups/openedge_general/f/26/t/57795 )

Or perhaps your new PASOE server has a virtualization problem that might be short-changing you on CPU. Maybe you should run both versions of your tests on the same hardware - this will take the hardware and virtualization out of the equation.

It would seem odd to me if PASOE was the only thing that was responsible for the slowdown. I'd guess the upgrade in OpenEdge versions would be more likely to cause this performance regression. The hosting of the code into a PASOE msagent process should not normally make any massive difference.

Posted by frank.meulblok on 30-Sep-2019 08:14

Do you see the same difference if you explicitly tracking and delete the object instances you NEW() ?

Would be something to test, as there's also garbage collection and how many objects stick in memory that can play a factor.

NEW()ing a new instance probably gets slower if you have larger numbers of objects in memory already. (It used to in the past, haven't benchmarked recently).

If the garbage collector behaves differently, you may end up with objects living longer, influencing that.

Or maybe the GC has some other issue why it's running slower in the PAS agent.

Posted by Roger Blanchard on 30-Sep-2019 11:01

>> If you watch your system resources, can you verify that this is running with 100% CPU utilization (or at least it uses the single-threaded equivalent of 100% CPU)? If not, then there are other unknown factors and resources that may be participating in your repro.

It only uses 25% of CPU.

>> Also is there only *one* PASOE session that is active in the agent while you are running your repro?

Our tests were when NO other sessions were active. On the live site we did this at 2:00am when the system was not being used. We ran the tests after a reboot.

In our office using the customers DB we only had one session active.

>> Or perhaps your new PASOE server has a virtualization problem that might be short-changing you on CPU. Maybe you should run both versions of your tests on the same hardware - this will take the hardware and virtualization out of the equation.

I ran the same test not using a VM with same results...22x slower.

Posted by dbeavon on 30-Sep-2019 13:19

>> It only uses 25% of CPU.

You can compare this to the number of cores. If you have four cores, then you are essentially running a thread with exactly 100% usage of a single core. It is not really possible for your ABL session's code to use more CPU than that.

You didn't say how much time it took to run your test after removing the method call. Did you at least save about 1 second (ie ~1531ms ?)

Since this is ABL we're talking about, I know that 1,000,000 iterations of a method invocation will not be free. ;)

The good news is that you already have something that you can package up and send to Progress tech support. I would be very eager to get that performance hotfix too, if one becomes available. Please keep us in the loop.

Posted by Roger Blanchard on 30-Sep-2019 13:26

I will keep you in the loop. We are still working on this as new information was brought to our intention early this morning.

Thanks for all the input.

Posted by tim on 30-Sep-2019 21:21

To give an update, [mention:011caad85a5545659f1a1aaf22be3887:e9ed411860ed4f2ba0265705b8793d05] sent me his reproducible. I'm seeing a degradation, but nothing like the 4x or 22x severity, so something else must be going on. Will look forward to an update from Roger if he gets to the bottom of it.

The degradation I was able to measure is one that we've recently become aware of for CPU intensive operations (e.g., DO i = 1 to 1000000: userdeffunc(). END.).

We've identified the root cause and are working to close the performance gap between the single threaded AVM client and the multi-threaded PASOE AVM.

Posted by Roger Blanchard on 07-Oct-2019 10:48

Well, it turns out we have a few issues. One issue was definitely the -q. Once I was able to get into the customers production server I quickly found out it was enabled on the client but not the server. Thanks Tim and David. I am still not sure why we could not see the same when toggling the -q in our office but that's another story.

The other issue we are experiencing is the MSAS agent is actually crashing with no error. We are working with tech support on this issue.

Thanks.

Posted by dbeavon on 07-Oct-2019 16:13

>> MSAS agent is actually crashing with no error

If you are running on Windows, then I'd recommend getting on a recent SP if possible (I've been fairly happy with OE 11.7.4 and 11.7.5).

Also on windows - remember to create anti-virus (defender) exclusions, and disable "data execution prevention" (DEP). In theory Progress OE is supposed to be DEP-compliant but I had a case where PASOE was crashing in a fairly recent version of OE 11 and it was because DEP was still enabled.

Checking DEP was one of the first things that tech support had asked to do. Once we disabled DEP the crashing stopped.

See: knowledgebase.progress.com/.../P119884

Posted by frank.meulblok on 08-Oct-2019 08:12

Small note about the DEP - if you're using 3rd-party components, make sure those are DEP compliant as well. If they're not, those components are likely to originate the crashes.

Anything that uses any form of native code (Com/ActiveX, direct DLL calls) in particular should be considered suspect until verified.

Posted by Roger Blanchard on 09-Oct-2019 11:38

>> If you are running on Windows, then I'd recommend getting on a recent SP if possible (I've been fairly happy with OE 11.7.4 and 11.7.5).

We are running 11.7.5,

>> Also on windows - remember to create anti-virus (defender) exclusions, and disable "data execution prevention" (DEP).

We just did that yesterday and will see what happens.

>> Small note about the DEP - if you're using 3rd-party components, make sure those are DEP compliant as well. If they're not, those components are likely to originate the crashes.

The crashing is upon session startup but not all the time. It is typically happening as users are attempting to login and another session is attempting to start.

Posted by dbeavon on 09-Oct-2019 14:31

If it is crashing then you may have event logs (application log in windows) to back that up. Windows should detect that the _mproapsv has crashed and write a message to application log. At least that will keep track of how frequently this happens. But despite the crash, PASOE should recover and start a new msagent (_mproapsv) to replace the one that crashed. Hopefully your customer can attempt to repeat the PASOE request if they see an unusual error message.

And you should also have protrace files that Progress can examine. Sometimes those files can be sufficient to identify the problem, if you give them to tech support (... the ABL stack and/or the native stack trace can be helpful clues).

The _mproapsv shouldn't crash ... we still have some issues with leaks, and sometimes with concurrency restrictions in the process, but never any unusual crashes (or at least none that would happen in _mproapsv and not also in _progres). We've been running PASOE for about a year now. The only times I recall any surprising crashes were when my r-code was stale (just delete all the old stuff and replace it with freshly compiled code), and when DEP was enabled.

Posted by gus bjorklund on 13-Jan-2020 22:14

hypothesis, entirely without evidence:

perhaps some activex or other component is being loaded into the progress session concurrently by multiple users but is not actually

multi-user enabled.

This thread is closed