Performance overhead when using ABL for business code.

Posted by dbeavon on 16-Mar-2017 15:41

I was playing with a slow AppServer routine today (in ABL) and noticed a number of performance issues. While tweaking the code, I was able to make a real difference in the overall duration of a round-trip to a state-free AppServer.

We had a loop of about 3000 records that were being returned to the client (from a database buffer into a dataset and out). In the best case, the code fetched them in about 100 ms. Depending on how I organized the code, that went up to about 450-500 ms.

Firstly, the dataset records were initially populated from the database buffer using about 60 ASSIGN statements (one per field coming out of the database buffer). Consolidating these into just 5 ASSIGN statements (executed for each of the 3000 records) saved a little over 50 ms. Yes, that is still a thing.
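A minimal ABL sketch of the two styles, assuming hypothetical temp-tables `ttSrc` (standing in for the database buffer) and `ttOut` (standing in for the dataset's table) with four fields instead of the original 60. The first loop uses one assignment statement per field; the second groups the same copies into a single ASSIGN. ETIME only gives rough millisecond timings, and no particular result is claimed here.

```abl
/* Hypothetical tables standing in for the DB buffer and the dataset table */
DEFINE TEMP-TABLE ttSrc NO-UNDO
    FIELD iKey   AS INTEGER
    FIELD cName  AS CHARACTER
    FIELD cCity  AS CHARACTER
    FIELD dPrice AS DECIMAL
    INDEX pk IS PRIMARY UNIQUE iKey.

DEFINE TEMP-TABLE ttOut NO-UNDO LIKE ttSrc.

DEFINE VARIABLE i         AS INTEGER NO-UNDO.
DEFINE VARIABLE iSeparate AS INTEGER NO-UNDO.
DEFINE VARIABLE iGrouped  AS INTEGER NO-UNDO.

/* Build 3000 sample rows */
DO i = 1 TO 3000:
    CREATE ttSrc.
    ASSIGN ttSrc.iKey   = i
           ttSrc.cName  = "Name " + STRING(i)
           ttSrc.cCity  = "City " + STRING(i MOD 50)
           ttSrc.dPrice = i * 1.25.
END.

/* Variant 1: one statement per field */
ETIME(YES).
FOR EACH ttSrc:
    CREATE ttOut.
    ttOut.iKey   = ttSrc.iKey.
    ttOut.cName  = ttSrc.cName.
    ttOut.cCity  = ttSrc.cCity.
    ttOut.dPrice = ttSrc.dPrice.
END.
iSeparate = ETIME.

EMPTY TEMP-TABLE ttOut.

/* Variant 2: the same copies grouped into a single ASSIGN statement */
ETIME(YES).
FOR EACH ttSrc:
    CREATE ttOut.
    ASSIGN ttOut.iKey   = ttSrc.iKey
           ttOut.cName  = ttSrc.cName
           ttOut.cCity  = ttSrc.cCity
           ttOut.dPrice = ttSrc.dPrice.
END.
iGrouped = ETIME.

MESSAGE "Separate:" iSeparate "ms  Grouped:" iGrouped "ms"
    VIEW-AS ALERT-BOX INFO BUTTONS OK.
```

With only four fields the difference may be hard to see; widening the tables toward the real field count is what would make the comparison meaningful.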

Another thing I was doing was capturing the MD5 hash of the entire raw buffer for optimistic locking (concurrency) purposes. I figured that since I had the buffer in memory anyway, it would be appropriate to do a quick hash of it, on the off-chance someone might want to edit it. I used code like the following.

/* v_OldRawMd5 is a RAW variable; {4} is an include-file argument naming the buffer */
RAW-TRANSFER BUFFER PB_{4} TO FIELD v_OldRawMd5.
v_OldRawMd5 = MD5-DIGEST(v_OldRawMd5).
RETURN v_OldRawMd5.

It turns out that the hash itself added a little over 50 ms, and doing the work in a separate, private method added another 50 ms.

One of the biggest shockers was the overhead of organizing code into separate methods. Calling the following from my (3000-record) loop seems to add about 50 ms, with nothing happening in it at all:

METHOD PRIVATE VOID GetExtraDataForRecordCreation (
    BUFFER PBgen_loc FOR gen_loc,
    INPUT-OUTPUT p_CoMingStatus   AS CHARACTER,
    INPUT-OUTPUT p_MasterLocation AS CHARACTER,
    INPUT-OUTPUT p_LocationStatus AS CHARACTER,
    INPUT-OUTPUT p_Percent        AS DECIMAL,
    INPUT-OUTPUT p_ULCost         AS DECIMAL,
    INPUT-OUTPUT p_ULUnit         AS CHARACTER):

    /* nothing */

END METHOD.
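A rough way to measure that per-call cost in isolation is to time an empty loop against the same loop calling a routine that does nothing. This sketch is a plain .p, so it uses a hypothetical internal procedure `emptyCall` with the same parameter shape in place of the private method (a class method would need its own .cls file), and a temp-table `ttLoc` in place of `gen_loc`. Whether an internal procedure call costs the same as a private method call is an assumption, not something the sketch proves.

```abl
/* Hypothetical stand-in for gen_loc, so the sketch runs without a database */
DEFINE TEMP-TABLE ttLoc NO-UNDO
    FIELD iKey AS INTEGER.

DEFINE VARIABLE cCoMing  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cMaster  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cStatus  AS CHARACTER NO-UNDO.
DEFINE VARIABLE dPercent AS DECIMAL   NO-UNDO.
DEFINE VARIABLE dULCost  AS DECIMAL   NO-UNDO.
DEFINE VARIABLE cULUnit  AS CHARACTER NO-UNDO.
DEFINE VARIABLE i        AS INTEGER   NO-UNDO.
DEFINE VARIABLE iCalls   AS INTEGER   NO-UNDO.
DEFINE VARIABLE iLoopMs  AS INTEGER   NO-UNDO.
DEFINE VARIABLE iCallMs  AS INTEGER   NO-UNDO.

CREATE ttLoc.
ttLoc.iKey = 1.

/* Baseline: the loop alone */
ETIME(YES).
DO i = 1 TO 3000:
END.
iLoopMs = ETIME.

/* Same loop with one call per iteration */
ETIME(YES).
DO i = 1 TO 3000:
    RUN emptyCall (BUFFER ttLoc,
                   INPUT-OUTPUT cCoMing, INPUT-OUTPUT cMaster,
                   INPUT-OUTPUT cStatus, INPUT-OUTPUT dPercent,
                   INPUT-OUTPUT dULCost, INPUT-OUTPUT cULUnit).
END.
iCallMs = ETIME.

MESSAGE "Loop only:" iLoopMs "ms  Loop + call:" iCallMs "ms"
    VIEW-AS ALERT-BOX INFO BUTTONS OK.

PROCEDURE emptyCall:
    DEFINE PARAMETER BUFFER PBgen_loc FOR ttLoc.
    DEFINE INPUT-OUTPUT PARAMETER p_CoMingStatus   AS CHARACTER NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER p_MasterLocation AS CHARACTER NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER p_LocationStatus AS CHARACTER NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER p_Percent        AS DECIMAL   NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER p_ULCost         AS DECIMAL   NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER p_ULUnit         AS CHARACTER NO-UNDO.

    /* Only statement: count the calls so the result can be checked */
    iCalls = iCalls + 1.
END PROCEDURE.
```

Dropping parameters one at a time from `emptyCall` is one way to see whether particular parameter kinds contribute more than others.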

I did all my tests with compiled code and with all 4GLTRACE logging turned off. I did use the abbreviated AppServer trace logging to capture the duration of my state-free AppServer requests (retrieving 3000 records).

I don't know whether my math adds up exactly, but I hope you see the general idea. Each iteration of the loop through the 3000 database records made 3 private method calls, executed 60 assign statements, and captured an MD5 hash. It also passed the output dataset BY-REFERENCE as an INPUT-OUTPUT parameter to the private methods.

The code was really well organized and easy to read in the form that takes 450-500 ms, and it gets uglier and uglier as we approach 100 ms. It seems that ABL requires us to make some pretty ugly compromises for the sake of performance. I wish the overhead of calling private methods weren't so large. Or maybe Progress should consider enhancing its compiler to inline a private method's code automatically when conditions in the caller allow.

Anyway, I probably spent too much time on this, and I imagine many others have done similar tests. Is there any recent documentation of "dos" and "don'ts" for performance in ABL? I'd especially like to know when private methods should be avoided and what exactly causes the overhead (i.e., is it certain kinds of parameters, particular data types, INPUT versus OUTPUT modes, BY-REFERENCE datasets, etc.).

OpenEdge Development - Forum

Posted by bronco on 17-Mar-2017 02:30

Well, it is a bit difficult to say anything about the case without the code or the parameters under which the AppServer operates.

Having said that, I've seen multiple people make suggestions concerning ABL performance, and most of the time they get a "much more performance is lost/gained when accessing the database" reply. Personally I think downplaying ABL performance issues this way is a bit silly, because performance is (or should be) always an issue, and the ABL compiler isn't exactly on par with the likes of C# or Java when it comes to optimization.

On the other hand, calling a method 3000 times or 1 time for the whole set is a choice which is costing you performance in the former case. No surprises there I would say.

Inlining isn't trivial either btw, because private methods can be called recursively just as well (and a number of other reasons why you would need the stack).

just my 2c


Posted by marian.edu on 17-Mar-2017 04:26

I had a session at last year's EMEA PUG Challenge discussing that kind of overhead, especially in an (over-)layered framework (OERA is just a reference, though). The method (function) call overhead is always there regardless of parameters. The only serious additional impact comes when you pass data structures by value (deep copy); in fact, overhead is visible even for properties compared with variables, because of the getter method.
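The by-value versus by-reference difference can be sketched with a temp-table parameter. In this hypothetical .p, `touchTable` does nothing, so the timings reflect only parameter passing: a plain INPUT-OUTPUT TABLE parameter copies every row into the callee's table and back, while BY-REFERENCE binds the callee's `ttParam` to the caller's `ttData` for the duration of the call. Table names, row counts and the 100-call loop are illustrative assumptions.

```abl
DEFINE TEMP-TABLE ttData NO-UNDO
    FIELD iKey AS INTEGER
    FIELD cVal AS CHARACTER
    INDEX pk IS PRIMARY UNIQUE iKey.

/* Callee-side definition with the same schema */
DEFINE TEMP-TABLE ttParam NO-UNDO LIKE ttData.

DEFINE VARIABLE i        AS INTEGER NO-UNDO.
DEFINE VARIABLE iByValue AS INTEGER NO-UNDO.
DEFINE VARIABLE iByRef   AS INTEGER NO-UNDO.

DO i = 1 TO 3000:
    CREATE ttData.
    ASSIGN ttData.iKey = i
           ttData.cVal = FILL("x", 100).
END.

/* By value (deep copy): all rows are copied into ttParam and back on every call */
ETIME(YES).
DO i = 1 TO 100:
    RUN touchTable (INPUT-OUTPUT TABLE ttData).
END.
iByValue = ETIME.

/* BY-REFERENCE: ttParam refers to ttData during the call, so no rows are copied */
ETIME(YES).
DO i = 1 TO 100:
    RUN touchTable (INPUT-OUTPUT TABLE ttData BY-REFERENCE).
END.
iByRef = ETIME.

MESSAGE "BY-VALUE:" iByValue "ms  BY-REFERENCE:" iByRef "ms"
    VIEW-AS ALERT-BOX INFO BUTTONS OK.

PROCEDURE touchTable:
    DEFINE INPUT-OUTPUT PARAMETER TABLE FOR ttParam.
    /* Intentionally empty: only the parameter passing is measured */
END PROCEDURE.
```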

Probably related to this method call overhead, one can see similar overhead with a deeper inheritance chain, where SUPER methods are being called. Inlining could be an option, as could compiler code optimisation (grouping assigns, for instance), but as you said, most people will just tell you this is nothing compared to the time spent on data access, and it's kind of hard to argue with that.

So I guess we just need to find a balance between reusability/maintainability and performance. After all, we do expect some performance overhead when using a 4GL… otherwise we would all have written good old C code and had fun shifting memory content around :)

Marian Edu

Acorn IT

www.acorn-it.com

www.akera.io

+40 740 036 212

Posted by marian.edu on 17-Mar-2017 04:42

Actually, having some fun working with the 4GL grammar right now, I just realised that some of those ‘optimisations’ could be done by a ‘transpiler’, like the one we have for TypeScript… the result might be uglier but more optimised, while you still keep the code nicely organised ;)

I need to think about this transpiler option for Zamolxis… it just needs a full syntax model to start from, a defined set of optimisations, and the ‘transpiler’ added as an extra step in the project builder.


Posted by dbeavon on 17-Mar-2017 08:17

What I was trying to point out in my original post was that I have a 300 ms overhead that is *not* database related. The 100 ms variation and the 400 ms variation of the ABL code *both* get the same data into memory buffers in the same FOR EACH loop. The language and organizational choices made *within* that 3000-record loop are what add 300 ms to the round-trip (i.e., about 1 ms per ten records).

I realize that this is a bit abstract without seeing the code itself.

I was hoping someone would corroborate my theory that private method invocations may be largely responsible for the overhead. I am hearing, at the least, that private method invocations are not free in ABL. Coming from a .Net background, it is a shock to see this contribute a substantial part of the 300 ms overhead. Breaking business logic into private methods seems like a code maintainability topic more than anything else. It would be like telling me that I can only add three lines of comments to each iteration of my FOR EACH loop and not more.

I understand that classic ABL procedure invocations are quite a bit more "dynamic" than they are in other languages. However, my hope was that OO methods were more strict and that the invocations of them, being better checked by the OO ABL compiler, would have reduced overhead at runtime.

I think there is much to be gained if Progress would start addressing performance issues in the ABL language as well as in the database. 300 ms is a lot of overhead for fetching such a small number of records.

Otherwise I may need to go back to using compile-time include files for my inlining in some cases (scary as the thought may be).
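Include files live in separate .i source files, so to keep this sketch in one runnable procedure it uses a &SCOPED-DEFINE preprocessor name instead. The preprocessor pastes its text in at compile time, much as it does with an include file, so no call happens at run time. Unlike an include file, a preprocessor name takes no arguments, so the buffer names (`ttSrc` and `ttOut`, both hypothetical) are hard-coded.

```abl
DEFINE TEMP-TABLE ttSrc NO-UNDO
    FIELD iKey  AS INTEGER
    FIELD cName AS CHARACTER
    INDEX pk IS PRIMARY UNIQUE iKey.
DEFINE TEMP-TABLE ttOut NO-UNDO LIKE ttSrc.

DEFINE VARIABLE i AS INTEGER NO-UNDO.

/* Compile-time "inlining": each {&CopyRow} reference below is replaced by this text */
&SCOPED-DEFINE CopyRow ASSIGN ttOut.iKey = ttSrc.iKey ttOut.cName = ttSrc.cName.

DO i = 1 TO 3000:
    CREATE ttSrc.
    ASSIGN ttSrc.iKey  = i
           ttSrc.cName = "Loc " + STRING(i).
END.

FOR EACH ttSrc:
    CREATE ttOut.
    {&CopyRow}
END.
```

The trade-off is the usual one for textual inclusion: no call overhead, but the expanded code is what gets compiled and debugged.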

If anyone can point me to guidelines ('dos' and 'don'ts') to help me organize ABL for better performance, I would appreciate it. Some of these rules are certainly *not* self-evident (e.g., the ASSIGN rule and the need to use private methods in moderation).

Posted by Brian K. Maher on 17-Mar-2017 08:22

David,

Consider using BUFFER-COPY instead of ASSIGN or multiple assignments if possible.
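A minimal sketch of BUFFER-COPY replacing per-field assignments, assuming a temp-table `ttLoc` in place of the real database table and a hypothetical output table `ttOut` with one extra calculated column. BUFFER-COPY matches fields by name, and its ASSIGN option fills in fields that have no counterpart in the source.

```abl
/* Stand-in for the database table (gen_loc in the original post) */
DEFINE TEMP-TABLE ttLoc NO-UNDO
    FIELD iKey    AS INTEGER
    FIELD cStatus AS CHARACTER
    FIELD dCost   AS DECIMAL
    FIELD cNotes  AS CHARACTER
    INDEX pk IS PRIMARY UNIQUE iKey.

/* Output table: shares field names with ttLoc, plus one calculated column */
DEFINE TEMP-TABLE ttOut NO-UNDO
    FIELD iKey    AS INTEGER
    FIELD cStatus AS CHARACTER
    FIELD dCost   AS DECIMAL
    FIELD dCostX2 AS DECIMAL
    INDEX pk IS PRIMARY UNIQUE iKey.

DEFINE VARIABLE i AS INTEGER NO-UNDO.

DO i = 1 TO 3000:
    CREATE ttLoc.
    ASSIGN ttLoc.iKey    = i
           ttLoc.cStatus = "A"
           ttLoc.dCost   = i / 4
           ttLoc.cNotes  = "n/a".
END.

FOR EACH ttLoc:
    CREATE ttOut.
    /* Copies iKey, cStatus and dCost by name; cNotes has no target field */
    BUFFER-COPY ttLoc TO ttOut
        ASSIGN ttOut.dCostX2 = ttLoc.dCost * 2.
END.
```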

Also, remember that the ABL is not an optimizing compiler so anything that happens in a loop will be evaluated each time. For example...

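define variable i             as integer   no-undo.
define variable some_big_list as character no-undo. /* a long comma-separated list */

/* num-entries() is evaluated again on every iteration */
do i = 1 to num-entries(some_big_list):
end.

better would be:

define variable iEntries as integer no-undo.

/* evaluate num-entries() once, outside the loop */
assign iEntries = num-entries(some_big_list).

do i = 1 to iEntries:
end.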

Brian

Posted by Brian K. Maher on 17-Mar-2017 08:44

David,

You may also want to look at your -Bt <n> and -tmpbsize <n> parameters.

Brian

Posted by AdrianJones on 17-Mar-2017 09:12

Do you have many variables, tables or datasets in your code that are not NO-UNDO? If so, that might explain the ASSIGN speed issue. Every time you assign any variable that is not NO-UNDO, all such variables are written out to the LBI (local before-image) file. If you consolidate multiple assignments into a single ASSIGN statement, this LBI write only happens once.
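The cost described here presumably comes from undo tracking: a variable defined without NO-UNDO must keep its old value somewhere so it can be restored if the enclosing transaction is undone. This sketch shows that behaviour directly, then times assignments to each kind of variable inside a transaction. The variable names and loop count are made up, and the timings are only indicative.

```abl
DEFINE VARIABLE cUndoable    AS CHARACTER INITIAL "before".          /* no NO-UNDO: undo-tracked */
DEFINE VARIABLE cNoUndo      AS CHARACTER INITIAL "before" NO-UNDO.
DEFINE VARIABLE cUndoableWas AS CHARACTER NO-UNDO.
DEFINE VARIABLE cNoUndoWas   AS CHARACTER NO-UNDO.
DEFINE VARIABLE i            AS INTEGER   NO-UNDO.
DEFINE VARIABLE iUndoableMs  AS INTEGER   NO-UNDO.
DEFINE VARIABLE iNoUndoMs    AS INTEGER   NO-UNDO.

/* 1. Semantics: undoing the transaction restores only the undo-tracked variable */
tx:
DO TRANSACTION:
    ASSIGN cUndoable = "after"
           cNoUndo   = "after".
    UNDO tx, LEAVE tx.
END.
ASSIGN cUndoableWas = cUndoable   /* "before": its old value was restored */
       cNoUndoWas   = cNoUndo.    /* "after": nothing was saved, nothing restored */

/* 2. Cost: many assignments to each kind of variable inside one transaction */
DO TRANSACTION:
    ETIME(YES).
    DO i = 1 TO 100000:
        cUndoable = STRING(i).
    END.
    iUndoableMs = ETIME.

    ETIME(YES).
    DO i = 1 TO 100000:
        cNoUndo = STRING(i).
    END.
    iNoUndoMs = ETIME.
END.

MESSAGE "Undoable:" iUndoableMs "ms  NO-UNDO:" iNoUndoMs "ms"
    VIEW-AS ALERT-BOX INFO BUTTONS OK.
```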

Posted by Patrick Tingen on 17-Mar-2017 09:27

One of my hobbies is doing strange things in the 4GL, like writing games (don't ask). In a game every msec counts, so I come across a lot of things that cost time. Recently I found out that referencing widget properties in a FIND statement also causes some slowdown. In my animation, I was able to reduce a single animation step from around 30 msec to 2 msec just by assigning the property value to an integer variable and doing the FIND with that variable instead of the property itself.

If you compare the performance of the Progress compiler with a .Net compiler, you will be embarrassed by the difference. Take this little program that does some triangle math (the Pythagorean theorem). It just calculates the length of the hypotenuse in a double loop:

DEFINE VARIABLE a AS INTEGER NO-UNDO.
DEFINE VARIABLE b AS INTEGER NO-UNDO.
DEFINE VARIABLE c AS INTEGER NO-UNDO.

ETIME(YES).
DO a = 1 TO 1000: 
  DO b = 1 TO 1000: 
    c = SQRT(a * a + b * b). 
  END. 
END. 

MESSAGE ETIME
  VIEW-AS ALERT-BOX INFO BUTTONS OK.

This will cost around 700 msec in the 4GL. Do something similar in .Net and you will see execution times of around 7 msec....
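Following Brian's point that the ABL compiler does not optimize loops, one hand optimization for this benchmark is to hoist the loop-invariant `a * a` out of the inner loop (the extra variable `aa` is an addition, not part of the original program). This is a sketch only: SQRT probably dominates the cost here, so the gain may be small, and no timing is claimed.

```abl
DEFINE VARIABLE a  AS INTEGER NO-UNDO.
DEFINE VARIABLE b  AS INTEGER NO-UNDO.
DEFINE VARIABLE c  AS INTEGER NO-UNDO.
DEFINE VARIABLE aa AS INTEGER NO-UNDO. /* a * a, computed once per outer iteration */

ETIME(YES).
DO a = 1 TO 1000:
    aa = a * a.                  /* invariant for the whole inner loop */
    DO b = 1 TO 1000:
        c = SQRT(aa + b * b).    /* DECIMAL result is rounded into the INTEGER c */
    END.
END.

MESSAGE ETIME
  VIEW-AS ALERT-BOX INFO BUTTONS OK.
```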

Posted by Jeff Ledbetter on 17-Mar-2017 09:32

I enjoyed following your Advent of Code entries.
