OO / R-code load performance - OpenEdge General - Forum

All Replies

Posted by svi on 17-Sep-2008 13:00

Thank you Tim. We'll check it out.

Posted by Thomas Mercer-Hursh on 17-Sep-2008 13:04

What is the total number of objects and temp-tables you are instantiating? There was some stuff on the PEG a while back about systems with large numbers of temp-tables and performance issues ... one of the arguments in favor of using properties instead of one-line temp-tables for entity objects.

Posted by Tim Kuehn on 17-Sep-2008 13:22

30 objects, with each object instantiating 20 super-classes.

The classes themselves are all of a form:

CLASS class-name INHERITS class-name-02:

END CLASS.

The TT definitions are all within the class definition, and the number of TT definitions that were present depends on a &SCOPED-DEFINE in the code generator.

These programs were written to demonstrate, in a small, easy-to-deploy and run package, an r-code load performance problem I'm experiencing with a larger, non-OO system. Consequently, any association between this system and "proper" OO design and implementation (or lack thereof) is purely accidental.

Posted by adaltas on 17-Sep-2008 13:25

moved this thread as requested to the general OpenEdge forum.

Posted by Tim Kuehn on 17-Sep-2008 13:28

Thanks!

Posted by Thomas Mercer-Hursh on 17-Sep-2008 13:36

> 30 objects, with each object instantiating 20 super-classes.

So, 630 objects total?

&SCOPED-DEFINE tt-definition-cnt 4

This means 4 TTs per object or 2520 TTs?

If so, I'm going to guess the issue is TTs, not .rs.
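For reference, the arithmetic behind those two estimates can be sketched as follows. This is a minimal Python sketch, assuming each of the 30 objects brings exactly 20 super-class instances with it and every class carries the same number of temp-table definitions; the variable names are illustrative and not taken from the test code.

```python
# Back-of-envelope count of instances and temp-tables in the test case.
# Assumption: each top-level object pulls in 20 super-class instances, and
# every class in the chain defines tt_definition_cnt temp-tables.
top_level_objects = 30
super_classes_per_object = 20
tt_definition_cnt = 4  # value of the &SCOPED-DEFINE quoted above

total_instances = top_level_objects * (1 + super_classes_per_object)
total_temp_tables = total_instances * tt_definition_cnt

print(total_instances)    # 630
print(total_temp_tables)  # 2520
```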

I would consider doing some systematic tests, e.g., 0 to 10 TTs per object, and look at the graph.

You might also try two other tests, one without TTs, but a range of .rs and the other a range of TTs, but in a single .r.
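One way to set up that kind of sweep is to generate the test classes rather than edit them by hand. The following is a rough Python sketch only: the class names (`chain01` ... `chain21`), the single-field temp-table layout, and the helper names are all hypothetical, and it builds the ABL source in memory rather than reproducing the code generator used for the original test.

```python
def make_class_source(name, parent, tt_count):
    """Return ABL source for one empty class carrying tt_count unused temp-tables."""
    header = f"CLASS {name} INHERITS {parent}:" if parent else f"CLASS {name}:"
    lines = [header]
    for i in range(1, tt_count + 1):
        # Hypothetical one-field temp-table; the name is unique per class.
        lines.append(f"    DEFINE TEMP-TABLE tt_{name}_{i} NO-UNDO")
        lines.append("        FIELD id AS INTEGER.")
    lines.append("END CLASS.")
    return "\n".join(lines) + "\n"


def make_chain(prefix, depth, tt_count):
    """Return {file name: source} for a chain of classes, each inheriting the next."""
    sources = {}
    for level in range(1, depth + 1):
        name = f"{prefix}{level:02d}"
        parent = f"{prefix}{level + 1:02d}" if level < depth else None
        sources[f"{name}.cls"] = make_class_source(name, parent, tt_count)
    return sources


# One variant per temp-table count, 0 through 10, for a 21-class chain
# (one class plus 20 super-classes).
variants = {tt_count: make_chain("chain", 21, tt_count) for tt_count in range(0, 11)}
```

Each variant would need to be written to its own directory, compiled, and timed separately, since the class names repeat across variants.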

Posted by Tim Kuehn on 17-Sep-2008 13:44

> 30 objects, with each object instantiating 20 super-classes.
> So, 630 objects total?

That depends on how PSC does things in the background. It's not an unreasonable guess.

> &SCOPED-DEFINE tt-definition-cnt 4
> This means 4 TTs per object or 2520 TTs?

I would presume so.

> If so, I'm going to guess the issue is TTs, not .rs.

It's quite possible - but in the code base I'm working with, the r-code files are upwards of 500K each, each with a multitude of TT definitions that may or may not be used. A single command could result in loading 10MB of r-code, and while I expect some delay, the kinds of delays we're experiencing are completely beyond what I'd consider reasonable for what's going on.

> I would consider doing some systematic tests, e.g., 0 to 10 TTs per object, and look at the graph.

> You might also try two other tests, one without TTs, but a range of .rs and the other a range of TTs, but in a single .r.

While this may have some theoretical interest, for the purposes of what I'm trying to demonstrate, I think what I've posted does a somewhat reasonable job. At the very least, it demonstrates that having a lot of TT definitions slows things down quite a bit, even if they're never used.

Posted by Thomas Mercer-Hursh on 17-Sep-2008 14:15

Which is why I would try some of the other tests ... if, for example, you can do one .p with a large number of TTs and see the same performance issues, perhaps with a "knee" where it really kicks in, then that would tell us a lot, i.e., it isn't the number of .rs or perhaps not even the size of the .rs, but more the number of TTs. How many TTs in your real world sample?

Posted by Tim Kuehn on 17-Sep-2008 14:20

> How many TTs in your real world sample?

I honestly don't know, as it depends on where the user is in the system. I was figuring the TT's could serve as a proxy for large r-code.

Posted by Thomas Mercer-Hursh on 17-Sep-2008 14:37

Well, that is one theory, but I have the notion from that exchange on the PEG a while back that the TTs are their own issue. Remember the guy that was simulating OO with PPs or SPs and everything had a temp-table and it was really slow even though the "objects" themselves were not large? There was speculation at the time that TTs are accessed through some kind of lookup which was not designed in a way that it scaled well once one got into thousands of TTs ... like a non-indexed list or some such.
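To illustrate why a non-indexed list would scale badly, here is a small Python sketch. It is not a model of the AVM's actual internals, which this thread only speculates about; it simply counts name comparisons under the assumption that every new temp-table definition is checked against a plain list of the ones already registered. The function name and the `tt0`, `tt1`, ... names are made up for the illustration.

```python
def comparisons_with_linear_scan(table_count):
    """Count name comparisons if every new table is checked against a plain list."""
    registered = []
    comparisons = 0
    for i in range(table_count):
        name = f"tt{i}"
        for existing in registered:  # walk the whole list; no index to shortcut
            comparisons += 1
            if existing == name:
                break
        registered.append(name)
    return comparisons


for count in (630, 1260, 2520):
    print(count, comparisons_with_linear_scan(count))
```

Under that assumption the total work is n(n-1)/2 comparisons, so doubling the number of temp-tables roughly quadruples the cost, which is the kind of curve that stays invisible at a few hundred tables and hurts at a few thousand.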

Note that it is quite possible that there are two or more unrelated issues here. I.e., there may be an issue with large numbers of TTs in one session which is separate from a possible issue with the speed with which R-code is loaded and the R-code issue could relate specifically to large programs or it might relate to large numbers of programs. If your real world example isn't likely to have thousands of TTs, then your test may be illuminating the wrong problem.

The session does have enough memory to load all that r-code, doesn't it? I.e., it isn't a question of trying to load more than will fit in memory and ending up swapping a bunch to disk. That would certainly ruin performance.

Posted by Tim Kuehn on 17-Sep-2008 15:03

That is entirely possible - since I don't have an easy way to simulate the hundreds of PP/SP instances that can be loaded at once, much less their r-code size and structure, I did it with a few hundred small do-nothing objects with a TT in each.

I recall someone having a system which instantiated a PP for every UI element in the system, and had upwards of 10K PPs running at one time. I don't think that would be the case with my test instances in that none of the TTs are being used, so any "hit" that happens is taken when the AVM does whatever it does when it encounters a new TT definition.

Personally, I think it'd be great if the AVM could postpone doing anything with the TT allocation / initialization until it's actually referenced / used. That would be a huge win right there.
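The general idea of deferring the work until first use can be sketched in a few lines of Python. This only illustrates the lazy-initialization pattern being wished for, not anything the AVM does; the `LazyTempTable` class, its `rows` property, and the allocation counter are invented for the example.

```python
class LazyTempTable:
    """Stand-in for a temp-table whose storage is built on first access only."""

    allocations = 0  # counts how many tables were actually built

    def __init__(self, name):
        self.name = name
        self._rows = None  # nothing allocated at definition time

    @property
    def rows(self):
        if self._rows is None:  # first reference pays the set-up cost
            self._rows = []
            LazyTempTable.allocations += 1
        return self._rows


# Define 2520 tables, then touch only one of them.
tables = [LazyTempTable(f"tt{i}") for i in range(2520)]
tables[0].rows.append({"id": 1})
print(LazyTempTable.allocations)
```

With this pattern, the 2519 definitions that are never referenced cost little more than a name and an empty slot.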

Re memory allocation - I've tried a number of configurations from r-code libraries, shared-memory libraries, r-code by itself, and bumping -mmax so none of the r-code was swapped out, etc. While shared-memory libraries and a big -mmax performed the best, the difference between them and straight r-code wasn't that much - it still took way too long to load things and get the user actually doing something.

Posted by Thomas Mercer-Hursh on 17-Sep-2008 15:31

So, it seems like we have several different issues here which it would be good to sort out. Your example, while certainly interesting, is testing a couple thousand TTs in combination with deep inheritance and a moderate number of objects. If testing without TTs is fast, then it would appear that neither the deep inheritance nor the number of objects was a problem in itself, at least when each instance was trivial. That seems to point a finger in the direction of the TTs, but to be more comparable to your situation we need some variations. E.g., one would also like to test creating hundreds of PPs and SPs and one would like something besides a TT to give each instance more heft. Of course, each source of heft might have its own impacts. E.g., one could define a series of character variables and give them each a very long initial value, but then that might end up testing the string handling initialization instead.

The deferred initialization is certainly an interesting sounding idea.

Posted by jmls on 25-Sep-2008 02:25

It may be inheritance or propath issues. If you look at this thread (http://www.oehive.org/node/1267) you will see that we can create over 10000 classes in memory in 750ms. Granted, they are empty classes. However, if you look at the thread as a whole you will see that things like adding the .r into a prolib (or using -q) can make things run nearly twice as quick. Or using -Bt 2048, etc.

Posted by Thomas Mercer-Hursh on 25-Sep-2008 10:52

I still suspect the temp-tables ...

Julian, might you be able to rerun those tests adding a nominal temp-table into the class? That would be an interesting data point.

Posted by Tim Kuehn on 25-Sep-2008 10:54

> It may be inheritance or propath issues. If you look at this thread (http://www.oehive.org/node/1267) you will see that we can create over 10000 classes in memory in 750ms. Granted, they are empty classes.

Are they different classes, or multiple instances of the same class?

> However, if you look at the thread as a whole you will see that things like adding the .r into a prolib (or using -q) can make things run nearly twice as quick. Or using -Bt 2048, etc.

With the production system this client has, some programs load upwards of 18MB of r-code, and even with -q, prolib, and -Bt set big enough to hold the entire amount, it still takes on the order of tens of seconds to load.

Posted by Tim Kuehn on 25-Sep-2008 10:55

> I still suspect the temp-tables ...

That seems to be the main culprit...

This thread is closed