Performance: what's a lot of objects?

Posted by Peter Judge on 12-Mar-2010 12:12

The other OO performance thread got me thinking about what you would consider a lot of objects to have in the session object pool for an application? I realise this is a real "it depends" question, but thought I'd try anyway. I think I'm primarily interested in high-water marks of active objects, although average loads are also useful. Garbage collection is assumed to be enabled.

In other matters, is the below the fastest way to traverse the session's object pool? Assume that I can't store references elsewhere (which is my preferred way).

        /* Start at the head of the session's object pool. */
        oInstance = session:first-object.

        /* Walk the pool until there are no more objects or we've found a match. */
        do while valid-object(oInstance) and not valid-object(oReference):

            /* piReference holds the object's integer identity (a weak reference). */
            if piReference = int(oInstance) then
                oReference = oInstance.

            oInstance = oInstance:Next-Sibling.

        end.

The connection between the two is left as an exercise for the reader

-- peter

All Replies

Posted by Thomas Mercer-Hursh on 12-Mar-2010 12:39

I think you probably need to qualify your question to limit it to a high-water mark that reflects a legitimate requirement.

With that qualification, I think we already have an experience base to tell us the answer fairly closely, and that is our experience with -L.   It may not be one for one, but there should be a strong relationship between the maximum number of records which need to be locked within a single legitimate transaction and the number of objects which need to be in that transaction.  One might have more locks than objects if multiple tables are used to compose a single object, e.g., a main record and a subtype record.   And, if we are holding things in collections, we might need some number of additional objects, the collection objects themselves, which do not relate to the DB directly.

I personally have seen legitimate cases for locking 20,000-30,000 records in a single transaction, but that was pretty unusual -- one check paying off 7000 invoices.  One can also get numbers like that in a GL year-end close, if there are a large number of accounts involved.

I suspect that this is a question where one should distinguish between typical high-water marks and special-case high-water marks.  E.g., one might have an application that would run all year with a couple of thousand max and then have one function at the end of the year which went to 20,000.

I also think that there is a significant element in here of what one decides are the business rules.  E.g., historically, we ABLers have typically relied on using database transactions to enclose units of work which we understood should either commit as a whole or not at all.  With a transition to distributed SOA systems, we have to rethink this because that logical transaction can be spread across the network, so we end up forced into committing work in separate pieces, relying on the ESB to commit the rest "almost all of the time" and having some kind of reversing transaction to handle the rare exceptions.  Distributed environments force that on one, but that doesn't mean that one can't adopt the same philosophy in a single session in order to avoid extremes.  E.g., the check-paying example could have been engineered to commit in blocks of 100, say, and then provided with a restart and/or back-out mechanism for the rare case in which the whole did not commit as expected.
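
Purely as a sketch, committing in blocks might look something like this in ABL (the Invoice table, its PaidDate field and the block size of 100 are all hypothetical):

        define variable iChunk as integer no-undo.
        define variable lMore  as logical no-undo initial true.

        /* Pay invoices in blocks of 100, each block in its own transaction, */
        /* rather than locking every invoice in one very large transaction.  */
        do while lMore:
            PayBlock:
            do transaction:
                do iChunk = 1 to 100:
                    find first Invoice
                         where Invoice.PaidDate = ? exclusive-lock no-error.
                    if not available Invoice then
                    do:
                        lMore = false.
                        leave PayBlock.
                    end.
                    Invoice.PaidDate = today.    /* hypothetical "apply payment" step */
                end.
            end. /* commits one block of (up to) 100 */
        end.
        /* A restart/back-out mechanism would pick up from the remaining unpaid */
        /* invoices in the rare case the whole run did not complete as expected. */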

Posted by Tim Kuehn on 12-Mar-2010 12:53

pjudge wrote:

The other OO performance thread got me thinking about what you would consider a lot of objects to have in the session object pool for an application? I realise this is a real "it depends" question, but thought I'd try anyway. I think I'm primarily interested in high-water marks of active objects, although average loads are also useful. Garbage collection is assumed to be enabled.

In other matters, is the below the fastest way to traverse the session's object pool? Assume that I can't store references elsewhere (which is my preferred way).

        oInstance = session:first-object.
        do while valid-object(oInstance) and  not valid-object(oReference):
            if piReference = int(oInstance) then
                oReference = oInstance.
            oInstance = oInstance:Next-Sibling.
        end.



The connection between the two is left as an exercise for the reader



-- peter


You're missing a "leave" statement after the "IF". Once you've found what you're looking for, there's no need to look at any other objects.

So no, this isn't the fastest way to find an object in the object pool.
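
For example, with the same variables as in Peter's snippet, this is roughly what I mean:

        oInstance = session:first-object.
        do while valid-object(oInstance):
            if piReference = int(oInstance) then
            do:
                oReference = oInstance.
                leave.    /* match found - stop scanning the pool */
            end.
            oInstance = oInstance:Next-Sibling.
        end.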

Tim

Posted by Peter Judge on 12-Mar-2010 12:53

With that qualification, I think we already have an experience base to tell us the answer fairly closely and that is out experience with -L .   It may not be one for one, but there should be a strong relationship between the maximum number of records which need to be locked within a single legitimate transaction and the number of objects which need to be in that transaction. 

One object per record for a transaction is probably worst-case; approaches which use a ProDataSet will have (significantly?) fewer, right? I'm not looking to revisit the discussion that's raging on the other thread; what I'm trying to establish is whether the 325ms it takes me to read through 100k objects in the object pool is (a) necessary and (b) acceptable. Necessary in that I could pick a million or a thousand, but I'd like to be somewhat accurate.

-- peter

Posted by Peter Judge on 12-Mar-2010 12:54

timk519 wrote:

        oInstance = session:first-object.
        do while valid-object(oInstance) and  not valid-object(oReference):
            if piReference = int(oInstance) then
                oReference = oInstance.
            oInstance = oInstance:Next-Sibling.
        end.






You're missing a "leave" statement after the "IF". Once you've found what you're looking for, there's no need to look at any other objects.


No, I'm not. I have a "not valid-object()" which does that for me.

-- peter

Posted by Tim Kuehn on 12-Mar-2010 12:55

and if "oreference" is valid when this is run?         

Posted by Peter Judge on 12-Mar-2010 12:56

timk519 wrote:

and if "oreference" is valid when this is run?         

It won't be; the point of the API is to find oReference. But that's not clear at all from the code I posted, my apologies.

-- peter

Posted by Tim Kuehn on 12-Mar-2010 13:00

pjudge wrote:

timk519 wrote:

and if "oreference" is valid when this is run?         

It won't be; the point of the API is to find oReference. But that's not clear at all from the code I posted, my apologies.

-- peter

Ok. I didn't assume oReference was invalid on call because I've seen code like this in all kinds of includes that walk the procedure-handle tree.

Posted by Tim Kuehn on 12-Mar-2010 13:00

Another question - why the check on int(oInstance)? Can the "int" be gotten rid of and a straight handle to handle comparison be done?

Posted by guilmori on 12-Mar-2010 13:02

tamhas wrote:

One might have more locks than objects if there are multiple tables used to compose a single object, e.g., a main record and a subtype record.   And, if we are holding things in collections, we might need some number of additional objects which did not relate to the DB directly as the collection objects.

One might also have many more objects than records. An example would be using patterns like Wrapper, Decorator or State.

Posted by Peter Judge on 12-Mar-2010 13:17

Another question - why the check on int(oInstance)? Can the "int" be gotten rid of and a straight handle to handle comparison be done?

It certainly could, generally speaking. But in my case I store a weak reference to an object as an integer. The garbage collector is then able to clean up the object if no one is holding a reference to it, and I can check whether it's alive. If I hold a strong reference it will definitely be alive, unless someone explicitly deleted it.
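
As a minimal sketch of the idea (oTarget is just a throwaway instance for illustration):

        define variable oTarget  as Progress.Lang.Object no-undo.
        define variable iWeakRef as integer              no-undo.
        define variable oWalk    as Progress.Lang.Object no-undo.

        /* Keep only the integer identity as a "weak" reference, so the GC */
        /* is still free to delete the instance when no one else holds it. */
        oTarget  = new Progress.Lang.Object().
        iWeakRef = int(oTarget).
        oTarget  = ?.    /* drop the strong reference; GC may collect it any time now */

        /* Later: is it still alive? Walk the session's object pool. */
        oWalk = session:first-object.
        do while valid-object(oWalk):
            if int(oWalk) = iWeakRef then
                leave.    /* found it - still alive */
            oWalk = oWalk:Next-Sibling.
        end.
        /* valid-object(oWalk) is now true only if the object survived. */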

-- peter

Posted by Tim Kuehn on 12-Mar-2010 13:32

pjudge wrote:

Another question - why the check on int(oInstance)? Can the "int" be gotten rid of and a straight handle to handle comparison be done?

It certainly could, generally-speaking. But in my case I store a weak reference to an object as an integer. The garbage collector is then able to clean up the object if no-one is holding a reference to it, and I can check whether it's alive. If I hold a strong reference it will definitely be alive, unless someone explicitly deleted it.

-- peter

and so is born yet another one of those Progress idiosyncrasies....

Posted by Thomas Mercer-Hursh on 12-Mar-2010 13:33

Why would using a PDS change anything?  Unless, of course, you use the original John Sadd form of a BE which takes a PDS as a parameter and doesn't ever create any actual individual BE objects.  With M-S-E, the number of objects will be very similar to the number needed by a PABLO implementation.

I would think it *highly* unusual to need 100K objects in a single AVM.  That would send off the same kind of alarms as a -L of 100,000.

Why do you care how long it takes to iterate the objects in this way?  The whole idea of OO is direct navigation through relationships, not no-index searches of everything there is.

Posted by Peter Judge on 12-Mar-2010 13:37

and so is born yet another one of those Progress idiosyncrasies....

Holding a weak reference? Not really ... http://en.wikipedia.org/wiki/Weak_reference

-- peter

Posted by Thomas Mercer-Hursh on 12-Mar-2010 13:45

True, heavy use of those patterns could increase the object count, but I don't think it likely that I would use them in such a broad-brush way that they would multiply the number in the pool overall.

I suppose I am making some assumptions that I should make more explicit.  E.g., in legacy applications, it is fairly common for people to have mushed together a lot of information into a base record which, from an OO perspective, really represents multiple subtypes of that base.  Good OO design, and good RDBMS design come to that, would dictate moving that subtype information into separate records, so that one would typically have both the base record and the subtype record involved in any one object.  If one used generalization to implement the subtypes, one would have fewer objects than records.  If one used a wrapper approach for the subtype, as in M-S-E, then one would have the same number of objects as records.  If one left the data mushed into the base record, then generalization would yield one to one and M-S-E would have more.

So, yes, one needs to consider these assumptions in equating one with the other, but I still think that transaction scope and the number of entities legitimately involved in a single transaction give us the baseline for figuring out how many objects we need.  One needs to apply the appropriate rules, multiplying or dividing or adding or subtracting according to the assumptions of the pattern and practice, but the baseline is still there.

Posted by Thomas Mercer-Hursh on 12-Mar-2010 13:49

So, it seems to me that the thread has forked.  One branch is the question in the title.  The other branch is wondering what in the world you are doing with this code.

Why would you ever just store some kind of reference to an object somewhere and then go looking through the entire object pool to find out if it is still around?  Put it in a collection or something so that you can just navigate straight to it.

Posted by Peter Judge on 12-Mar-2010 13:57

Why would you ever just store some kind of reference to an object somewhere and then go looking through the entire object pool to find out if it is still around? Put it in a collection or something so that you can just navigate straight to it.

Then you hold a strong reference to it, and the GC cannot clean it up. And there's no other way in ABL (that I know of) to get back to an object reference.

But you're right about the divergence of this thread. I'm really far more interested in the original question (I just posted the code as an example of why I would want this info).

-- peter

Posted by Thomas Mercer-Hursh on 12-Mar-2010 14:28

OK, so we should focus on the title ... but I still question the appropriateness of a weak reference ...

Posted by Tim Kuehn on 12-Mar-2010 15:51

pjudge wrote:

and so is born yet another one of those Progress idiosyncrasies....

Holding a weak reference? Not really ... http://en.wikipedia.org/wiki/Weak_reference

While it may be well known to OO developers, I suspect ABL developers trying their hand at OO coding may not know this little factoid.

Anyway, my curiosity is satisfied - so back to the original question.

Posted by Peter Judge on 15-Mar-2010 09:19

While it may be well known to OO developers, I suspect ABL developers trying their hand at OO coding may not know this little factoid.

To be honest, I only discovered this recently myself. The best use I can see so far is for caching objects. Garbage collection will clean up objects once all references to them are gone, but if we hold a strong reference to an object (e.g. in a temp-table field defined as P.L.Object on ttCache), the no-longer-required object will not be GC'ed. If we hold a weak reference (i.e. stored as an integer) then the GC will go on its merry way and clean up the object in the absence of any strong references.
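
As a rough sketch (ttCache as above; the field and index names here are made up):

        /* A cache that stores only weak (integer) references, so the GC can */
        /* still clean up entries that nothing else is holding on to.        */
        define temp-table ttCache no-undo
            field CacheKey as character
            field WeakRef  as integer      /* int(object), not the object itself */
            index ixKey is primary unique CacheKey.

        /* A lookup finds the ttCache record by key and then walks            */
        /* session:first-object (as in the earlier loop) to check whether an  */
        /* object with that integer identity still exists.                    */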

Dr Mercer-Hursh might argue at this juncture that this is precisely why GC is not to be trusted; he might have a point, although the benefits of GC to me still far outweigh this.

-- peter

Posted by Thomas Mercer-Hursh on 15-Mar-2010 11:36

My question is more along the lines of what the weak reference is for.  I see a point in a cache, but if I have something in a cache it means I still want it and don't want it to go away.

E.g., I have been thinking recently about the problem of before-image handling.  It seems burdensome to make a simple BE take care of its own BI, although, of course, it does provide a kind of undo feature.  Sourcing the data from a PDS and leaving the original data in the PDS is a possibility, as is leaving behind a memento object.  But the problem with the latter two is: how does one know that the BE has been deleted without persisting any changes, so that one can clear out the saved data?  Do you see a weak reference solution here?

If not, then what?

p.s., I do have an idea of what I might want to do about the BI issue, but haven't played with it yet and it doesn't have anything to do with this thread.

Posted by Tim Kuehn on 15-Mar-2010 13:36

tamhas wrote:

My question is more along the lines of what the weak reference is for.  I see a point in a cache, but if I have something in a cache it means I still want it and don't want it to go away.

The weak reference is so that GC will knock off the object since it has no more references. If a strong reference was used, that would mean the object still had an outstanding "reference", and GC wouldn't get rid of it.

In terms of the original question - I'm supposing that the DO loop would be the fastest way to find an object, unless one had some way of mapping the object creation/deletion process into an object-reference tracking system, and could then do a FIND-like operation to locate the object in question.
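
Something along these lines, maybe - a hypothetical tracking table maintained as objects are created and deleted, so that the lookup becomes a keyed FIND instead of a walk of the whole pool (all names made up):

        define temp-table ttObjectMap no-undo
            field ObjRef  as integer      /* int(object) */
            field ObjType as character
            index ixRef is primary unique ObjRef.

        procedure RegisterObject:
            define input parameter poObject as Progress.Lang.Object no-undo.
            create ttObjectMap.
            assign ttObjectMap.ObjRef  = int(poObject)
                   ttObjectMap.ObjType = poObject:GetClass():TypeName.
        end procedure.

        procedure UnregisterObject:
            define input parameter poObject as Progress.Lang.Object no-undo.
            find ttObjectMap where ttObjectMap.ObjRef = int(poObject) no-error.
            if available ttObjectMap then
                delete ttObjectMap.
        end procedure.

        /* Lookup: find ttObjectMap where ttObjectMap.ObjRef = piReference no-error. */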

Tim

Posted by Thomas Mercer-Hursh on 15-Mar-2010 13:50

Tim, I understand what happens with the weak reference.  My question is, what's the point?  If I have a weak reference to an object, before I can use the object I have to run around and find out if it still exists and, if it doesn't, then what?  I can't resurrect it.

As for the fastest way, again I'm not sure of a reasonable context for searching all objects in memory.  I did it back in 10.1A as a way to implement a singleton, but we don't need to do that any more.  Much better, it seems to me, is to manage collections.  If I'm done, I'm done.  If I'm not, I'm not.  The only problem I have been able to find here is when A sends an object to B but keeps a clone or memento or some data, and B might modify and return the object or might just delete it without doing anything.  Then one has a management problem with the local copy.

I *suppose* if the memento included a weak reference to the original, then I could periodically check all the mementos in the cache to see if the things they point to still exist, but in practical terms that is unlikely to be of any help in my architecture since there will be a level of remove and I won't have the right identifier.

This thread is closed