Collections and Iterators

Posted by Thomas Mercer-Hursh on 29-Jun-2010 11:57

In the wake of a discussion on the PEG, I got myself talked into separating the Iterator from the Collection so that one could have multiple iterators per collection.  At the time, I was thinking in terms of "Collection" as that covered by the Java Collection and Map hierarchies, which covers a lot of different kinds of functionality.  Having now refocused for the initial implementation on those collections needed to support one to many object relationships and nothing else, this being core to all OO programming, it is less clear to me that having more than one Iterator on a Collection makes much sense.  It may be such a rare and specialized requirement that it is worth creating a special collection for the need.  But, for the moment, at least, let's think about keeping the two separate.

The question is, how are we going to implement this?  How does the Iterator, which by all rights is a full-fledged peer object of the collection, efficiently navigate the collection?  If the collection is implemented with a temp-table, then we could provide the Iterator with a handle and define a query, but that makes my OO sense wince in having one object poking so intimatlely at the innerds of another.  And, it does us no good for any collection which is not implemented with a temp-table since there is no corresponding mechanism for arrays and work-tables.

As Bruce has pointed out, this is a place where package/library protection or friend classes or, better yet, friend members would seem to find a use case ... although again with some OO wincing ... but at this point that is only wistful thinking.

One could make the Iterator a subclass of the Collection and give the needed members Protected status, but not only is that OO nonsense (no Is A relationship between the two), but it only provides for one Iterator per Collection.

And, there are a variety of patterns in which we could embed the collection within an Iterator, but that still doesn't solve the problem of needing methods to do the navigation.  And, if there are public methods for navigation on the Collection itself, then what function is the Iterator performing.

The best idea so far ... which still has this limitation ... is to provide First, Next, Last, and Prev operators on the Collection which return an identifier and a Get operator which takes that identifier and returns the appropriate object.  The identifier would vary according to the implemention of the collection and would have no intrinsic meaning to anything outside the collection.  E.g., for an array implmentation it would be the index of the appropriate record.  For a work-table implementation in ID sort, it might be the object ID.  For a temp-table implementation it might be the RECID.

Note that Next and Prev on the Collection take an identifier parameter and the Next and Prev are relative to that indicated position.  E.g., with an array implementation one would call First and get back 1, Get(1), Next(1) returns 2, Get(2), Get(2) returns 3, etc.

One could certainly use this without the Iterator, but the Iterator would provide a little easier to use facade since First, Next, Prev, and Last on the Iterator would return the actual object and the Iterator would be responsible for keeping track of what was currrent.

Thoughts?

All Replies

Posted by guilmori on 29-Jun-2010 13:29

Why not simply have your collection class provide an indexer exposed as a Get(int element) method, which returns the Nth element. Then, with the help of the Size() method, the iterator can iterate on all elements. The collection must expose a way to retrieve each elements, but without keeping a "current" state, which is the job of the iterator.

A interesting article on Iterators in C#.

http://www.theserverside.net/tt/articles/showarticle.tss?id=IteratorsWithC2

After reading this, I really feel like using obsolete technology with our "Advanced" language... It is a complete non-sense that we have to do such low-level implementation without any help of the compiler.

With a yield return statement, the collection class could loop directly on its internal structure(tt, wt, array,...) yielding result back.

Posted by guilmori on 29-Jun-2010 14:09

Another way to make an association between the Enumerator and the Enumerable is to create a nested class within the Enumerable class, as show in this link:

http://ondotnet.com/pub/a/dotnet/2004/06/07/liberty.html

This way, the Enumerator instance have access to Enumerable private members.

But again, this is not possible in ABL, duh.

Posted by Thomas Mercer-Hursh on 29-Jun-2010 16:34

Get Nth Element is a feature of List* in Java terms.  One doesn't really need that for relationship collections, although it is certainly handy in other uses.  It is an interface which would work will with arrays, but which would lead to a lot of overhead in a work-table or temp-table implementation.  In some ways, this is equivalent to what I have proposed with the identifier, in fact, for an array, it would be exactly the same.  But, for a WT or TT it would be some other number which could be easily found (I am assuming that FIND WT WHERE ID = n on a WT is faster than a for each loop inrementing to check a count).

So, bottom line, I don't see how this would be any better than my proposal ... which I don't like because the collection is exposing more or less the same methods as the iterator.

* Note that one of the features of List is that one can add to the collection by position too, so it is quite legal to have a collection with positions 1, 5, 9, and 32 filled and everything else null.  Not something one needs for a relationship collection.

Posted by Thomas Mercer-Hursh on 29-Jun-2010 16:39

Yes, Burce mentioned this, but as you say, not an option in ABL, so I forgot to mention it.

Posted by guilmori on 29-Jun-2010 17:35

tamhas wrote:

Get Nth Element is a feature of List* in Java terms.  One doesn't really need that for relationship collections, although it is certainly handy in other uses.  It is an interface which would work will with arrays, but which would lead to a lot of overhead in a work-table or temp-table implementation.  In some ways, this is equivalent to what I have proposed with the identifier, in fact, for an array, it would be exactly the same.  But, for a WT or TT it would be some other number which could be easily found (I am assuming that FIND WT WHERE ID = n on a WT is faster than a for each loop inrementing to check a count).

It is a fact that an Iterator must have some knowledge of the internals of the collection over which it iterates, but current ABL provides no help for this(no internal/friend modifier, no nested class and no yield return), so I don't think you have any other choice than to have the collection expose a public interface for the iterator to access the elements. My suggestion is to keep this interface as small as possible, ie:  a single Get(int elementPosition) method.
In your proposal, you suggest a First, Next, Last, Prev and Get methods dealing with an identifier... I don't get why all these would be necessary. Just having a method to retreive an element by its position (not its identifier) seems sufficient to me. And what is the problem of having an extra indexed field in the tt or wt that contains the position of the element ?

So, bottom line, I don't see how this would be any better than my proposal ... which I don't like because the collection is exposing more or less the same methods as the iterator.

Not at all, the Iterator expose methods that imply a current position. The Collection would expose a single method to get an element by its absolute position.

But I do agree that this is far from perfect solution, we are trying to re-invent some pretty old stuff with some poor tools.

Posted by Thomas Mercer-Hursh on 30-Jun-2010 11:56

No index on a work-table, of course.  One *could* include a column for N and do a find on it, but remember that only one of the types of collections is organized in addition order ... and that is the one likely to be implemented with an array.  For ID order, one is going to do a FIND last where less than in work-table to position the cursor for and insert or just insert to a temp-table and let the index create the order.  That would mean having to read and write every record after every insert in order to reset the correct N after the insertion.  That would only be efficient if one added all the records and then announced they were all there and did the scan at the end.

Posted by guilmori on 30-Jun-2010 13:23

The iterator could iterate over its set through a "IAccessibleByIterator" interface. The set would explicitly implement this interface, which means the consumer would not see this interface members when working with a set, it would be visibile only when working through an instance of the interface "IAccessibleByIterator", which would be done only by the iterator.

See http://msdn.microsoft.com/en-us/library/aa288461%28VS.71%29.aspx for an explanation of explicit interface implementation.

The question now is, does the ABL support this ? I can say not in 10.1C.

Posted by Admin on 30-Jun-2010 13:39

The question now is, does the ABL support this ? I can say not in 10.1C.

It isn't supported in 10.2B either, except with .NET types on GUI for .NET

Posted by guilmori on 01-Jul-2010 07:31

Isn't it mentioned somewhere that the work-table implementation would use an array containing objects in addition order ? Why not use this for the GetNth() method ?

And I do not see any Insert() method mentionned... only Add(), which adds to the end right ? Then what re-numbering are you talking about ?

I agree Remove() would incur some re-number overhead, but If you need to remove many elements, just use the RemoveAll() method and re-number once.

Posted by Thomas Mercer-Hursh on 01-Jul-2010 10:40

GetNth, if intended to support the iterator, has to be Nth in the sort order of the work table.  The array is in addition order.

Add() is Add to the collection.  The sort order is defined by the collection type.  If aSet, then the order is the addition order.  If aIDSet, then the order is the order of the object IDs, regardless of the addition order.  If aAttrKeySet or aAttrSortSet then the order is the order of the key, regardless of order of additon.  Thus, no Insert() is required.  That kind of functionality fits more with List, which we are not doing in this round, where one adds to the collection at a designated position.

It isn't just overhead.  It is also the logical issue of what it all means.  I.e., if one is processing through and is on the 5th one and someone removes the 1st one, then where are you?  What's next?

Posted by Thomas Mercer-Hursh on 02-Jul-2010 17:07

It has occured to me that it is possible to provide multiple interators on a collection without having separate iterators.  One merely needs to provide multiple cursors within the collection.  Then, Next(1) because Next on the first cursor.

This makes me think that the idea of separating collections and iterators as separate areas of responsibility may be a poor decomposition of the problem space and really they are parts of the same responsibility.

Posted by cwills on 23-Dec-2010 18:55

Yeah I like this approach, this is how I would probably implement a solution in C#...

In the absence nested class support in ABL though, we simply opted to have the enumerator class know a little about the internal implemention of the given collection. And have a specific enumerator class for every collection:

E.g.

     Collection.cls     --->     CollectionEnumerator.cls

     List.cls           --->     ListEnumerator.cls

     ...etc. with each enumerator implementing the IEnumerator interface

I've attached some of our implementation...

Posted by Thomas Mercer-Hursh on 30-Dec-2010 14:12

I hope you are open to some feedback.

As you can guess, I don't like the idea of instantiating the iterator with a query created by the collection, although you are in good company since CloudPoint does this for instantiating a collection based on the Model.  Why are you creating the buffer in the collection?

You are using a temp-table to store the objects for a collection.  As you know, I did the same thing in my old version, but I am now questioning this for simple collections as being too heavyweight.  However, your TT has no index and nothing about it to establish order.  I believe, although I don't know whether it is guaranteed, that if you add all of the objects at the start and then access the objects in what will be RECID order that they are likely to be in addition order, but there is nothing to ensure this if you do any deletions.  Try, for example, adding 10 elements, deleting the second, and then adding another and then reading the whole collection.  I would think that if the last record fit in the space for the deleted record, which I would think that it would since all records are just an object reference, that the last added record would now appear in the second position.  This is not how one normally expects collections to work.

Your find obj = item.obj approach for deletions is going to run into scaling issues with large collections and lots of deletions.  So will a number of implementations, which is why one has multiple collection types with different characteristics.  Of course, with a TT, you could always have an index on the identity to get this to perform well.

You have violated normalization by having the same methods on the collection as you have on the iterator.

One can argue that moving to the next object in the collection and getting that object should not be two method calls since that means double the method calls to simply iterate through the collection.  The alternative is to use getNext which returns the object and to use error handling for going off end.

This thread is closed