R-Code questions

Posted by Thomas Mercer-Hursh on 27-Nov-2009 14:40

I am having a discussion with a fellow who has no background in ABL and he has raised some interesting questions and I am wondering whether one of the PSC folks who knows about internals can shed some light.

One of the questions arose because he felt that if I added a property to an object, say one that I used for sending data from a source to any one of 5 consumers, then all of the consumers would have to be recompiled, even if they did not access the new property. I felt that only the consumer accessing the new property would need to be recompiled, which I gather is true. He felt they would all need to be recompiled because each property would be turned into a memory offset relative to the base of the object during compilation, so the consuming object would have to be recompiled to pick up the current memory offset. So, the question is, how is this happening in the R-Code? Is there some kind of table of property names? How efficient is this?

Our other topic has to do with temp-table parameters, both across the wire and within a session when passed by value. Across the wire, he was struck by Greg Higgins's experiments some years back in which he found that he could serialize a TT to XML, pass the XML, and deserialize to a TT as fast or faster than he could send the TT as a parameter. It is quite possible that this has changed, perhaps drastically, with the ability to not send schema, since I could imagine that full schema including FORMAT, INITIAL, LABELs, etc. could be quite extensive, particularly if the table didn't have very many records. To me, this was more a testimony to the speed of XML processing in the ABL than it was a damnation of the speed with which the TT was sent across the wire. He was certain that there would be other ways to send the data that would be far faster, but I have been contending that either the TT parameter or XML are using highly optimized code in the AVM, whereas any other form of packing and unpacking that I might try to write would necessarily be much slower, even if the message itself was more compact.

So, questions.

Are there some examples for the difference in size of the actual data packet with the modern options for sending a TT parameter?

Can anyone briefly describe how it is that the TT is serialized for transmission?  I.e., how much overhead is there?

For passing a TT by value within a process, is there any of the same kind of serialization and deserialization or is it a memory to memory copy?

All Replies

Posted by Alex Herbstritt on 01-Dec-2009 14:54

I will reply to the first of your questions regarding updating r-code for classes.
You are more correct than your friend. Adding a member to a class does not require a recompile for clients (i.e. other procedures and classes that use the class) that do not want to use the new functionality. Also, removing members from a class does not require a recompile for clients that are not using the removed member. If you remove a member that a client does use, but do not recompile the client, then the client will get a runtime error that the member cannot be found when it tries to access it.
How does this work? Magic.
When we compile the classes we give them MD5 digest values based upon key properties of the members that make up the class. Changing one of these properties and then recompiling will change the digest. (E.g. changing a member from PUBLIC to PRIVATE would cause the digests to change.)
At runtime, when these digests match, we do a direct dispatch to the member using a dispatch table – what your friend was describing. This is fast and efficient.
At runtime, when the digests do not match, we cannot trust the dispatch table and ignore it. We then do, what amounts to, a name lookup for the member – but, we are smart about it. Since we know that we cannot trust the dispatch table, we make a secondary (double indirection) table. This means that the lookup only happens once for each member of the class; following dispatches are just as fast and efficient. It is also at the time of making the new table that we note that members are missing so you get the nice error instead of bad stuff happening.
So, if you change a class in a way that causes the digest to change (move things around, change types, change access mode, add members or parameters, etc.) and don't recompile your clients, they will still work, just a little slower, and you may get runtime errors for missing members.
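
Editor's sketch: the digest check described above can be illustrated in a few lines of Python. This is not the AVM's implementation. Which member attributes feed the digest, and how they are encoded, are assumptions made for illustration; the only facts taken from the post are that an MD5 digest is derived from key properties of the members and that a change such as PUBLIC to PRIVATE alters it.

```python
import hashlib

def class_digest(members):
    """Hypothetical digest: MD5 over each member's name, kind, access mode
    and type, in declaration order (position matters for a jump table)."""
    h = hashlib.md5()
    for name, kind, access, data_type in members:
        h.update(f"{name}|{kind}|{access}|{data_type}\n".encode("utf-8"))
    return h.hexdigest()

def can_dispatch_directly(client_digest, members):
    """Direct dispatch is only trusted when the digest the client was
    compiled against matches the digest of the class as it is now."""
    return client_digest == class_digest(members)

# Class as it looked when the client was compiled (made-up members).
v1 = [("Name", "property", "PUBLIC", "CHARACTER"),
      ("Total", "property", "PUBLIC", "DECIMAL")]

# Same class after an access-mode change, and after adding a member.
v2_access_changed = [("Name", "property", "PUBLIC", "CHARACTER"),
                     ("Total", "property", "PRIVATE", "DECIMAL")]
v3_member_added = v1 + [("Notes", "property", "PUBLIC", "CHARACTER")]

client_digest = class_digest(v1)

print(can_dispatch_directly(client_digest, v1))                 # unchanged class
print(can_dispatch_directly(client_digest, v2_access_changed))  # access changed
print(can_dispatch_directly(client_digest, v3_member_added))    # member added
```

In the two mismatching cases the client would fall back to the slower name-based path described in the post rather than fail outright.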

Posted by Evan Bleicher on 01-Dec-2009 15:04

For your questions concerning Temp-table serialization, I am posting a response from Mary.

Are there some examples for the difference in size of the actual data packet with the modern options for sending a TT parameter?

ANSWER:

The format of the Temp-table parameter schema transmission changed with version 10, so that a monolithic schema packet is no longer sent.  This may be slower, but it allows any size table schema to be transmitted.  This difference between versions 9 and 10 can be eliminated for static receiving tables with the NO-SCHEMA option.  In that case, we send almost no schema.

For static tables, the called procedure has to first read the schema from the rcode and instantiate the table (only once per .p/cls, not per invocation of an internal proc or method).

If the called procedure's table is dynamic, then the schema must be received from the wire and used to instantiate the table.  Here also, some of the schema label type character data may be eliminated with MIN-SCHEMA.

The data rows of the table are sent just like version 9.

Can anyone briefly describe how it is that the TT is serialized for transmission?  I.e., how much overhead is there?

ANSWER

The schema, if any, is sent piece by piece followed by the rows of the table.  The data itself is not converted in any way, but goes in the usual portable internal format, just like it appeared in the row in the temp-table database, only broken out into distinct pieces.  On the receiving end, each receiving row is created and each piece of data is put in its place in an optimized way, needing no conversion.  The only thing that would slow this down is differences in code-pages for char fields.

For passing a TT by value within a process, is there any of the same kind of serialization and deserialization or is it a memory to memory copy?

ANSWER

Within a process, it is much faster.  The receiving table row is "created" by using the sending table row as a "template".  There are no code-page issues.  It is never a global memory to memory copy because indexing has to go on for each row, and the new table has to actually exist in the temp-table database, alongside the original table.  The receiving table may have to be emptied, but that is very fast nowadays.  When the called procedure starts up, the receiving table definition, if any, has to be read from the rcode, to create the schema (once per persistent proc/cls instance), or if it does not exist (dynamic), has to be read (cloned) from the caller's table.

Whether it is cheaper to read XML or a temp-table parameter depends on how much data there is, how much schema there is and whether the callee table is dynamic.  The schema can often dwarf the actual data, and of course, vice versa.  If the schema is known on the receiving end, then NO-SCHEMA will help a lot, although there will be no error-checking that schemas match.

XML will be much bulkier if there is a lot of data, and conversions will probably be necessary.  The temp-table remote parameter will need almost no conversions, and there are no "tags" to identify anything -- just the raw data.  But a very large schema and only 1 or 2 rows of data could swing the balance so that XML would be cheaper -- for example with a static table that gets instantiated from schema in the rcode and checked against schema on the wire (assuming no NO-SCHEMA), and finally gets a couple of rows.  Some of the overhead of instantiating the callee table can be saved if the callee procedure is a method/internal-proc or function.
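
Editor's sketch: the schema-versus-data trade-off above can be made concrete with a toy size comparison in Python. This does not reproduce the AVM's wire format. The row layout, the field names and the fixed 2000-byte schema cost are all assumptions chosen for illustration, and it measures only payload size, not conversion time.

```python
import struct
import xml.etree.ElementTree as ET

def make_rows(n):
    """Made-up rows: integer id, decimal amount, short character field."""
    return [(i, i * 1.5, f"customer-{i}") for i in range(n)]

def packed_size(rows, schema_bytes):
    """Untagged binary rows plus a hypothetical fixed schema cost.
    Each row: int32, float64, uint16 string length, UTF-8 bytes."""
    total = schema_bytes
    for row_id, amount, name in rows:
        raw = name.encode("utf-8")
        total += len(struct.pack("<idH", row_id, amount, len(raw))) + len(raw)
    return total

def xml_size(rows):
    """Self-describing XML: every value is wrapped in element tags."""
    root = ET.Element("table")
    for row_id, amount, name in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "id").text = str(row_id)
        ET.SubElement(row, "amount").text = str(amount)
        ET.SubElement(row, "name").text = name
    return len(ET.tostring(root))

SCHEMA_BYTES = 2000  # assumed cost of sending full schema; not a measured value

for n in (2, 1000):
    rows = make_rows(n)
    print(n, "rows:",
          "with schema", packed_size(rows, SCHEMA_BYTES),
          "| no schema", packed_size(rows, 0),
          "| xml", xml_size(rows))
```

Under these assumptions the XML payload is smaller only when a large schema accompanies a couple of rows. With many rows, or with the schema left out, the untagged rows are smaller, which matches the balance described in the answer.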

Posted by Thomas Mercer-Hursh on 01-Dec-2009 15:46

Thank you, Alex, Evan, and Mary!

Posted by Thomas Mercer-Hursh on 03-Dec-2009 13:14

Can you say a few words more about the dispatch table?

My friend is of the mind that the only efficient thing is a jump table based on position, which, of course, requires recompilation of the consumer if a new property is inserted in the middle of existing properties.

My guess is that the table works more like the index for tables or some kind of hash based on property name or id so that it is not position dependent.

Posted by Alex Herbstritt on 04-Dec-2009 07:07

Actually, it pretty much is a positional jump table -- this time your friend is correct.

Each reference in a client stores the value from this table. This allows quick dispatching to the member of the class. However, we additionally store the member name, type etc. with a client reference so that we can still use the client if it is "out of sync" with the class ...

As I described above, when the digest of the client fails to match the digest of the class, the client builds a secondary (double indirection) table. This table maps the number from the client to the new number from the class -- or puts in an invalid marker if the member is missing. This allows the client to still quickly dispatch to the members, even if they have moved (or raise an error if they have been deleted). The mechanism for creating this secondary table is equivalent to a name lookup, so it does cause a slowdown -- but only once per client.
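
Editor's sketch: the secondary table described above might look roughly like this in Python. It is a minimal model, not the AVM's data structures. The class and member names are hypothetical, the digest is reduced to a plain string, and each client is assumed to talk to a single class so that one remap table per client is enough.

```python
MISSING = -1  # marker for a member that no longer exists in the class

class ClassImage:
    """A compiled class: a digest plus a positional jump table."""
    def __init__(self, digest, members):
        self.digest = digest
        self.names = [name for name, _ in members]
        self.table = [fn for _, fn in members]

class Client:
    """A compiled client: stores the slot number AND the member name
    for every reference, as the post describes."""
    def __init__(self, compiled_digest, refs):
        self.compiled_digest = compiled_digest
        self.refs = refs            # list of (slot_at_compile_time, name)
        self.remap = None           # secondary table, built lazily
        self.name_lookups = 0       # counts the slow name-based lookups

    def _build_remap(self, cls):
        by_name = {name: slot for slot, name in enumerate(cls.names)}
        remap = {}
        for old_slot, name in self.refs:
            self.name_lookups += 1
            remap[old_slot] = by_name.get(name, MISSING)
        return remap

    def call(self, cls, slot):
        if self.compiled_digest != cls.digest:
            if self.remap is None:              # pay the lookup cost once
                self.remap = self._build_remap(cls)
            slot = self.remap[slot]             # double indirection
            if slot == MISSING:
                raise LookupError("member not found in class")
        return cls.table[slot]()                # positional dispatch

# The class after changes: GetName was removed and GetTotal moved to slot 2.
new_cls = ClassImage("digest-v2", [("GetId", lambda: 7),
                                   ("GetCount", lambda: 3),
                                   ("GetTotal", lambda: 42.0)])

stale = Client("digest-v1", [(0, "GetName"), (1, "GetTotal")])
fresh = Client("digest-v2", [(0, "GetId")])

print(stale.call(new_cls, 1))   # old slot 1 is remapped to new slot 2
print(stale.call(new_cls, 1))   # second call reuses the remap table
print(fresh.call(new_cls, 0))   # digests match, no remap table is built
```

Calling `stale.call(new_cls, 0)` raises `LookupError`, standing in for the runtime error on a removed member. Because the remap table lives on the client object, two stale clients compiled against different versions each build their own.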

Posted by Thomas Mercer-Hursh on 04-Dec-2009 11:22

So, there is a positional jump table created at the time of compile, but if the client turns out to be out of sync, then a new "empirical" jump table is created on first use?

This seems to imply that the client interacts with the compiled class which it accesses at the time of its own compile in order to obtain the then current digest?

What if that other class hasn't been written yet?

Posted by Thomas Mercer-Hursh on 20-Dec-2009 12:26

Also, can you clarify the role of the indirection table.  E.g., suppose one has two classes A and B which both call class C.  Class C has undergone some changes and both A and B are out of date, but in different ways.  If A made the first call, found the jump table out of date, and built an indirection table, that table would not work for B.  So, is the indirection table by client?

Posted by Evan Bleicher on 21-Dec-2009 10:04

Yes, the indirection table is per client.

Posted by Thomas Mercer-Hursh on 21-Dec-2009 11:15

Thanks ... it seemed like it had to be.
