Options for mapping between .Net DataTable and ABL temp-table

Posted by dbeavon on 01-May-2019 16:07

The "CLR bridge" can run within the memory space of the PASOE msagent (_mrpoapsv). Given the functionality of the "CLR bridge", and the co-habitation of .Net and ABL in the same process, I was hoping it would be fast and easy to convert back and forth between ADO.Net DataTable and ABL Temp-Table.

I don't personally have much experience with "GUI for .Net" and only started using the CLR bridge in the context of PASOE. I originally assumed that DataTable mappings would happen as part of GUI for .Net, and that they would be fast. But apparently a different strategy is used for binding to WinForms (see documentation.progress.com/.../index.html). From what I can gather, that strategy is based on ProBindingSource and doesn't involve moving any data from ABL into an ADO.Net DataTable.

I see that this is an old topic that has come up before (there are several prior posts and at least one KB article on it).  I just wanted to ask whether there are any new approaches to consider in OE 11.7.4 (on Windows).

Here is a list of the approaches that I've read about thus far:


1) Step through the temp-table manually (and probably in a non-generic way) and copy data on a row-by-row, field-by-field basis.

2) Somehow use ProBindingSource and .Net reflection to read or write ABL TT, given a .Net DataTable

3) Use the Progress WRITE-XML and DataTable ReadXml methods.

All of these seem like they would perform poorly when scaled up to 100,000 rows or so, right?  I am planning on doing some of my own benchmarking, but I'm not that optimistic.  In the context of PASOE, there may be additional considerations.  For example, approach #3 might even have an additional delay due to a somewhat artificial concurrency conflict (see https://community.progress.com/community_groups/openedge_general/f/26/t/57795 ).

Can anyone tell me if there are some faster approaches that would allow us to quickly map back and forth between DataTables and temp-tables?  I'm hoping for an approach that (1) works in a generic way rather than requiring work tailored to each table, (2) runs totally in memory rather than using files on disk, and (3) is very fast, with minimal CPU for serialization/deserialization.  One thought that came to mind was to piggy-back on the ABL IMPORT and EXPORT statements.  IMPORT may be the fastest possible way to deserialize raw data from an external source.  The only problem is that an ADO.Net DataTable doesn't natively support that format, so it may take some work to support it in a generic way.
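
For what it's worth, the EXPORT side of that idea is trivial on the ABL end.  A minimal sketch, assuming a hypothetical, already-populated temp-table ttOrder and a made-up file name:

/* Minimal sketch of the EXPORT side (made-up temp-table and file name). */
OUTPUT TO VALUE("c:\temp\orders.d").
FOR EACH ttOrder NO-LOCK:
    EXPORT DELIMITER "," ttOrder.
END.
OUTPUT CLOSE.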

Any ideas would be appreciated.  Sorry for digging up this old topic once again.  It would be nice if there was a built-in feature for quickly mapping data back and forth.  The CLR bridge has been available for quite some time, and DataTables and Temp-Tables are used very frequently for data storage (one in the CLR and the other in the AVM).  It seems like there should be a lightning-fast way to map data back and forth between them!

All Replies

Posted by rblanchard@ospreyretail.com on 01-May-2019 17:24

Why can't you just WRITE-XML from the ProDataSet and ReadXml into a System.Data.DataSet?
 
 
// hPds is the ProDataSet handle; cReceiptsXsd and cReceiptsXml are file names
DEFINE VARIABLE oDataSet AS System.Data.DataSet NO-UNDO.

// write out our xml and xml schema
hPds:WRITE-XMLSCHEMA ("FILE", cReceiptsXsd).
hPds:WRITE-XML ("FILE", cReceiptsXml).

// now read in our xml and xml schema
oDataSet = NEW System.Data.DataSet ().
oDataSet:ReadXmlSchema (cReceiptsXsd).
oDataSet:ReadXml (cReceiptsXml).
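
Going the other way, a rough (untested) sketch would be to have the DataSet write its XML back out and READ-XML it into the ProDataSet:

// sketch of the reverse direction (.Net back to ABL); not tested here
oDataSet:WriteXml (cReceiptsXml, System.Data.XmlWriteMode:WriteSchema).
hPds:READ-XML ("FILE", cReceiptsXml, "EMPTY", ?, ?).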

Posted by tbergman on 01-May-2019 22:00

The .Net demos I presented a couple of years ago at the PUG Challenge contain some examples of using SQL Server and .Net DataSets. The entire set of files and the presentation can be downloaded from pugchallenge.org/downloads2017.html.

I think the files you want to look at start with the letters "sql".

Posted by dbeavon on 02-May-2019 00:07

@tbergman thanks for the pointer to your presentation!  There is lots of great stuff in there for those who are in a position to use the CLR bridge.  I found the demos that you referred to (SQLDemo1.p and SQLDemo2.p).

It occurred to me that you could have shown an example of connecting to an OE database via SQL92 (with an OdbcConnection), but you demonstrated a SqlConnection instead.  Is there some reason why you avoided a connection to the OE database?  That is what I've been focused on lately. I'm pretty eager to start making some large queries back to the Progress database using SQL92 via the CLR bridge.  Given that we are still running OE 11.7.4, I suspect I can use SQL92 queries to get a *substantial* performance improvement on certain queries - perhaps the improvement will be comparable to the 300% that is promised when we finally upgrade to OE 12 (wherein Progress has implemented multi-threading and server-joins in their remote database servers).

Making queries from .Net is easy.  And after making a large SQL92 query, the only real trick left is to quickly map the data from .Net back to an ABL temp-table.  I see that in your SQLDemo1 and SQLDemo2 you used a "SqlDataReader" and "WriteXml/READ-XML", respectively.  I don't want to look a gift horse in the mouth, but I've found these approaches to be a bit slow for large datasets with lots of columns.  They both seem quite CPU-intensive.  The additional time for those approaches can exceed the time of the original query itself.  It would be better if there was built-in functionality for mapping the results back to ABL from .Net.

@rblanchard Your WRITE-XML/ReadXml approach is fairly fast (from ABL to .Net).  But as I mentioned, coming back the other way via WriteXml/READ-XML (.Net to ABL) is quite slow. That is the direction we'd be using most in order to retrieve data from an OpenEdge DB.  It will happen after using an ODBC connection to the SQL92 engine and running a query via the ODBC driver (DRIVER=Progress OpenEdge 11.X Driver).  It will alleviate many of the performance bottlenecks we have today when we try to execute some large "client-server" queries from regular ABL code.
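
To make that concrete, the query side via the CLR bridge would look roughly like the sketch below (the connection string, query, and names are placeholders only):

/* Rough sketch of the ODBC query via the CLR bridge; the connection
   string, query and table names are placeholders only. */
DEFINE VARIABLE oConn    AS System.Data.Odbc.OdbcConnection  NO-UNDO.
DEFINE VARIABLE oAdapter AS System.Data.Odbc.OdbcDataAdapter NO-UNDO.
DEFINE VARIABLE oTable   AS System.Data.DataTable            NO-UNDO.

oConn    = NEW System.Data.Odbc.OdbcConnection("DRIVER=Progress OpenEdge 11.X Driver;HOST=myhost;PORT=12345;DB=mydb;UID=me;PWD=secret").
oAdapter = NEW System.Data.Odbc.OdbcDataAdapter("SELECT CustNum, Name FROM PUB.Customer", oConn).
oTable   = NEW System.Data.DataTable().

/* Fill opens and closes the connection itself */
oAdapter:Fill(oTable).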

Another problem with READ-XML (making it even slower) is the artificial concurrency conflict that can impact PASOE if multiple sessions are trying to use that functionality at the same time (see  community.progress.com/.../57795  ).

In order to map .Net data back to ABL, the fastest approach I've found so far seems to be via the ABL "IMPORT" statement.  There is some legwork needed to create a generic custom .Net assembly that generates the Progress "IMPORT" file format (after running an ODBC query).  But the result of the custom assembly would be a file on disk, which ABL can then "IMPORT" quickly into an ABL temp-table.
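
The ABL side of that is just the standard IMPORT loop.  A minimal sketch, with a made-up temp-table and file name:

/* Minimal sketch: ttOrder and the file name are made up; the file is
   assumed to have been written in EXPORT format by the .Net assembly.
   The temp-table is left undoable so that the CREATE from the final,
   failed iteration is rolled back when IMPORT hits end-of-file (ENDKEY). */
DEFINE TEMP-TABLE ttOrder
    FIELD OrderNum  AS INTEGER
    FIELD OrderDate AS DATE
    FIELD Amount    AS DECIMAL.

INPUT FROM VALUE("c:\temp\orders.d").
REPEAT:
    CREATE ttOrder.
    IMPORT ttOrder.
END.
INPUT CLOSE.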

There are only two major issues I have (so far) with the IMPORT approach.  First, I don't like the fact that I have to use the local disk for intermediate storage, since the .Net code and the ABL code are running in the *same* msagent process.  It seems pretty silly for the results to be sent out to disk and then pulled back into memory again.  Secondly, I don't particularly like the IMPORT format since it is not binary and doesn't seem standardized (any more than a CSV). I'm not eager to see what happens when my ODBC results contain my delimiters within the data itself (or even line breaks).

I suppose JSON might be a middle-ground between the "IMPORT" format and the full XML format.  But I'm fairly certain that it will still be CPU-intensive to deserialize large amounts of JSON into an ABL temp-table.

Please let me know if anyone has a faster or cleaner way to map this .Net data back into ABL.   I'd love to hear about any other options.

Posted by dbeavon on 02-May-2019 00:23

A long time ago I had added an "enhancement idea" to support SQL92 queries from ABL.  

It seems that the idea is almost a moot point if we can just use the CLR bridge to use full-featured ODBC clients.  Not only can we connect to an OpenEdge database via SQL92, but we can connect to lots of other databases too.  All this is made possible by moving to Windows instead of HP-UX!

Anyway here is that old enhancement idea if anyone still wants to vote:

community.progress.com/.../make_use_of_sql92_query_optimizer_from_an_abl_program

Posted by tbergman on 02-May-2019 12:46

My real-world use for this, aside from writing fun demos, is to create a SQL Server table from a Progress temp-table, then use this SQL table from Excel for report generation.

After trying many techniques for the actual writing of the data, I found that the fastest method, by a large margin, was to generate a .p file on the fly with the basic FOR EACH and assignment statements and then run the newly generated .p.

This technique was partly necessitated by the need to massage some Progress data types before they go into SQL. For example, we need to apply formatting to some character fields. By overloading the FORMAT and/or COLUMN-FORMAT attribute of the TT field, the program can know to use a format rather than the raw data value. BLOB fields need to be converted to byte arrays before they go into SQL, etc. The massaging logic gets written to the generated .p.

I believe the presentation you downloaded also contains a file named sqldb.p. This program demonstrates the creation of an MSSQL database and tables, then the filling of the data using a .Net DataSet and the System.Data.SqlClient.SqlBulkCopy class. This is probably the fastest way to get data into an MSSQL database, but it does not offer the same ability to alter the data prior to sending as the other techniques do.
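
For anyone who hasn't used it, the bulk-copy call itself is only a few lines from ABL.  A sketch (the connection is assumed to be open already, oDataTable is the filled .Net DataTable, and the destination table name is made up):

/* Sketch only: oConn is an open SqlConnection, oDataTable is the filled
   System.Data.DataTable, and the destination table name is made up. */
DEFINE VARIABLE oBulk AS System.Data.SqlClient.SqlBulkCopy NO-UNDO.

oBulk = NEW System.Data.SqlClient.SqlBulkCopy(oConn).
oBulk:DestinationTableName = "dbo.ReportData".
oBulk:WriteToServer(oDataTable).
oBulk:Close().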

Posted by dbeavon on 02-May-2019 13:53

>>create a SQL server table from a Progress temp-table, then use this SQL table from Excel for report generation.

That makes sense;  I'm sending data in the opposite direction - from a SQL92 ODBC query back into a Progress temp-table in an ABL session.

As an aside, you might want to check out SQLite (an embedded database that uses a local file) for your scenario too.  The nice thing about that approach is that SQLite is in the public domain and there is no server-side infrastructure (which might be somewhat overkill for an Excel report).  Excel can pull in the SQLite data quite easily too (or at least that is what I've read; I'm familiar with using Excel Power Pivot, but haven't specifically connected to SQLite via its ODBC driver yet).  Either approach is a reasonable alternative to buying the full Pro2 product, and might also be a way to solve the problem while avoiding the need for a programmer.

>>  generate a .p file on the fly with the basic FOR EACH and assignment statements and then run the newly generated .p.

In my case, I'd extrapolate that the generated .p FOR EACH would look a lot like your SqlDataReader (SQLDemo1).  Even as a generated .p it will be quite slow because of all the interop going on between ABL's CLR bridge and the OdbcDataReader.  Whereas a FOR EACH loop that uses a simple IMPORT statement is quite fast (as long as the data has been written properly to a file).

>> necessitated by the need to massage some Progress data types before they go into SQL

Since I'm getting OE data out of an OE database (via ODBC) and putting it (eventually) in OE temp-tables, no substantial "massaging" (ETL) should be necessary.  At least the TT data will be no different than what our ABL code is used to.

Thanks for the additional details.  Let me know if you have any thoughts about quickly moving SQL92 data from an .Net DataTable into an ABL temp-table.

Posted by ske on 10-May-2019 11:00

dbeavon said:

> ... I don't particularly like the IMPORT format since it is not binary and doesn't seem standardized (any more than a csv). I'm not eager to see what happens when my ODBC results have my delimiters within the data itself (or it may even have line breaks).

I find the IMPORT/EXPORT format quite easy to work with.

Line breaks within fields are ok, provided that the field is quoted.

Put quotes around string fields that may contain the field delimiter character, line breaks or other special characters.

Just make sure that if a field contains the quote character as data, double that character ("" for ").

Use ? for the unknown value.

Use YES or NO for logical fields.

Any date fields of course need to be formatted the Progress way, and are susceptible to the Progress session's current date format options.

Non-Progress data types may need some kind of conversion, of course.

Or use IMPORT UNFORMATTED if you just want to read each line as a single string, with no delimiters.

I guess you could choose to read a binary file too, but then you will have to use other commands than IMPORT, and split it up into fields and records yourself, which will probably be slower.
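
For illustration, a single record following the rules above might look like this on disk (made-up values, default space delimiter, default mdy two-digit-year date format):

123 "Smith, ""ACME"" Corp" ? YES 05/10/19 "line one
line two"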

Posted by dbeavon on 09-Jun-2019 15:09

I was able to get my data moving back and forth between .Net DataTable and ABL temp-table using both XML and JSON.  The JSON approach is quite a bit faster compared to READ-XML (about 3x faster based on the scenarios I'm working with).  Also it won't ever suffer from blocking bugs in the Xerces XML parser.
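
In case it helps anyone, the shape of the .Net-to-ABL direction via JSON is roughly the sketch below.  It assumes the Newtonsoft.Json (Json.NET) assembly is available to the CLR bridge, and that the .Net DataTable lives in a DataSet and is named to match the temp-table, so the serialized JSON has the object-wrapping-an-array shape that READ-JSON expects.  The temp-table and field names are made up.

/* Sketch of .Net DataTable -> ABL temp-table via JSON.  Assumes the
   Newtonsoft.Json assembly can be loaded by the CLR bridge, and that
   oDataSet was filled elsewhere (e.g. by an OdbcDataAdapter) with its
   single DataTable named "ttOrder" to match the temp-table below. */
DEFINE TEMP-TABLE ttOrder NO-UNDO
    FIELD OrderNum AS INTEGER
    FIELD Amount   AS DECIMAL.

DEFINE VARIABLE oDataSet AS System.Data.DataSet NO-UNDO.
DEFINE VARIABLE lcJson   AS LONGCHAR            NO-UNDO.
DEFINE VARIABLE hTT      AS HANDLE              NO-UNDO.

/* ... fill oDataSet ... */

/* Json.NET serializes a DataSet as an object keyed by table name, each
   holding an array of row objects - the same shape WRITE-JSON produces. */
lcJson = Newtonsoft.Json.JsonConvert:SerializeObject(oDataSet).

hTT = TEMP-TABLE ttOrder:HANDLE.
hTT:READ-JSON("LONGCHAR", lcJson, "EMPTY").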

I also started evaluating an approach that used the Progress "IMPORT" format.  The biggest drawbacks are that it is proprietary, and it requires the fields within a record to be exported and imported in a predetermined sequence, whereas with JSON and XML the fields can be ordered differently on each side (or can even mismatch each other to a large degree).  This makes JSON and XML quite a lot more appealing than the "IMPORT" format, even if they are slower (about 100 ms per 10,000 records for READ-JSON).

I think the EXPORT/IMPORT operations in ABL make sense for data that will be serialized and deserialized to the *exact* same schema, and only in the context of ABL.  But if you are moving data across different technologies, and/or if your schema formats mismatch in any way, then EXPORT/IMPORT is quite a poor choice.

Posted by Thomas Mercer-Hursh on 09-Jun-2019 16:40

But, then, of course it was once the only choice ...

Posted by dbeavon on 09-Jun-2019 16:54

>> But, then, of course it was once the only choice ...

The forums seem to have misplaced the message that you are replying to.  Basically I had said that instead of the ABL "IMPORT" format, I've landed on XML or JSON (and JSON wins where raw performance is a priority).

The "only choice" back in the day was the IMPORT format which is proprietary to ABL.  And it is not a great serialization format if you are sending data between different platforms or different software technologies.  And even where ABL is concerned, it doesn't behave well when schema is out-of-order between the time of serialization and deserialization.

Thanks, David

(PS.  Insofar as the forums go, there has been a bug where if I somehow interact with a recent post in any way - even by simply fixing a typo - then the message will go missing.  It is pretty annoying and is one of several problems I've seen in the forums in recent months.  I think someone made some changes on the back end and they must still be working out some of the kinks.)

Posted by Thomas Mercer-Hursh on 09-Jun-2019 18:04

I also had the problem of getting a page of white when I tried to look at the post online.

Yes, my point was that, being the original totally generic method, IMPORT/EXPORT might be expected to be less performant than newer methods, especially those in which the syntax processing is supported at the machine language level.

Posted by gus bjorklund on 10-Jun-2019 17:20

> On Jun 9, 2019, at 2:07 PM, Thomas Mercer-Hursh wrote:
>
> those in which the syntax processing is supported at the machine language level

huh????

the Xerces parser is written in C. so is IMPORT/EXPORT. so is the JSON handler. none are written in assembler, much less "machine language".

Posted by Thomas Mercer-Hursh on 10-Jun-2019 17:56

The intended contrast was with having to do the string processing in ABL.

Posted by gus bjorklund on 10-Jun-2019 18:12

ah. sorry, i misunderstood.

string handling in the 4GL is not as good as it could be. Evan, are you listening?

Posted by Evan Bleicher on 10-Jun-2019 21:55

Yes, Gus, I'm listening!   To provide Product Management with the necessary input, what specific string handling capabilities are you looking for?  A Regular Expression facility built into the ABL?   Something else?

This thread is closed