I'm trying to do some research to find out the best way to move data from Ado.Net into ABL datasets via the CLR bridge. Ideally it could be done in a clean way, within process memory, without sending data thru intermediate files on disc. Also it should be done as quickly as possible.
I haven't settled on anything yet, primarily because there doesn't appear to be any straight-forward solution.
By using the CLR bridge to loop through a .NET data reader and copy records one at a time, I can move all my data. This is probably the "cleanest" option but involves quite a lot of CPU, because of all the related CLR interop. Similarly, I can move data by exporting it all to XML and parsing that back into an ABL dataset with READ-XML. This is fairly clean and fairly CPU-intensive; it is only slightly less "clean" than the data-reader approach. Both are fairly slow.
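For what it's worth, the XML hand-off mostly comes down to producing the nested-record layout that a dataset's READ-XML generally expects (a dataset element wrapping one element per temp-table record). Here is a sketch of that shape; Python is used only to illustrate the format (the real code would run on the .Net side), and the exact layout should be verified against the output of WRITE-XML for your dataset:

```python
import xml.etree.ElementTree as ET

def rows_to_dataset_xml(dataset, table, rows):
    """Serialize row dicts into dataset-style XML:
    <dsName><ttName><Field>value</Field>...</ttName>...</dsName>"""
    root = ET.Element(dataset)
    for row in rows:
        rec = ET.SubElement(root, table)
        for name, value in row.items():
            ET.SubElement(rec, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_dataset_xml(
    "dsCustOrder", "ttCustomer",
    [{"CustomerCode": "C1", "CustomerName": "Acme", "CustomerNumber": 1}])
# xml_text holds one <ttCustomer> record inside <dsCustOrder>
```

The dataset and table names above are just placeholders matching the example later in this thread.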
At the moment I'm investigating the "bindingsource" that Progress created for the "GUI for .Net". This could be another avenue for getting my Ado.Net data out of .Net and sending it back to an ABL dataset. I found the assembly ("Progress.NetUI.dll") and some docs ( https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvngp/understanding-the-probindingsource.html ).
But I have some concerns:
Any tips would be greatly appreciated. We are moving off the HP-UX platform, and I'm not totally familiar yet with all the features available to OE customers on Windows via the CLR Bridge. I have seen few discussions about ProBindingSource, especially outside the context of the "GUI for .Net".
Brian Maher | Principal Engineer, Technical Support | Progress | 14 Oak Park | Bedford, MA 01730 | USA
The ProBindingSource is meant to get data from an ABL table and make it available to .NET, not the other way around - i.e., read data from .NET and copy it into an ABL table. So I'm really not sure how you would use this for your use-case.
No, you should not use the Progress.Data.DataSource directly. It could change at any time and is not meant to be used by ABL programmers. I don't see what that buys you anyway. "Avoid classes like System.Windows.Forms"? What does that have to do with the BindingSource? And what BindingSource overhead are you talking about? In any case, I don't believe you can use Progress.Data.DataSource without a BindingSource, period. One refers to the other; if there is no BindingSource, it will not work.
Why are you worried about the size of the NetUI.dll assembly?
>>The ProBindingSource is meant to get data from an ABL table and make it available to .NET, not the other way around - i.e., read data from .NET and copy it into an ABL table.
Doesn't it allow users to edit and add data? I think that is the intended purpose, right? Moving data programmatically back into ABL is not the intended purpose, but I suspect it would work, given what is already possible from "GUI for .Net". If a user can enter data into a .Net control and send it back to an ABL table, then it should be possible to do the same programmatically. This approach would only appeal to me if the technology performed better than parsing lots of text (i.e., XML or JSON).
>> No you should not use the Progress.Data.DataSource directly. This could change at any time and is not meant to be used by ABL programmers
I can see your point. But the DataSource class was public, and it implemented some pretty powerful (and familiar) interfaces like IBindingList, so I was hoping it wouldn't be totally off-limits. I was digging into ProBindingSource to see where the magic happens; then I found DataSource, so I thought I'd ask. It would have made sense to use that, rather than place a PASOE dependency on classes within System.Windows.Forms.
>> Avoid classes like System.Windows.Forms?
Yes. In a .NET app, if I put a dependency on (System.Windows.Forms.)BindingSource, it loads System.Windows.Forms.dll, which is a user-interface library (and a winforms one at that). It would seem odd to place a dependency on winforms when my code is running within a PASOE msagent.
But in contrast, your DataSource class does *not* contain any major user-interface dependencies. It is based on interfaces from System.ComponentModel, which is found in System.dll. For reference, below is the simplified public interface of that class. It compiles successfully without any assembly reference to System.Windows.Forms.dll.
using System;
using System.Collections;
using System.ComponentModel;

public class DataSource : ITypedList, IBindingList, ICancelAddNew ...
As I mentioned before, it seems odd to me that I would ever place a dependency on winforms if my code is running in PASOE. There should be a separation of concerns. And PASOE would never be concerned with presenting a winforms user interface. But placing a dependency on the System.dll doesn't bother me at all.
>> Why are you worried about the size of the NetUI.dll assembly?
I'm not. I'm mainly concerned about the size of the customer base that uses the assembly, and also about the size of the data that I need to convert from the .Net side of things back into ABL.
Given that we don't use "GUI for .Net", I wouldn't have known about the ProBindingSource option except that it came up in a prior topic related to what I'm doing: community.progress.com/.../1231
FYI, I can see you have previously cautioned people against using ProBindingSource for moving data programmatically, but it is not clear why.
community.progress.com/.../23978
At a high level I suspect the ProBindingSource is a bad fit for use within PASOE. But ideally there would be a similar API that could be used for quickly transferring data back and forth across the CLR bridge. I was hoping to find something useful under the covers of the ProBindingSource (something that I could use in PASOE as well).
> Doesn't it allow users to edit and add data? I think that is the intended purpose, right?
That's part of what it does, since controls allow you to edit and add data. The primary direction of data is from ABL to the control. Any new data (edited or added) comes from the user/user interface, not from a .NET object holding its own data. The update of edited or added data in a control is governed by events that the control fires or methods that it calls on its data source (i.e., the BindingSource). An ADO.NET object is not going to do those things.
Ultimately, we have to read each row of your .NET object, marshal the data from .NET back to the ABL and make temp-table records, copying in the data. You can't get around that. It sounds like you just want this written in the AVM rather than in the ABL, which I think would make it somewhat faster. Not clear though how much. Sorry, we have no such thing right now.
Yes, I would do the looping in .Net and compile that into a custom assembly that would be referenced by ABL (PASOE). The surface area that is exposed to ABL would involve only a single round-trip into the CLR bridge; and that round-trip would do all the work for an entire dataset (sending .Net data back into ABL temp-tables via the BindingSource).
>It sounds like you just want this written in the AVM [sic] rather than in the ABL, which I think would make it somewhat faster.
If it could be faster, that is encouraging. Using the standard interfaces (the ones on ProBindingSource) seems like a safe bet, and hopefully it has the potential to be faster than EXPORT/IMPORT. I was originally pretty eager to take a stab at an EXPORT/IMPORT approach... but I realized that I would need to invest a lot of time building my own home-grown C# code around the proprietary "EXPORT" file format that OpenEdge uses. That is not so appealing. The public interface to ProBindingSource, by contrast, is more standardized (it is not entirely text-based like XML/JSON, it doesn't require data to be sent to intermediate files on disk, and it hopefully performs better than EXPORT/IMPORT).
Doing all this from within PASOE may be tricky. One concern is that a PASOE msagent can be such a long-lived and high-volume process. Most winforms components are made for a GUI process: a single-user process whose user may work all day long, but not nearly as hard as an msagent. And eventually the GUI process is stopped and restarted, flushing away any rooted memory references. Even if there are rooted memory references, they are isolated to a single-user process and cannot affect other people. A PASOE msagent, on the other hand, is very susceptible to memory-related problems, even minor ones. That ProBindingSource had better not leak a single byte of memory or there will be support cases! ;)
> The surface area that is exposed to ABL would involve only a single round-trip into the CLR bridge; and that round-trip would do all the work for an entire dataset (sending .Net data back into ABL temp-tables via the BindingSource).
You are making an invalid assumption. This could not possibly be done in a single round-trip to the CLR bridge. If the algorithm is designed for data sets of any size, you cannot possibly convert and hold the data from a million or more records so that all the field values can be sent back at once from the CLR and then copied into temp-tables. In fact, the BindingSource actually only gets one field at a time. I can envision some other facility, written in the AVM, that could handle one row at a time. But that's about it.
It still sounds like you want to use the ProBindingSource and I haven't heard anything yet that makes this a viable option. Maybe you can expand on how you intend to use it.
>>This could not possibly be done in a single round-trip to the CLR bridge
From the perspective of a developer using the bridge, it appears like everything is being done in a single round-trip. For example I am calling this method below to receive all the data. It fills a table with three columns: a customer code, name, and number. When execution of the method is complete, my ABL TT is filled with all the data that I need.
public static void ReceiveAllData(BindingSource p_Source)
{
    var BindingSourceObj = p_Source;
    BindingSourceObj.AutoUpdate = true;

    var Test = BindingSourceObj.GetItemProperties(null);
    var CodeProp = Test[0];
    var NameProp = Test[1];
    var NumberProp = Test[2];

    for (int i = 0; i < 10000; i++)
    {
        var RowNew = BindingSourceObj.AddNew();
        CodeProp.SetValue(RowNew, "Code" + i.ToString());
        NameProp.SetValue(RowNew, "Cust Name " + i.ToString());
        NumberProp.SetValue(RowNew, i);
        BindingSourceObj.EndEdit();
    }
}
I think you are pointing out that (internally) the implementation is making lots of round-trips between the CLR and AVM. I can see that this is happening to a certain extent, but only by setting breakpoints and watching the CPU performance in VS diagnostic tools. Quite a lot of work is done managing the capacity of an ArrayList member named "h" (thanks for the obfuscated code, btw ;-).
The ABL side of things was fairly easy to understand. I liked everything about this approach, aside from the performance:
USING ClassLibrary1.* FROM ASSEMBLY.

DEFINE TEMP-TABLE ttCustomer NO-UNDO
    FIELD CustomerCode   AS CHARACTER
    FIELD CustomerName   AS CHARACTER
    FIELD CustomerNumber AS INTEGER.

DEFINE DATASET dsCustOrder FOR ttCustomer.

/* Params and vars */
DEFINE INPUT-OUTPUT PARAMETER DATASET FOR dsCustOrder.

DEFINE VARIABLE hTopQuery AS HANDLE NO-UNDO.
DEFINE VARIABLE hDataSet  AS HANDLE NO-UNDO.
DEFINE VARIABLE hTT       AS HANDLE NO-UNDO.

hDataSet  = DATASET dsCustOrder:HANDLE.
hTopQuery = hDataSet:TOP-NAV-QUERY(). /* navigation query for customer */
hTopQuery:QUERY-PREPARE("PRESELECT EACH ttCustomer").
hTopQuery:QUERY-OPEN.
hTT = BUFFER ttCustomer:HANDLE.

/* Prepare binding source */
DEFINE VARIABLE rBindS AS Progress.Data.BindingSource NO-UNDO.
rBindS = NEW Progress.Data.BindingSource(hDataSet, hTT).

/* Receive */
Class1:ReceiveAllData(rBindS).
Hopefully the code is more or less correct. But please let me know if it is overkill for populating three columns of data in a temp-table. The performance was not great, and I was a bit disappointed since .Net was in the driver's seat and I expected things to be faster. I guess I can see why this approach is not recommended for moving large amounts of data from .Net back into ABL. The ProBindingSource was not designed with this purpose in mind.
For one thing, the loop to create new rows in the bindingsource became substantially slower as the number of records increased (it wasn't linear for some reason). Creating 10,000 records takes 800 ms. 20,000 takes 2,700 ms. And 100,000 takes ~50,000 ms. Perhaps this has to do with the unusual way that the ArrayList was constantly changing its capacity. Or maybe it is more complex than that.
In any case, I plan to revisit the EXPORT/IMPORT strategy. I will look into creating the "EXPORT" format from .Net code. I suspect that the "EXPORT" format is not hard to work with, unless it needs to deal with exceptional data (multi-line values, CLOBs, quotation marks, delimiters embedded in the data, etc.). If anyone can point me to some .Net (or Java) code that already behaves the same as the EXPORT statement, please let me know. I'm wondering if an EXPORT operation ever happens from the PCT code. Once I have EXPORT/IMPORT working, it should perform much better than all the other options I've considered so far. I still think it is a bit unfortunate that the data needs to be written out to disk and then read back into memory again (within the same process). But even if the data takes a detour, I expect it to be faster than the "GUI for .Net" bindingsource (and faster than XML or JSON).
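The quoting rules for character data seem like the main wrinkle in generating that format from outside ABL. Here is a minimal sketch of rendering one EXPORT-style line, under the assumptions that fields are space-delimited, character values are double-quoted with embedded quotes doubled, and the unknown value is an unquoted ? (all of which should be checked against what IMPORT actually accepts); Python is used only to illustrate the format:

```python
def export_line(fields):
    """Render one EXPORT-style line from a list of field values.
    Assumed rules (unverified): space delimiter, character fields
    double-quoted with embedded quotes doubled, None written as ?."""
    parts = []
    for f in fields:
        if f is None:
            parts.append("?")  # ABL unknown value, assumed unquoted
        elif isinstance(f, str):
            parts.append('"' + f.replace('"', '""') + '"')
        else:
            parts.append(str(f))
    return " ".join(parts)

line = export_line(["Code1", 'He said "hi"', 42, None])
# line == '"Code1" "He said ""hi""" 42 ?'
```

This ignores the harder cases mentioned above (multi-line values, CLOBs, date/decimal session formats), which is exactly where a home-grown writer could silently diverge from what IMPORT expects.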
Sorry, but I thought you were getting data from an ADO .NET object. Isn't Test (BindingSourceObj.GetItemProperties(null);) giving you data from your ABL DataSet? In any case, Test will be a .NET object, so you are still going to the clrbridge to get this data and then using the BindingSource to populate a temp-table by calling AddRow! It would be MUCH faster to just create a temp-table record! I thought you were trying to minimize going back and forth across the bridge. You don't need .NET to create temp-table records, and that's what you're doing. This makes no sense to me.
Yes, GetItemProperties(null) gets the schema details about the three TT columns, which the ProBindingSource can then use for updating the TT records from .Net.
This is not the whole picture. Ultimately I'm trying to query data out of an ODBC connection (OdbcConnection) using a SQL statement, and then send it back over to the ABL session for follow-up processing. I was hoping to do that with one fast round-trip to a custom method in a .net assembly. I was hoping everything could be done very quickly. Otherwise it defeats the purpose of using an ODBC connection in the first place (because all the performance benefits of using ODBC would be lost once again while transferring this data back into ABL).
I don't think the ProBindingSource is a good fit for this scenario.
The fastest approach seems to be for me to EXPORT it all to a file from .Net code, and then IMPORT it all back into the ABL side of things. This seems to have low CPU and memory overhead compared to some of the other options I've looked at. I'm hoping that EXPORT/IMPORT will give me at least 5,000 to 10,000 rows per second. I guess we'll find out!
I abandoned my strategy of using the Progress.Data.BindingSource as a way to move data back and forth quickly between ABL Temp Tables and ADO.Net DataSets.
It isn't designed to address that type of problem. I know that it can quickly supply tabular data to .Net controls (i.e., grids). In other words, sending data from ABL to .Net controls is very efficient. But modifying data in .Net and sending it back to ABL temp-tables is quite slow. This is due to the internal implementation of the underlying Progress.Data.BindingSource. It isn't designed to be automated or to change numerous TT records at a time.
The best approaches I've found for moving data back and forth are XML and JSON (and the corresponding READ-XML/READ-JSON and WRITE-XML/WRITE-JSON methods on the ProDataSet).
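For the JSON route, the payload shape appears to mirror WRITE-JSON output: the dataset name wrapping an array of records per temp-table. A sketch of that envelope (Python only to illustrate the layout; the real serializer would run on the .Net side, and the exact shape should be verified against WRITE-JSON output for your dataset):

```python
import json

def rows_to_dataset_json(dataset, table, rows):
    """Wrap row dicts in the dataset/temp-table envelope that a
    ProDataSet's READ-JSON is expected to consume."""
    return json.dumps({dataset: {table: rows}})

payload = rows_to_dataset_json(
    "dsCustOrder", "ttCustomer",
    [{"CustomerCode": "C1", "CustomerName": "Acme", "CustomerNumber": 1}])
```

The names here are placeholders matching the earlier temp-table example in this thread.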
I had also evaluated the ABL "EXPORT"/"IMPORT" format, but it has a lot of limitations. It is probably a good choice when you are exporting and importing data from ABL to ABL and the schema doesn't change between the EXPORT operation and the IMPORT operation. However, if you are not using the ABL language for both operations, or if the schema isn't identical, the format won't work well: it is very proprietary, and it doesn't play well with non-ABL technologies.
It is especially inflexible if the schema differs between the side that serializes the data and the side that deserializes it. E.g. if the fields are in the wrong sequence, or if the sets of fields don't match on one side or the other, the ABL "IMPORT" format falls apart.
Both JSON and XML work well, but there is a bit of a penalty for picking XML over JSON. In general, XML is about 3x slower for the types of scenarios I was working with, and, in PASOE, there is a concurrency bug that can further increase the penalty: it appears that only one ABL session (within the agent process) can be parsing XML data at any given moment. Apparently that is a result of the way PASOE interacts with a third-party XML library (the Xerces parser).
In summary, I've abandoned the idea of using Progress.Data.BindingSource for non-GUI purposes. I'll be using XML or JSON for serializing and deserializing data between the ABL and .Net runtimes (where .Net is running within the CLR Bridge).