QUERY and bindingSource

Posted by vancejohnson2 on 31-Jul-2008 15:53

For ultraGrids I've been using the following code in my class constructor after the InitializeComponent ( ) statement. The grid is bound to bindingSource1. The last statement takes a long time to execute when the table contains a lot of records. The one I'm playing with has 200,000 records and it takes 30-45 seconds to execute. I need to have all of the rows available, so I don't want to have to throttle data by modifying the QUERY-PREPARE. Any ideas on how to speed this up?

CREATE QUERY qh.
qh:SET-BUFFERS(BUFFER customer:HANDLE).
qh:QUERY-PREPARE("FOR EACH customer NO-LOCK").
qh:QUERY-OPEN().
bindingSource1:HANDLE = qh.

All Replies

Posted by jmls on 31-Jul-2008 16:09

I have also found a noticeable delay when extracting data from the customer table on the sports database.

Have you checked to see whether it's the number of records that is the issue, or the actual binding command?

Posted by vancejohnson2 on 31-Jul-2008 16:12

It's definitely the number of records. For smaller tables there is no perceptible delay.

Posted by Peter Judge on 31-Jul-2008 16:21

It's definitely the number of records. For smaller tables there is no perceptible delay.

Have you tried using a ProDataSet instead of a query? If I use code similar to the below (coded using the PSDN editor, so may not actually /compile/, but you get the gist of it), I see sub-second times for populating an UltraGrid with all the s2k Customer records.

def temp-table eCus like Customer.
def dataset dsCustomer for eCus.

for each Customer:
    create eCus.
    buffer-copy Customer to eCus.
end.

bindingSource1:Handle = dataset dsCustomer:Handle.

No query opening etc is required in this case.

-- peter

Posted by vancejohnson2 on 31-Jul-2008 16:30

That method takes even longer. I'm not using the Sports2000 database. I have 200,000 records in my table.

Posted by Thomas Mercer-Hursh on 31-Jul-2008 16:35

Do you really need all 200,000 records available at the start? The obvious way to improve performance would be to fetch an initial batch of records and detect off-end conditions, or to provide filters. I have some questions about the meaningful utility of having 200,000 records in a UI component.

And, to make it worse, let me guess that you are testing against a local database ... imagine sending all 200,000 records across the network.

Posted by Håvard Danielsen on 31-Jul-2008 16:39

Maybe most of the time is needed to fill the grid? Visualizing 200,000 records in a graphical UI might take a while. I guess the number of fields is equally important.

Have you tried to do the binding without the grid?

Many of the UltraGrid samples have 3 tables, and I noticed that this is very fast when only the parent table is visible. But some of these samples have an option to expand all rows, and this takes a while with sports2000 customer, order and orderline with all fields and no filters. I'm pretty sure this is less data to fill than your 200,000 records.

Posted by vancejohnson2 on 31-Jul-2008 16:43

We do it today with a browse widget and a query and it works just fine. Our customers have huge databases. The 200,000 record table is in one of our QA databases. Many of our customers have over 10 million records in that table. By the way, the table name is not really customer. I just used that to make it easier to read. It's really an inventory SKU table.

Posted by Håvard Danielsen on 31-Jul-2008 16:46

Many of the UltraGrid samples have 3 tables, and I noticed that this is very fast when only the parent table is visible. But some of these samples have an option to expand all rows, and this takes a while with sports2000 customer, order and orderline with all fields and no filters. I'm pretty sure this is less data to fill than your 200,000 records.

Continuing my own post:

Note that I do not (yet) know the details of how the binding source and Progress work together. A 3-table dataset will necessarily need far more query opens than a single table, and I guess this cannot really happen until someone is actually ready to read the data.

Posted by vancejohnson2 on 31-Jul-2008 16:48

It takes just as long without the grid.

Posted by Thomas Mercer-Hursh on 31-Jul-2008 16:59

OK, but there are a lot of times that people ask a question and the best answer is to change the question. I have a little trouble believing that passing 10 million records from a server to a client to open them in a browser actually works fine, but note that there are two entirely different questions here, and I think there is some value in separating them.

Your question is "I can do A, but if I do B, which is the equivalent of A in this new context, the performance is poor." Behind this is a potentially interesting question of whether the performance is generally poor or whether there are special conditions that make it poor and whether there is something fixable in either the use or PSC's implementation which would make it better. All of those are useful questions.

But, to me, there is also the question of whether A or B are actually good things to do in a design ... even if they do work. And, that too, I think, is an interesting question.

In many cases, reformulating a problem causes the problem to disappear.

Posted by Håvard Danielsen on 31-Jul-2008 17:00

The browse widget is able to scroll ANY amount of data fast as long as you have the data there in the same session.

I'm suspecting that the grid is a bit different.

Posted by Thomas Mercer-Hursh on 31-Jul-2008 17:04

Might this be because the browse widget is associated with a set of data but only populated with the visible data, while the grid is actually populated with the entire set of data? If so, that would be a rather fundamental difference in operation which would suggest a need for different handling.

Posted by vancejohnson2 on 31-Jul-2008 17:21

I'm not going to post on this forum without researching an issue. I know our code very well, and I know that we use browsers everywhere and we do not throttle data. Most of our users run GUI and the app is client/server (fat client). So, to prove a point, I went into the AppBuilder, created a window, added a browse widget, used the same table, added the same fields, ran the program, and the browse widget filled immediately and all of the records were available. Pressing the End key displayed the records at the end of the table.

Posted by Håvard Danielsen on 31-Jul-2008 17:28

That was my assumption, but Vance's test indicates that the binding source also needs all data (or a list of the number of records, maybe?).

I should probably stop guessing now and leave the answers to those who actually know this.

I also based my assumption on the fact that there is an UltraGrid sample that browses 1 million rows very fast. But it does this by turning off all default behavior.

Instead the data is read in an event that fires for each visual row. The row index is passed to the event, enabling you to read the right record. I think this is a last-resort solution, as it makes it extremely hard to support sorting and repositioning (you would need to override all defaults and also be very clever to find the correct record from the physical grid position), but it is theoretically possible to make a grid fast on any amount of data if you are able to find single records fast enough.
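For reference, the on-demand pattern described above looks roughly like this in ABL. This is a sketch only: the Infragistics class, property, and event names (UltraDataSource, LoadStyle, CellDataRequested) are recalled from the Infragistics documentation and are not verified against the Beta, and FindSkuByPosition is a hypothetical helper.

/* Sketch: virtual rows via an UltraDataSource (API names unverified). */
DEFINE VARIABLE uds AS Infragistics.Win.UltraWinDataSource.UltraDataSource NO-UNDO.

uds = NEW Infragistics.Win.UltraWinDataSource.UltraDataSource().
uds:Band:Columns:Add("SkuName").
uds:LoadStyle = Infragistics.Win.UltraWinDataSource.LoadStyle:LoadOnDemand.
uds:Rows:SetCount(2000000).   /* row count only; no data is read yet */
uds:CellDataRequested:Subscribe(CellDataHandler).
ultraGrid1:DataSource = uds.

/* Fires per requested cell; args:Row:Index is the ordinal position. */
METHOD PRIVATE VOID CellDataHandler (sender AS System.Object,
    args AS Infragistics.Win.UltraWinDataSource.CellDataRequestedEventArgs):
    /* FindSkuByPosition is a hypothetical helper that locates the record
       for this ordinal position (e.g. via a sequential key with no gaps). */
    args:Data = FindSkuByPosition(args:Row:Index).
END METHOD.

As noted, efficiently mapping an ordinal position back to a database record is the hard part of this approach.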

Posted by Thomas Mercer-Hursh on 31-Jul-2008 17:32

OK, like I said: it works the way you are doing it now, and it doesn't seem to work, or at least perform well, with the new controls. There are at least three reasons why this might be so:

1) It has something to do with usage and someone will come up with a slightly different option or procedure which will make the issue disappear. This is possible, but not my first guess.

2) There is a bug or design flaw in the implementation of the components you are using or in their connection to Progress. If so, you are likely to have to wait until at least FCS if not longer. Testing with someone else's grid control might tell you if it was specific to this grid.

3) There is an inherent difference at a low level between how the browse widget works and how a grid works, and this difference in design is the reason for the difference in performance. If this is the issue, then either you have to find a different grid control with a different design, or decide to do something different than what you are doing now, or accept that this is just the way it behaves. If the delay is unacceptable with 200K records, it probably is unacceptable and then some with 10M records. This is actually my first guess, although admittedly it is all guessing at this point.

Since I have an inherent doubt about flinging sets of 10 M records around the network, my approach would be to rethink the approach, but that's me.

Posted by Thomas Mercer-Hursh on 31-Jul-2008 17:43

I also based my assumption on the fact that there is an UltraGrid sample that browses 1 million rows very fast. But it does this by turning off all default behavior.

Instead the data is read in an event that fires for each visual row.

That sounds like I might be right about the issue being populating the grid with all rows because your description sounds like they are limiting the population to that which is visible. It reminds me vaguely of an experience working with a student last fall who was presenting some data in a tree view within Eclipse. It was a simple SQL query to fetch the data set, but then a whole lot of processing to turn the nice relational data into an hierarchical set of linked objects. She didn't get far enough or have enough data to really measure how bad the performance hit was going to be, but it is certainly a familiar kind of impedance mismatch of relational to object.

Posted by Simon de Kraa on 01-Aug-2008 01:10

I have also found a noticeable delay when extracting data from the customer table on the sports database.

+1

Posted by jmls on 01-Aug-2008 01:11

Vance, I think that the issue is the browser does not read all the records in the query. When you pressed end, it went to the end of the table using an index, and read enough records to display in the browser.

The grid widget needs to read all records in order to do fancy things like sorting, grouping, etc.

Posted by jmls on 01-Aug-2008 01:11

Simon, it's interesting that you've got the same problem.

Posted by Simon de Kraa on 01-Aug-2008 01:25

Yes, but I haven't investigated it further.

I saw your post and thought: hey, same here!

Posted by Håvard Danielsen on 01-Aug-2008 06:16

Vance, I think that the issue is the browser does not read all the records in the query. When you pressed end, it went to the end of the table using an index, and read enough records to display in the browser.

Yes, although I think it is really the way the query's results list is managed that decides how fast the various operations are. Options like INDEXED-REPOSITION tell the query that it can use indexes for repositioning. A forward-only query can be faster because it never needs to go back again, and thus doesn't need to build the result list while browsing forward. One of the drawbacks of an incomplete result list is that you cannot ask it for the relative position or how many records it has. Using PRESELECT instead of FOR EACH tells the query to build the whole list up front and makes it able to give these numbers. The thumb on the browse will also be correctly sized and positioned to give the user feedback on the size of the data and the position, but this has a negative effect on performance.

The grid widget needs to read all records in order to do fancy things like sorting / grouping etc etc

I also think you are right that “all” data needs to be made available to .NET. I guess it is really not the grid itself that needs to have all the data, as Vance's test indicates, but rather that this is the whole point of the binding source. Sorting, filtering, group-by and summaries are all features that require access to all data, not just chunks of data.

The binding source does support batching, which implies that no batching means "all" data. Batching can be used to allow larger amounts of data to be browsed, but you will need to block or override all the aforementioned data-manipulating features, as they only apply to the data already in the binding source/grid.
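The batching mechanism might be wired up along these lines. This is an unverified sketch: the Batching property, the OffEnd event, and Progress.Data.OffEndEventArgs are recalled from the draft documentation, and FillNextBatch is a hypothetical helper.

/* Sketch: ProBindingSource batching (names unverified against the Beta). */
bindingSource1:Batching = TRUE.            /* tell the control more rows may follow */
bindingSource1:OffEnd:Subscribe(OffEndHandler).

/* Fires when the bound control scrolls past the last available row. */
METHOD PRIVATE VOID OffEndHandler (sender AS System.Object,
        args AS Progress.Data.OffEndEventArgs):
    /* FillNextBatch is a hypothetical helper that reads the next N
       records into the bound temp-table; set Batching = FALSE once
       the last record has been read. */
    FillNextBatch().
END METHOD.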

Edited to remove some typos..

Havard Danielsen

Posted by jmls on 01-Aug-2008 06:23

I'll have to look into it further. I've been told by the PHB that I can have a couple of weeks "playing" with the beta, so I should be able to do some serious digging

Posted by jmls on 01-Aug-2008 13:50

Ok, I've looked into this and there is something strange going on.

this is the constructor of the main form of my demo:

CONSTRUCTOR PUBLIC test3 ( ):
    SUPER().

    DEF VAR i AS INT.
    i = MTIME.

    InitializeComponent ( ).

    MESSAGE "#" MTIME - i VIEW-AS ALERT-BOX.
END CONSTRUCTOR.

the time taken to load this window, which hosts my ABL control (a data-bound control on the sports customer table), is 0.3 seconds.

from this point to the window becoming usable (widgets appearing) is nearly 10 seconds.

where can I find out where this delay is happening?

It is definitely related to the size of the query - with fewer than 100 records it takes 2 seconds.

Posted by jmls on 01-Aug-2008 13:53

having said that, I feel that 2 seconds is an extraordinarily long time to populate a browser with fewer than 100 rows. The window goes all white, then black, then "appears".

Posted by maura on 06-Aug-2008 15:21

The major difference here (which impacts performance) is that the browse implementation does not need to know how many records it will be browsing. The .NET grids need to know how many records they will be navigating through, so that forces our implementation of the ProBindingSource to get that info upfront from the query.

In our documentation, we strongly recommend that the query be opened using the “preselect” option instead of the “for” when binding to a ProBindingSource.

If the “for” option is specified, the query initially does not have a record count, so the ProBindingSource has to read each record to get the count. If “preselect” is specified, this optimizes query performance. (The query code reads the ROWID for each record into the result list up front.)

Note that the performance cannot be identical to the browser given this requirement from the .NET grids, but should improve a bunch using the “preselect”.
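Applied to the code from the original post, the change is just the PREPARE string; a minimal, untested sketch:

CREATE QUERY qh.
qh:SET-BUFFERS(BUFFER customer:HANDLE).
/* PRESELECT builds the result list (a ROWID per record) up front,
   so the binding source gets its record count without a full read. */
qh:QUERY-PREPARE("PRESELECT EACH customer NO-LOCK").
qh:QUERY-OPEN().
bindingSource1:HANDLE = qh.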

Maura Regan

Posted by jmls on 06-Aug-2008 15:58

Thanks Maura. I'll look more into this.

Is this the same case for all data-bound controls (for example, the MS combo-box)?

Posted by vancejohnson2 on 06-Aug-2008 17:28

I tried it with a MS combo-box and the speed is the same.

Posted by vancejohnson2 on 06-Aug-2008 17:35

Using PRESELECT instead of FOR made no appreciable difference.

Posted by maura on 07-Aug-2008 08:47

Yes. You should do so for any ProBindingSource you create.

Posted by maura on 07-Aug-2008 08:57

You should enter a bug for this then and we will look into possible performance improvements. I got an email from Rob Debbage on this issue already so he may be in the process of doing so.

As I said earlier, the performance will not be as good as the browser due to the .NET controls requirement of needing to know the record count.

Using the ProBindingSource batching mechanism would be useful for your case, but I believe Rob's email said that you do not want to do batching. Is this something you could reconsider doing?

Posted by Thomas Mercer-Hursh on 07-Aug-2008 11:13

In the call this morning there was reference to a sample which used batching, but it isn't obvious to me that any of the sample names I have gotten thus far would include batching. If there is a batching example, it seems like it would be a big help.

Posted by vancejohnson2 on 07-Aug-2008 13:07

If the record count is the only impediment, then we could supply MAX-DATA-GUESS, or PSC could keep record counts for tables in the database. The latter has been an enhancement request for a very long time.

If a batching mechanism exists, I'd like to see it.

Posted by Håvard Danielsen on 07-Aug-2008 14:04

It is necessary to know the ordinal position of each record, so I doubt that a MAX-DATA-GUESS would help much. You get both a count and positions with PRESELECT, which tells the query to build a temporary index of the records in the query. But this might also be the reason that you do not see any performance improvements with PRESELECT.

Posted by Håvard Danielsen on 07-Aug-2008 14:14

There are no batching examples in the samples.

The reference might have been to the UltraGrid sample that browses a million rows virtually, but I do not think that approach can resolve fast reads directly from the database, since it has to work out which records to show from the grid's ordinal position. (Except perhaps if your data has a sequential key with no gaps... and you never need to sort...).

Posted by Thomas Mercer-Hursh on 07-Aug-2008 14:25

So, it would seem that it is a priority task for someone to come up with a real batching example so that we can see how well it works with large datasets.

Posted by Peter Judge on 07-Aug-2008 14:49

So, it would seem that it is a priority task for someone to come up with a real batching example so that we can see how well it works with large datasets.

That's a good idea.

We're planning on adding some more assorted samples for FCS to those already available for Beta, and we'll try (as a priority) to add a batching example to those.

-- peter

Posted by Admin on 07-Aug-2008 14:59

See my batching sample in a separate thread.

Posted by Thomas Mercer-Hursh on 07-Aug-2008 15:49

If you are going to distribute these samples with the product, I think a little work should go into them so that they can all be loaded in the same project with a common assemblies.xml file, appropriate pathing, etc.

Posted by Peter Judge on 07-Aug-2008 16:18

If you are going to distribute these samples with the product, I think a little work should go into them so that they can all be loaded in the same project with a common assemblies.xml file, appropriate pathing, etc.

My preference runs to having one workspace, with each sample as its own project. But I completely agree that thought around the packaging is required - especially since part of their purpose is to allow rooting around in the code.

-- peter

Posted by Thomas Mercer-Hursh on 07-Aug-2008 16:37

For me, the idea that this is a collection of samples suggests that it should be one project. Then I can have 10 or 20 or 30 samples and it will still close down to one line. If I create a workspace with 30 samples, each its own project, then I have 30 lines, and basically I have to decide to have a workspace of samples because it would be too cluttered to add anything else. Whereas being able to make them a project in with my other code means that I can have sample and new code open at the same time. And, of course, each project requires some setup.

Of course, if all samples were in com.progress.samples.NewUI or some such and there was a tool for combining two assemblies.xml into one, then it wouldn't matter since one could put them in one project or many to taste.

Posted by Peter Judge on 07-Aug-2008 16:56

For me, the idea that this is a collection of samples suggests that it should be one project. Then, I can have 10 or 20 or 30 samples and it will still close down to one line. If I create a workspace with 30 samples, each their own project, then I have 30 lines and basically I have to decide to have a workspace of samples because it would be too cluttered to add anything else. Whereas, able to make them a project in with my other code means that I can have sample and new code open at the same time. And, of course, each project requires some setup.

Well, Eclipse does allow you to import projects into workspaces (either the metadata only or everything). I'm not sure whether the metadata is shared or duplicated, to be honest, but I've found it works fine. If you chose to do this you could import the project you were interested in into another workspace.

But I can certainly see the value in having 1 project, and am now wondering whether it'd be possible to support both styles ...

Of course, if all samples were in com.progress.samples.NewUI or some such and there was

I prefer OpenEdge.Samples.* myself, but that's neither here nor there. Proper namespacing would definitely be advantageous here.

a tool for combining two assemblies.xml into one, then it wouldn't matter since one could put them in one project or many to taste.

I'm not sure whether there'd be any detriment to having all the assemblies in one file, for the samples at least (since they're all on the same version of the IG controls). Something else to look into ...

-- peter

Posted by vancejohnson2 on 07-Aug-2008 17:03

Can you guys please start another thread? This has very little to do with my original posting.

Posted by Thomas Mercer-Hursh on 07-Aug-2008 17:10

I think we're about used up on the sample issue for now, but it looks to me like you should try out Mike's batching code in the new thread he started to see if this solves your problem.

This thread is closed