paging records

Posted by jmls on 03-May-2011 11:22

In a webspeed scenario, what's the best way of paginating data so that the entire dataset is not downloaded.

As an example, the customer table of the sports database has 80 records. If I want to show those 10 at a time, and want to show items 50-60, do I have to read the first 49 and discard, or are there (hidden) mechanisms to start at position 50 ? How would this scale if you had a table with hundreds or thousands of records ?

I know that this must be very basic stuff for most of you .. but please be gentle

All Replies

Posted by Admin on 03-May-2011 11:30

That's basic ProDataset/Data-Source behavior. Read up on the RESTART-ROWID attribute.

Posted by jmls on 03-May-2011 11:50

doh. I keep forgetting about prodatasets

Posted by Peter Judge on 03-May-2011 12:45

doh. I keep forgetting about prodatasets

If you don't want to use 'em, the premise is relatively simple. The execution can be hard, depending on caching, where you want sorting and filtering implemented, whether you want simple forward-only batching/paging or not (ie get rows 21-30 initially and then navigate backwards to 1-10).

But at its simplest, you can open a (sorted) query and reposition to row 10, get 10 rows, store away the last row you got, and repeat. The trick when using row numbers is (a) use a PRESELECT query, and (b) use the SAME query every time.

ProDataSets use ROWIDs, which I'd not (if it can be helped), especially in an n-tier environment where you're dealing with temp-tables and the chance that you'll reuse existing rowids. My approach is laid out here http://communities.progress.com/pcom/docs/DOC-105097 . In the OERA world, there are actually TWO queries that are important - the one the client/Presentation Layer uses to navigate the Business Entity/PDS and the other one which the Data Access layer uses to populate that Entity. These 2 queries do not have to be the same at all. Q(PL) could be on a temp-table, and Q(DA) an XPath query.

Of course, you may not need this level of indirection, but it's stuff worth thinking about when designing your solution.

-- peter

Posted by Admin on 03-May-2011 12:46

my advice... forget about rowids, been there and I'm sure you don't really want to go to see for yourself but that can't stop you from trying

the client speaks on terms of record number not rowid, go down to the data source query if you must use data-source and use reposition forward (or backward if the user is crazy enough to want to see the records he already saw or want to go through them backward... it might happen, you know)

if you pass datasets to the client then the page definition might change... how much data you sent on parents, how much on child(s) and how the navigation is then performed... there's a lot of fun there

Posted by Admin on 03-May-2011 13:00

the client speaks on terms of record number not rowid, go down to the data source query if you must use data-source and use reposition forward (or backward if the user is crazy enough to want to see the records he already saw or want to go through them backward... it might happen, you know)

Record numbers in a stateless or state free world? I doubt that's going to lead anywhere.

Imagine the client is getting the first 10 records, in the meantime someone else would insert 5 records at the beginning and instead of 10 next records you'll get 5 records refreshed that you already have and just 5 new ones...

I can see the point that DB ROWIDs have no meaning on the client - and in the end it does not have to - it's just a bit of context that gets passed around. And if you don't like that, it's trivial to replace that in the context passed around with some other value (GUIDs of the records to prevent reuse, XPath, ...) and handle that as appropriate in the DA layer, i.e. by translating it on that layer into ROWID's and then use the very effective FILL of the PDS or whatever the current. DA needs for repositioning.

Posted by Peter Judge on 03-May-2011 13:09




the client speaks on terms of record number not rowid, go down to the data source query if you must use data-source and use reposition forward (or backward if the user is crazy enough to want to see the records he already saw or want to go through them backward... it might happen, you know)




I see backwards paging all the time on the web: look at any google search, and the page numbers at the bottom of the page. You can click on page 1 then 12 then 2. That paradigm is pretty common; what's not is when you're caching the retrieved data on the client - when you click 12, do you get 1-12, or no?  I suspect most web pages just show you the one you asked for.

-- peter

Posted by Admin on 03-May-2011 13:20

I see backwards paging all the time on the web: look at any google search, and the page numbers at the bottom of the page. You can click on page 1 then 12 then 2. That paradigm is pretty common; what's not is when you're caching the retrieved data on the client - when you click 12, do you get 1-12, or no? I suspect most web pages just show you the one you asked for.

I have a problem with that comparison, nobody will be able to validate Googles paging because it's a proprietary algorithm creating the result data.

And also a classical item lookup on a website (as opposed to a web application) is not the same as in a typical web application. The data is usually pretty stable.

But try to explain an end user unreliable batching in a web application that is a mobile substitute for a GUI application. They won't accept any unclear paging behavior like skipped records or to small batch sizes, both which could be consequences of defining the pages by record numbers when at the same time the underlying data is changing.

Posted by Peter Judge on 03-May-2011 13:45

I have a problem with that comparison, nobody will be able to validate

Googles paging because it's a proprietary algorithm creating the result data.

I have no idea either; I only wanted to point out that jumping forward and backwards between batches or pages is common behaviour. My statement didn't make that clear.

-- peter

Posted by jmls on 03-May-2011 14:15

Doesn't a preselect mean that all records in the query have to be read in first ?

Mind you, in order to return a "total number of records" you would have to do that anyway.

urk.

Posted by Admin on 03-May-2011 14:20

Doesn't a preselect mean that all records in the query have to be read in first ?

Mind you, in order to return a "total number of records" you would have to do that anyway.

Yes, but when doing batching in a stateless world, you'd have to run preselect on each request. That might - when others are creating or deleting records at the same time - return a different number at each request.

Posted by rbf on 03-May-2011 14:24

Julian,

Whatever method you use (datasets or not) you need to reposition the query to where you left off for each new batch which in general means that you have to reread all the records up to that point.

There is some optimization and in some cases indexed-reposition will work but when it really matters (i.e. in the case of complex queries with joins) that won't work and all records will have to be reread.

This is not such a problem in general if you provide plenty of filtering possibilities. That way your users will stay in the low numbers.

But never provide a total count or an END button.

Also be carefull with offering sorting.

-peter

Posted by Admin on 04-May-2011 00:49

Thanks Mike for sharing your thoughts, as I've said... been there done that, never want to do it again

All of your arguments (as less probable as the situation you describe might be) were what I've used to fight the web developers and user interface designers at some point... all are pure technical issues and in practice is hard to stumble over such extreme situation I think, if we were to make catastrophically prediction then it might just as well happen that someone deleted that record it points to the next page. What you do in that case, reposition won't do anything and you start give them records from the start... unless a new records gets created and the rowid is re-used in which case you might fall to a totally different page number than the user expect.


Other thing to consider when using rowid is that for a pure DA that does not only means using the data-server there are some RDBMS that does not have that functionality... one might consider enforcing primary surrogate key for each table and use that as technical identifier, at least that was the path that I had in mind for Quarix at some point but I really can't tell if that would lead anywhere at all

Posted by Admin on 04-May-2011 00:55

There is no such a thing as reliable paging, in any OLTP application where data can change very often the user might well stare at records that are not there anymore or were updated... the only way to make it reliable is to either connect using single user mode or lock all records in exclusive but this is not possible in a state-free world isn't it?

Even if you use rowid, if someone updated the index fields of a record that make it rank different when you want to get the next page you might just get the same record as before only with updated data... it used to rank in page 3 but now given the sort options and the fact that values in index fields were changed put it in page 4, it that more reliable?

Posted by jmls on 04-May-2011 01:09

I've been asked to provide paging functionality on a grid within a web page that includes

page x of y (total = foo)

So, I think that I have the following choices

1: re-read the entire tableon every request

2: Use a local paging source, but that would mean reading the entire table (once), and passing the data over to the client. However, I would assume that this means any changes to the underlying data would not be seen until a refresh.

3: Use a prodataset and start-rowid (if that rowid does not exist, re-read the query)

any others ? This may be for a table with 10 records or up to 1000 records

Posted by Admin on 04-May-2011 01:18

Oh Peter, the good old Dutch school of thought... again nothing I can't agree on from the technical point of view, try to explain that to a user interface designer (especially if it's about a web application). I'm sure you had your reasons and setting a limit on the maximum number of records while imposing the usage of filters upfront can be a valid solution, no need for pagination once or ever. Both DWP and Imo are doing that and peoples seems quite happy with that for a regular desktop application (web-client is still fat client), when we tried to put something on the web and saw the first time the user design document we start fighting instantly over that total number of records thing (mind you that we had pagination at the time, with end button and everything). However if you want to book your holiday and search using some minimal criteria you still expect how many options do you have, if you enforce or relax the selection criteria the number of options can give you a better idea on what the options are.

As this is in webspeed section then we might think we talk about a web application, in that world pagination, random navigation, end button and total number of records are 'default functionality'... every user interface designer will put that in his application design and what is more important the user really expect that. We can argue as much as we want over the technical consequences but the point is, the answer is not to restrict that functionality due to technical limitations but to try to find solutions to make it possible

Google search algorithm, or any other search engine does not give you an exact number but an approximate one... this means that they don't use 'preselect' (not that they are using progress database but you got the picture), there was to be some other ways to do that and for sure we can implement in the application instead of simply relying on the database itself and give them the argument that technically this will lead to worst performance ever or eventually lead the application unusable... it's our business to make it usable and user friendly. Damn, I just realize that sometime when I wasn't paying attention I've simply got older

Posted by Admin on 04-May-2011 01:45

page x of y (total = foo)

 

Calculate that only on the first request, stop at a reasonable amount (Marian will disagree). But there is a - user acceptable - difference between a website with dynamic content and a web application. The difference is the user type (in relation to the application because for different applications the same user could be of a different type).

1: re-read the entire table on every request

 

Don't let anybody tell you to do that.

2: Use a local paging source, but that would mean reading the entire table (once), and passing the data over to the client.

However, I would assume that this means any changes to the underlying data would not be seen until a refresh.

 

Difficult with large data sets, load time may not be acceptable and will require an Ajax framework or a lot of Javascript coding.

3: Use a prodataset and start-rowid (if that rowid does not exist, re-read the query)

>

any others ? This may be for a table with 10 records or up to 1000 records

I think the realistic maximum number of records makes a huge difference. Is it a phone book for the organization or the whole of UK?

With not more than 1000 records you can rely on the database cache (-B) to give you the performance you need. If you add two or three digits to that number you'll be in need for optimization. No need to try "Google-style" estimates of records when you can precisely say that you've got 1.042 records in the query result list. With 1.000.042 this will become more difficult.

Posted by Admin on 04-May-2011 01:50

All of your arguments (as less probable as the situation you describe might be) were what I've used to fight the web developers and user interface designers at some point...

I cannot see why a user, web developer of UI designer would mind how the application handles the actual repositioning in the data base (GUID, ROWID, ROW-NUM). They just want the result they expect.

all are pure technical issues and in practice is hard to stumble over such extreme situation I think,

It is a technical forum...

And how extreme the examples are depends a lot on the use case: In a phone book I doubt it's very realistic. When we are looking at an order index that needs to be processed by sales admin folks I guess it's much more realistic. The use case makes the difference.

one might consider enforcing primary surrogate key for each table and use that as technical identifier, at least that was the path that I had in mind for Quarix at some point but I really can't tell if that would lead anywhere at all

That's why I mentioned the GUID. What kind of value you store and pass around is just relevant to the DA layer. And the nice thing about keeping the DA layer separate is that you can optimize it for the current database or other data source.

Posted by Admin on 04-May-2011 01:50

Even if you use rowid, if someone updated the index fields of a record that make it rank different when you want to get the next page you might just get the same record as before only with updated data... it used to rank in page 3 but now given the sort options and the fact that values in index fields were changed put it in page 4, it that more reliable?

The question is which case is more realistic and more disturbing to the user. I've experienced that paging side effects based on ROWID or any other unique record identifier are more acceptable to the user than paging side effects based on record numbers.

Finally, mind you sharing with us what you have gotten on negative experiences about ROWIDs (or unique record identifiers).

Posted by Admin on 04-May-2011 02:08


page x of y (total = foo)

Calculate that only on the first request, stop at a reasonable amount (Marian will disagree). But there is a - user acceptable - difference between a website with dynamic content and a web application. The difference is the user type (in relation to the application because for different applications the same user could be of a different type).

Why do you think such a thing, I can't agree more on what you've just said. Reading all records to get the total number is not a good idea, definitively has not to be done on each request.

Stop at reasonable amount is an option, just try to reposition to something like 20-50 pages of records... give the the exact count if less than that or something like 'more than 50 pages...', they'll leave with that as no one expects to browse the whole table just to see if the estimate number of records is accurate or not

Posted by davidkerkhofs3 on 21-Mar-2012 09:57

Well,

this has been a lively discussion but can I focus on the fact that I don't see any agreement on the best approach?

First things first:

1) I guess the datamodel the application is using will dictate whatever options you have to page the records.

    If the datamodel has been properly organized, each table will have its own unique single field primary key, preferably sorted DESC. (how do you this with a GUID?)

    Paging becomes easier because the old stuff is on the next page.

    That way the last pagelink would contain something like getRecords(50 to 1), actually it means having a one-time cost for calculating the pages (not speaking for deleting records of course).

    Why can the correctness of the record count be important?

    Because you don't want to do this exercise for every set of data; and some data will be volatile, other is perhaps very stable.

2) Rowid's are arbitrary, as explained by Mike, and more important not always exchangeable using dataservers.

    ADM2 seems to do a fair job in paging (but it uses rowids).

    I remember using the BATCH-SIZE attribute on a PDS when I did a project in the TP10.2. Anybody any experience with that?

We're in the middle of opening a main part of our application to the world (ww).

Paging wasn't really a concern before (in principal I disagree with the original architects for not having implemented it at the very beginning).

But I'm too, like Julian is/was, starting a first investigation on the best ways to page the data.

Julian, what did you finally end up with?

This thread is closed