ABL queries

Posted by Alon Blich on 17-Aug-2006 12:58

There are growing discussions about the ABL queries short comings.

There is considerable effort coming from the community to convince Progress, without much success or so it seems, calling for no index scans.

IMO with all due respect, I think, they're fighting for scraps. What about index only scans, index skip scans etc.

There needs to be a more fundamental change in respect to ABL queries.

A few more things that I feel have been neglected and Progress seems to be accepting is that static and dynamic queries don't have the same fundamental capabilities.

Specifically BREAK BY and CAN-FIND of course they have work arounds/solutions etc.

But they still have considerable tradeoffs and not least important is their complexity and they aren't as obvious to everyone.

I've seen alot of problematic and costly implementations of break by in dynamic queries. I also think the article in the KB could be improved on just abit.

I am very excited about all the advancements in ABL and where its going but I think alot of us would really appreciate some return to basics.

All Replies

Posted by Tim Kuehn on 17-Aug-2006 13:10

There needs to be a more fundamental change in respect to ABL queries.

I'd go farther than that - PSC needs to be more open about how it makes its decisions with respect to the language's future. From what I've seen so far, these decisions appear to be based more on which wheel's making the most noise as much as it is about technical innovation and making the language consistent across platforms (witness the ongoing lack of a dynamic ChUI browse).

Right now - the engine can do no-index scans. The ABL, however, cannot. Why? From the engine guy I talked to, the language people don't think anyone would use it. To me the sheer utility of something like that - or for the compiler when it determines a table scan is needed to satisfy a query condition - would seem self-evident.

A few more things that I feel have been neglected and Progress seems to be accepting is that static and dynamic queries don't have the same fundamental capabilities.

I agree - they did a great job bringing dynamic queries to the language - but then they stopped. While it is possible to make interacting with dynamic and static queries identical, this is just one of the things the language itself should make easier to do.

For anyone who'se interested - a copy of the query manager I wrote to make interacting with a queries easier has been submitted to the "Code Share" area, where's it's currently under review. A copy is available right now at http://amduus.com/OpenSrc/SrcLib/QueryMgr/, and I anticipate it'll be posted to the PEG Utilities page when Greg gets time.

Posted by Alon Blich on 17-Aug-2006 13:49

I applaud you for your contribution. I think you and guys like you make a huge change and value to the community !

I found the query manager to be a very interesting project, though for me handles and abstracting dynamic queries isn't so much of an issue

as features like BREAK BY and CAN-FIND these short comings really complicate things, as I see it.

I'm not saying there aren't hurdles to over come but it makes the language soo much simpler to use.

Of course for one queries aren't forward only like for each statements, though IMVHO its solvable.

Maybe a construct that forward scans queries is worth thinking about ?

Another thing I'd love to see is outer joins in for each statements. Again, it would make the language much more elegant and easier to use.

I'm sure it'll get great feed backs.

No index scans and other types of scans solve real problems and add real value and you can find them in practically every database for years and years now.

Talk about frustrating.

Posted by Tim Kuehn on 17-Aug-2006 14:25

Of course for one queries aren't forward only like for each statements, though I IMVHO its solvable.

That depends on your -noautoreslist setting. The default is not forward only.

For queries associated with a browse, the default is not forward only. For other queries, there's no reliable way to tell if a query's forward-only or not with the current implementation of qh:FORWARD-ONLY.

Posted by Alon Blich on 17-Aug-2006 14:33

I'm aware that queries can be forward-only.

What I meant is that because queries can be scrollable maybe its not a natural fit for BREAKY BY like for each statements.

But again, its in noway to say BREAK BY and queries can't go together.

And the right place for them, for simplicty, certainly performance is a core feature in the language.

Posted by Thomas Mercer-Hursh on 18-Aug-2006 15:36

I'd go farther than that - PSC needs to be more open about how it makes

its decisions with respect to the language's future. From what I've seen so

far, these decisions appear to be based more on which wheel's making the

most noise as much as it is about technical innovation and making the

language consistent across platforms (witness the ongoing lack of a

dynamic ChUI browse).

Actually, I would say that the lack of a dynamic ChUI browse was a good example of them being able to ignore noise!

Historically, it seemed to me that the biggest driver of language innovation was the tools group, i.e., if they needed something, they got it and otherwise it was very chancy. With 10.1A, though, I think we have a clear example of a market-driven and concept0-driven set of enhancements. Indeed, if anything, the irony is in how slow the group that historically might have been driving this innovation has been in actually using any of it. One might argue that ProDataSets was a precursor to this concept driven pattern, but certainly the OO stuff in 10.1A was a pretty dramatic shift.

Regardless, wishing the process would change isn't likely to change the process. Documenting what you want, why you want it, what impact it will have, and how many people it impacts might change the process. Some proposed changes are essentially self documenting because all of the people involved agree that they should happen. Others aren't. You might be able to change that.

Posted by Thomas Mercer-Hursh on 18-Aug-2006 15:38

I'd go farther than that - PSC needs to be more open about how it makes

its decisions with respect to the language's future. From what I've seen so

far, these decisions appear to be based more on which wheel's making the

most noise as much as it is about technical innovation and making the

language consistent across platforms (witness the ongoing lack of a

dynamic ChUI browse).

Actually, I would say that the lack of a dynamic ChUI browse was a good example of them being able to ignore noise!

Historically, it seemed to me that the biggest driver of language innovation was the tools group, i.e., if they needed something, they got it and otherwise it was very chancy. With 10.1A, though, I think we have a clear example of a market-driven and concept0-driven set of enhancements. Indeed, if anything, the irony is in how slow the group that historically might have been driving this innovation has been in actually using any of it. One might argue that ProDataSets was a precursor to this concept driven pattern, but certainly the OO stuff in 10.1A was a pretty dramatic shift.

Regardless, wishing the process would change isn't likely to change the process. Documenting what you want, why you want it, what impact it will have, and how many people it impacts might change the process. Some proposed changes are essentially self documenting because all of the people involved agree that they should happen. Others aren't. You might be able to change that.

Posted by Alon Blich on 18-Aug-2006 18:11

I read all the white papers you’ve published but does it work ? and like wise thats the general impression with the ERS and how useful it is.

Maybe if things were different there would be a lot more people doing the same.

But that's the great thing about this place and having access to the guys at Progress.

There’s always been a huge effort when it comes to high volume transactions. That’s all you hear about when it comes to performance.

But with queries just making them work is good enough.

It’s not just performance queries capabilities still have holes in them.

Like I said, no index scans, index skip scans, index only scans and many other features have real value and solve real world problems.

Bottom line is that ABL queries have been allowed to slip further and further behind.

And that’s what I meant when I said we’re fighting for scraps. Even if we do get no index scans we’ll still be years and years behind.

There needs to be a more fundamental change with respect to that subject.

And not having BREAK BY and CAN-FIND in queries and OUTER-JOINS in for each statements really complicate and lead to many problematic implementations.

Posted by Thomas Mercer-Hursh on 18-Aug-2006 18:43

So, some of what you want is in the engine now and all we need is a language construct to access it. To me, that is a no brainer, but it still might take someone to write it up and make the point.

Some of it is in the SQL engine, but not in the way ABL interacts because of SQL's set orientation vs ABLs record orientation. That probably takes more work because it requires both language and changes in the plumbing. The latter could be non-trivial. But, doesn't mean that someone shouldn't write it up and make the point and see what happens.

The alternative, of course, is to actually use SQL for some of these queries because it will have the fast scan and query optimizer. I think that might bear looking into as well.

Posted by Alon Blich on 18-Aug-2006 19:04

That'd be great ! How does one go about using OpenEdge SQL with ABL

Posted by Thomas Mercer-Hursh on 18-Aug-2006 19:10

Well, probably in a way that is a whole lot like one would use it in any other OO language, by creating a connection object.

Posted by Alon Blich on 18-Aug-2006 19:45

I'm guessing you're suggesting it as an enhancement request to the ABL and not for us to write ?

Maybe, things like XQuery that also combine access to the database will offer a backdoor to those type features in future releases ?

Even though SQL is set oriented and ABL is record slash navigation oriented.

I don't see why no index scans, other types of scans and optimizing queries in one way or another can't be done in ABL.

Posted by Tim Kuehn on 18-Aug-2006 19:56

I don't see why no index scans, other types of scans and optimizing queries in one way or another can't be done in ABL.

The usual "explanation" is "no one would use it."

Posted by Thomas Mercer-Hursh on 19-Aug-2006 13:00

I have, actually, suggested a connection object for ABL type connections, but I'm not actually sure that any language development is needed to provide a SQL connection object.

I don't see why no index scans, other types of scans and optimizing

queries in one way or another can't be done in ABL.

To be sure, it isn't unimaginable, but it might require a bit of twisiting. E.g., suppose the magic to get some of those results was to use PRESELECT, which essentially turns the query into set oriented. Would that be acceptable?

Posted by Alon Blich on 19-Aug-2006 15:12

To be sure, it isn't unimaginable, but it might require a bit of twisiting. E.g.,

suppose the magic to get some of those results was to use PRESELECT, which

essentially turns the query into set oriented. Would that be acceptable?

IMHO there's no need to take everything apart, start over or speak in those terms, not at all.

It can even be a 4GL utility that takes in SQL like statements string and translates it into a dynamic query.

I've worked on the subject in the past and there's so much more to say.

I'm planning on posting a more thorough article and code to further the subject later on this year.

Posted by Thomas Mercer-Hursh on 19-Aug-2006 15:26

I think we are talking about two different things.

I agree that a SQL connection object shouldn't really be that difficult and, once done, all it needs is a set of objects to feed it queries.

In terms of adding new functionality to ABL statements themselves, though, I think it is a little trickier. A no index scan is probably just another keyword. An index-only scan is probably also another keyword. But, using the query optimizer might require forcing a set-oriented query ... but I'm not sure if adding that functionality to PRESELECT would be acceptable or not.

I don't see why no index scans, other types of scans and optimizing

queries in one way or another can't be done in ABL.

The more concrete you can get, the better the chance you have of getting it. Make sure to include the impact. It would be particularly interesting if there were telling comparisons between SQL and ABL for some reasonable queries.

Posted by Alon Blich on 19-Aug-2006 16:52

... But, using the query optimizer might require forcing a set-oriented query

Reports, usually, have dynamic criteria so they'll make a good example. Take for example a 4GL report with the following tables joined in that order.

FOR EACH order NO-LOCK, EACH orderline OF order NO-LOCK, EACH item OF orderline NO-LOCK:

If we were filtering by order that would be the optimal join order. But if the filter was by itemnum the optimal join order would be

FOR EACH item NO-LOCK, EACH orderline OF item NO-LOCK, EACH order OF orderline NO-LOCK:

4GL queries are relatively simple, all you can/have to play with is the join order and indices.

That's the direction I'm thinking it should be. SQL like statement string with FROM buffers WHERE clause with joins and criteria etc.

Optimizer decides on join order and indices (execution plan) that ends up as a dynamic query and FOR EACH statement etc.

Even the compile-time rule-based optimizer makes very basic (if none at all) of boolean algebra which makes him very unforgiving.

Here's simplified example that i've recently seen on million plus record table that resulted in an index bracket of hundreds of thousands of records instead of a couple hundred.

FOR EACH order

WHERE (custnum = 1

OR custnum = 2

OR custnum = 3

OR custnum = 4

OR custnum = 5)

AND orderdate >= 1/1/1990

NO-LOCK:

Instead of taking multiple index brackets on the custnum index the compiler choose the orderdate index.

And likewise similar side effects, like, redundant index scans that can be shared etc.

But like I said there's a lot more to say Would you be willing to review it ?

Posted by Alon Blich on 19-Aug-2006 16:56

Wow, did the server just die took me 20 minutes to login

... But, using the query optimizer might require forcing a set-oriented query

Reports, usually, have dynamic criteria so they'll make a good example.

Take for example a 4GL report with the following tables joined in that order.

FOR EACH order NO-LOCK, EACH orderline OF order NO-LOCK, EACH item OF orderline NO-LOCK:

If we were filtering by order that would be the optimal join order.

But if the filter was by itemnum the optimal join order would be

FOR EACH item NO-LOCK, EACH orderline OF item NO-LOCK, EACH order OF orderline NO-LOCK:

4GL queries are relatively simple, all you can/have to play with is the join order and indices.

That's the direction I'm thinking it should be. SQL like statement string with FROM buffers WHERE clause with joins and criteria etc.

Optimizer decides on join order and indices/execution plan

that ends up as a dynamic query and FOR EACH statement etc.

Even the compile-time rule-based optimizer makes very basic use (if none at all) of boolean algebra that makes him very unforgiving for someone who isn't aware.

Here's simplified example that i've recently seen on million plus record table that resulted in an index bracket of hundreds of thousands of records instead of a couple hundred.

FOR EACH order

WHERE (custnum = 1

OR custnum = 2

OR custnum = 3

OR custnum = 4

OR custnum = 5)

AND orderdate >= 1/1/1990

NO-LOCK:

Instead of taking multiple index brackets on the custnum index the compiler choose the orderdate index.

And likewise similar side effects, like, redundant index scans that can be shared etc.

But like I said there's a lot more to say Would you be willing to review it ?

Posted by Thomas Mercer-Hursh on 19-Aug-2006 18:12

Would you be willing to review it ?

Sure, but better yet would be to share it here and on the PEG to get as many inputs as possible.

Posted by svi on 24-Aug-2006 14:52

I'd go farther than that - PSC needs to be more open about how it makes its decisions with respect to the language's future. From what I've seen so far, these decisions appear to be based more on which wheel's making the most noise as much as it is about technical innovation and making the language consistent across platforms (witness the ongoing lack of a dynamic ChUI browse).

Off the top of my head, in OpenEdge 10, the language has been enhanced to include:

OpenEdge 10.0x, 10.1a:

- Object-orientation

- ProDataSets, including batching, Open Client support, before-image mgmt, READ-, WRITE-XML() methods

- XML (DOM and SAX)

- Web services (consuming and producing)

- Language support for .NET Open Client

- OpenAPI

- BY-REFERENCE and BIND

- 64bit r-code

- BLOB, CLOB data types

- LOG-MANAGER

- Linguistic sorting

- Cryptography

- Language support for Auditing (application level)

10.1B, and some WIP and under consideration:

- INT64 data type

- More object-orientation (eg. overloading, properties, RETURN ERROR)

- Indeterminate extents

- Language support for New User Interface project - including data binding

- Strongly typed event-handling

- New error and exception handlers

I don't think these shall be qualified as squeaky wheel" stuff. Of course the to-do list is always long and time and resources never unlimited!

Regarding Dynamic ChUI Browse: As I said on an earlier thread, working on it. I'll keep you informed. The lists are always long with major projects underway.

Posted by svi on 24-Aug-2006 15:05

BREAK BY

Interesting discussion about queries, ProDataSEts, BREAK BY.... To make sure we understand actual uses of BREAK-BY in your applications, with let's say ProDataSets, could you include some use cases please?

CAN-FIND

We are looking at dynamic CAN-FIND. Because its dependencies, you may hear us refer to it as support for External Buffers.

Posted by Tim Kuehn on 24-Aug-2006 15:33

I don't think these shall be qualified as squeaky wheel" stuff.

Since I don't have knowlege of PSC's internal decision-making process, data, etc. I can only go by your comments at Exchange and in the Dynamic ChUI thread which seemed to indicate otherwise.

Of course the to-do list is always long and time and resources never unlimited!

I appreciate that, which is why I'm asking what process PSC uses to establish what gets done next and how those resources are allocated. I was going to start another thread, but I think it would be better for whoever can authoritatively speak to this question to do.

Posted by Alon Blich on 26-Aug-2006 08:19

BREAK BY

There's no sum( ) or count( ) in for each.

Mostly for working with aggregates, summary lines in reports etc. in forward-only queries, mostly.

It's just that the query or part of it, like, the where clause is dynamic.

The fix/es in the KB, or at least what I understood it meant is

There's a tradeoff that you can only tell if the line is last-of something after you've past it.

But the biggest problem is that it's not as obvious to everyone, far from it.

And there's lot's of ways to get to the same result that can go very wrong.

It'd be great if there was query handle:first-of( ) or something like that.

It'd be easier for all of us if dynamic and static queries were as much a like.

Posted by Tim Kuehn on 26-Aug-2006 09:58

It'd be great if there was query handle:first-of( ) or something like that.

Check my query manager - it can do first-of / last-of for a set of fields from a set of query buffers, so a query doesn't have to be forward-only to get that functionality.

If PSC implements this, then it should work in -either- direction one's scrolling, although the first-of / last-of should only be true for "forward" scrolling direction even if you're doing GET-PREV().

Posted by Alon Blich on 26-Aug-2006 12:33

Tim,

Again, I think, the query manager is an interesting project.

But it's an example to what I said that there are many ways to get to the same result that are problematic.

The following code snippet with the query manager takes nearly a second.

Previous loop suggested or for each break by takes less then a millisecond.

There are 83 records in the customer table.

Most tables that I need to do aggregate functions on could have hundred of thousands or millions of records.

Another thing you could use, that I found has a smaller tradeoff and is relatively simple.

Is define the query with a cache and do a get next, get prev before every iteration to get the values of the next row.

IMO, there are many good reasons to support break by.

And it's not to say it cannot be done or used even with scrollable queries.

Like you showed with the query manager's browse break by example.

If for no other reason but to make working with forward-only, regular and scrolling queries the same.

But if break by in queries did force forward-only I still think it would be a fantastic feature to have.

With a for each break by the query is always one row ahead.

But with queries, reposition would probably force reading the prev and next rows.

And going backwards with get prev would require changing the direction to reading the prev row ahead of current row.

Starting to get messy and it's probably not all

Posted by Tim Kuehn on 26-Aug-2006 13:34

The following code snippet with the query manager takes nearly a second.

Previous loop suggested or for each break by takes less then a millisecond.

There are 83 records in the customer table.

"hardwired" solutions will always beat fully-dynamic solutions for performance any day. I won't quibble with that.

If the language had native BREAK BY support, I would expect faster time from an ABL-implemented solution as well.

Also, the initial version of the QM was written to check the break conditions on 1 or two field sets at a time - if support for more levels of "Break by" was needed, some way to check all the break condtions "at once" would need to be coded so the program doesn't wind up doing the GET NEXT/PREV and all the checks involved for every distinct field combination.

I can conceptualize a way to make the QM's break-test faster, but that assumes that the QM does all the query navigation operations so it could keep a set of dynamic TT's in sync with the current result list.

What I have not seen in the ABL, and don't expect to see, is to have first-of/last-of that's distinct from the BY specification. The QM can do that. I'm not sure if it's useful or not, but it can do it.

Posted by Alon Blich on 26-Aug-2006 13:52

It came out the wrong way, it's not like I've written anything better.

But I do and have used the procedure manager

Posted by Tim Kuehn on 26-Aug-2006 14:11

It's not a big deal - I'm used to these kinds of exchanges.

I'm glad you found the PM useful. Maybe you could start another thread and tell us how it's helped you out. I know it's made my work a whole lot easier than what I used to do.

Posted by ChUIMonster on 22-Sep-2006 07:29

The usual "explanation" is "no one would use it."

I would.

Posted by Alon Blich on 22-Sep-2006 08:36

The way I see it, is that for every record a user creates he reads 10,000 (realistically it can be 10 times as much).

But when Progress talks about performance its always transactions, transactions, transactions, it's never queries ?

Maybe I don't know what I'm talking about or it's an NDA thing and there may be perfectly good reasons but no one will talk about it ?

And if it's working with set's how about adding RETURN to for each statements, then we could have subqueries, group by, set operations like unions ...

wouldn't this be cool

(for each orderline

group by itemnum

return itemnum)

union

(for each poline

group by itemnum

return itemnum)

And I'm sure there are many more things that can be at least explored. But AFAIK no work has been done with queries since multiple indexes in V7.

How come ... ?

Posted by ChUIMonster on 22-Sep-2006 09:49

The way I see it, is that for every record a user creates he reads 10,000

(realistically it can be 10 times as much).

But when Progress talks about performance its always transactions,

transactions, transactions, it's never queries ?

I agree completely. Read performance is far more important to most people most of the time. And it is a huge driver of end-user unhappiness in badly tuned systems. That's why I wrote readprobe

And it is why I think that the 4GL needs access to cost-based optimizer features such as no-index table scans.

Posted by Tim Kuehn on 24-Sep-2006 20:59

The usual "explanation" is "no one would use it."

I would.

If it was part of the compiler logic, the default could be to use it whenever a table scan was required. Then everyone would be "using it" even if not intentionally.

This thread is closed