Cost-based query plans for ABL

Posted by dbeavon on 08-Feb-2016 15:21

Does anyone know how best to leverage the cost-based query plans in ABL (ie using selectivity statistics)?  I know its a big question but OE has cost-based query plans on the SQL92 side of things and it would be wonderful to take advantage of them in a regular ABL program too.

The problem with the ABL record loops (ie "FOR EACH") is that not only do you have to know which tables to join, you have to manually specify what order to join them and even what index to use (assuming there are more than a few).

We'd like to give the users the ability to use multiple ad-hoc filter criteria on two different tables, whereby sometimes it makes sense to run the outer loop on table 1 and sometimes it makes sense to run the outer loop on table 2.  SQL92 should know what to do based on regularly maintained statistics.

I'm not finding the answer to this question.  While there are plenty of FAQ's on OE sql92.  Nothing brings together the SQL92 world with the ABL world.

https://community.progress.com/community_groups/openedge_general/w/openedgegeneral/1613.bi4p-sql-92-frequently-asked-questions

I remember back in the day you could use inline SQL89 in ABL but I think that stuff is all deprecated by now.   My only idea at this point is to build a SQL query by hand and shell out to "sql92" to run it and generate some sort of result set of primary keys from the two tables, which ABL would at last be able to deal with competently.  Not pretty....

All Replies

Posted by dbeavon on 08-Feb-2016 15:22

Typo... I meant shell out to "sqlexp".

Posted by Thomas Mercer-Hursh on 08-Feb-2016 15:36

Not the direction you are thinking, but you might want to look at

pugchallenge.org/downloads2011.html and find Alon Blich's presentation on the Query Optimizer

I think the problem is a lot bigger than the difference between compile time optimization and run time optimization based on statistics.  It is quite possible to write really awful queries in either one.  So, I think the problem is either providing the user with a guided builder which will result in a useful query or checking the user's request to optimize (or reject) it.

Posted by brynjar on 09-Feb-2016 15:39

Don't know if it's the best way and it's unfortunately not cost-based but we've put quite a few features into our query service. The programmer defines the query and optional sub-queries and/or rules for when to switch buffer-sequence.

• Calculated fields (with optional runtime parameters)
• Filter on calculated fields
• Filter on joined buffers (subselect and not exists)
• Ability to automatically change buffer sequence 
• Filter on external buffers (requires sub-query definition)
• Multiple joins to same table
• Sorting on calculated fields
• Query count, count distinct
• Accumulation of totals and subtotals
• Support for duplicate field names
• Support for extent-fields
• Validation of query against schema security
• Returns temp-table handle pluss additional context info (read/write restrictions)
• Paging
• Sub-selects (pre-scan)
• All records (in query) are processed when
• Sorting on calculated values
• Querying distinct values
• Grand totals involve calculated fields
• All records are traversed when
• Doing record count or grand totals

Posted by Thomas Mercer-Hursh on 09-Feb-2016 15:48

One might also note that, while the idea of a completely open query environment may see attractive, in practice users are likely to actually need a more predictable subset.  By providing a number of filter options and appropriate logic, one can fill a temp-table efficiently and then sort, select, combine, and compute like crazy without the disk fetch overhead.  This takes some thinking and research and openness to extend, but can be highly efficient and performant with good control over formatting.

Posted by dbeavon on 09-Feb-2016 18:55

So I guess all I was looking for was a way that an ABL programmer can leverage the more recent OE technology on the SQL92 side of things.  That would complement our legacy ABL tooling, and leave a lot less query building work for the developer (put more of it in the hands of the DBMS).  For example, even basic things in ABL leave a lot to be desired - like theway in which it has always picked indexes (when there are more than a couple).  These days most DBMS products can find a better query plan than a developer can - at least in the vast majority of cases, with hints available in the rare cases where the cost-based optimizer gets confused.

What I was going for was some kind of pass-thru where we could define a result set based on a SELECT query and then process that result set in ABL after it was generated from SQL92.  It would be something along the lines of QUERY-PREPARE but would leverage the more advanced query optimizer in SQL92.

     /* Send to SQL92 Please! */

     v_QueryHandle:QUERY-PREPARE("SELECT FOR EACH Table1, Table2, Table3 WHERE ...")

     /* ****************************************************************** */

     /* Open and start non-indexed filtering                               */

     /* ****************************************************************** */

     v_QueryHandle:QUERY-OPEN().

     REPEAT ON ERROR UNDO, THROW:

        /* Loop */

        v_QueryHandle:GET-NEXT().

        IF v_QueryHandle:QUERY-OFF-END THEN LEAVE.

        /* DO MY ABL WORK HERE! */

Anyway, I suppose the shelling to "sqlexp" is an option, albeit a very dirty one.  It seems like OE should consider the benefits of combining their two totally separate worlds of ABL and SQL92.

Posted by Marian Edu on 10-Feb-2016 02:19

you do realise that cost base optimiser isn't the holy grail, it's a fact for SQL because there is no such a thing as 'static compile-time queries' but there is no rdbms engine in the world that will perform well on any free-form user query. beside the optimiser need a lot of DBA attention, with no update statistics it's pretty much useless and hints are not really exceptions, at least when it comes of writing complex queries the developer is capable of doing a much better job that the optimiser... talking about a good one of course ;)

shelling out to sqlexp means you'll have to have the sql engine running, need user with rights granted, run update statistics periodically, 4gl triggers won't fire, sql-width issue might bite you from time to time then you have to find a way to catch the result from stdout or pipe it through a file in a format you can use from within 4gl and finally have that result set loaded in a temp-table - it's structure most probably need to be dynamically created based on user's query or result set output... not to mention a connection is established for each request which will give you some nice latency to enjoy.

now, about Alon's query optimiser we did had some interesting discussions at the time and have to say I'm sorry things didn't worked out but there are still some things that can be done for dynamic queries... having a 4GL version of the cost base optimiser can start from 'update statistics', being able to change the query join order based on number of records that exists in each table could be a first step, then make that more complicated by using 'query plan' cache to reduce the time spent on re-engineering a query string that is 'similar' to one that was already executed.

however, this is not an easy job and I fear there is really no market for something like that... vast majority in Progress world expect everything to come from PSC so guess we just have to wait a bit longer :)


Marian Edu
------------------
www.acorn.ro
www.akera.io

Posted by brynjar on 10-Feb-2016 02:29

Looks good - if you add it as a feature request I'll vote! Maybe we'd had a better chance if the query string was sql..

Posted by Alon Blich on 10-Feb-2016 04:19

Hello All,

In my humble opion, a query optimizer is the holy grail of database applications in terms of performance improvement and maybe as a challenge :)

I think, in database applications no other way of improving performance comes close e.g. 32bit and 64bit optimization, multi-threading, transaction per seond etc. but maybe it's just me :)

I read a book about query optimization many years ago that started this way, more or less -

In database applications what takes the most amount of time by far are database operations and what takes the most amount of time out of that by far are read operations, and the way to improve that is a query optimizer.

For example, let's say we had a report with the following query -

for each Order, each OrderLine of Order, each Item of OrderLine.

If you had a filter for itemname = "test" and use the same join order -

for each Order, each OrderLine of Order, each Item of OrderLine where Item.itemname = "test".

then you would run over all the records in the query which is the worst case possible. If the tables had millions of records that could take hours (although this join order would be ideal for filtering by ordernum = "<x>").

But if you changed the join order (which you can with inner joins) to -

for each Item where item.itemname = "test", each OrderLine of Item, each Order of OrderLine.

then you would only run over the records you requested which is the best possible or ideal case. In this case a few tens of records that will take a few milliseconds to fetch. That's the difference of a program running hours or milliseconds.

You cannot have a static, fixed way of running dynamic queries (execution plan) for every condition, or filter. Which in most cases means changing the join order and indexes.

I did write a 4gl query optimizer that is 90% complete but I eventually stopped and published the code on the oehive.org (included in the standard libraries project) because ultimately there is no market for it and the interest that I got.

I also suggested using b-tree indexes key distributions instead of statistics along time ago which in theory would not require collecting statistics and could porbably be accurate enough for measuring the query's progress.

BTW hopefully, I will be presenting an open source framework project at the north amercian conference, if my proposal is accepted, that has a 4gl reporting frameworking and query optimization that can change the join order and indexes according to the filter.

Posted by dbeavon on 10-Feb-2016 08:27

The market *is* there for OE-SQL92, and the technology is already pretty good, having matured a lot since the 9.x days. A lot of people use it for reporting applications.  Unfortunately the technology is not not easily accessible from the ABL side of things, which seems strange to me.  It sounds to me that some of the problem might be a mistrust of a DBMS to do the query optimization work of a "hardcore developer".  There shouldn't be so much anxiety on this point.  As I said *many* DBMS products, OE-SQL92 included, are capable of building high-quality query plans from cost-based optimization.

I'm still leaning towards an approach that shells out to "sqlexp" and builds a result keyset for ABL to consume.  That maybe the best option at this point, especially for those of us who already have a well-established OE-SQL92 infrastructure.  As ugly as it sounds, it seems better than a home-grown (roll-your-own) query optimization engine...  (but props to those who attempt it...)

Posted by Brian K. Maher on 10-Feb-2016 08:46

David,
 
What you are missing is that the SQL92 engine is a completely separate thing (3rd party product that we bought) that was grafted onto the database for SQL access.  We did not enhance the database and decide to do those enhancements only for SQL clients.  What you are asking for is one of those things that is easy to say but very difficult to implement.  If you feel strongly about this then you should submit an enhancement request to the Ideas section of Communities.
 
Brian

Posted by dbeavon on 10-Feb-2016 09:24

Hopefully this makes sense:

community.progress.com/.../make_use_of_sql92_query_optimizer_from_an_abl_program

I understand its not easy, but it makes more sense than forcing line-of-business ABL programmers to continually write our own home-grown solutions to this problem.

Posted by gus on 11-Feb-2016 12:26

you are asking for the wrong thing.

SQL queries and 4GL queries are very different beasts.

you should ask for a cost-based query optimizer designed specifically for 4GL queries (if that is indeed what you want).

Posted by dbeavon on 11-Feb-2016 13:14

I don't have much hope for that.  SQL92 is already available and the SQL language is a more standardized approach for declaring result sets that can benefit from advanced cost-based optimization techniques.  No point reinventing this wheel in ABL.

Thanks for the feedback.  Please vote!

PS. On the 4GL side I can't even use a "CAN-FIND" condition in my QUERY-PREPARE statements.  So much for "dynamic" queries on the ABL side of things.  Probably no good reason for this restriction either.  The error is:

***CAN-FIND is invalid within an OPEN QUERY. (3541) ***

Posted by gus on 11-Feb-2016 13:24

> On Feb 11, 2016, at 2:15 PM, dbeavon wrote:

>

> SQL92 is already available and the SQL language is a more standardized approach for declaring result sets that can benefit from advanced cost-based optimization techniques. No point reinventing this wheel in ABL.

this is exactly what /must/ be reinvented if you want cost-based query optimization in the 4GL.

there is almost no overlap. query optimizers are very much like optimizers in language compilers. there are a few general principles or common techniques that can be used to get maybe 20 % and then the remaining 80% is a bunch of special cases that are specific to the details of the query expression and the data types involved.

Posted by brynjar on 11-Feb-2016 13:48

A good place to start extending the 4GL would be to provide direct access to each buffers query criteria. It would make switching the buffer sequence or building sub-queries a lot easier.  

Posted by Peter Judge on 11-Feb-2016 13:56

Brynjar,
 
How do you mean? Something like being able to say
hQuery:get-buffer-handle(i):where-string
?
 
 

Posted by Alon Blich on 11-Feb-2016 14:02

changing the join order and indexes might be 20% of the possible optimizations but they account for 80% or more of the cases.

there is alot more to changing the buffers join order and also finding all the possible index brackets.

when changing the join order the conditions need to be divided differently between the joins and preferably pushed up. this might be simple if there are only and conditions but is far more complicated with complicated or and and conditions.

and to discover all the different index brackets the where condition might need to be refactored (and initially simplifed).

i also believe that rule based optimizer can be very useful but the join order and indexes cannot stay fixed for every condition.

imho no other performance improvement comes close.

Posted by brynjar on 11-Feb-2016 14:04

You got that right!

Posted by Marian Edu on 11-Feb-2016 14:11

That might be an idea Peter, what will be nice is to have the query-prepare handle a 'global' filter and break that into where clauses for each buffers... do realise sometime this is not easy or maybe even not possible if complex and/or logic is used with criteria on different buffers grouped together :(

for each a, each b where a.x = 12 and b.y = a.z

=>

for each a where a.x = 12, each b where b.y = a.z

Posted by Peter Judge on 11-Feb-2016 14:18

I was trying to understand the request (wasn't trying to provide a solution) and what information would be useful.
 

Posted by Alon Blich on 11-Feb-2016 14:20

breaking up the where clause, refactoring, simplifying, etc. are boolean algebra.

the rules are clear and are not new. the challenge is the complexity

but is it worth the effort? :)

Posted by brynjar on 11-Feb-2016 14:45

Of course - if somebody else is doing it!

I think of the appserver as an extension of the database. In a normal crud app it should be able to do the majority of the work only based on client requests only - with with some extra guidance for the more complex stuff. Once in a while you might even have to write a bl procedure :)

Posted by gus on 11-Feb-2016 15:08

> On Feb 11, 2016, at 3:20 PM, Alon Blich wrote:

>

> breaking up the where clause, refactoring, simplifying, etc. are boolean algebra.

>

> the rules are clear and are not new. the challenge is the complexity

>

> but is it worth the effort? :)

yes, absolutely.

Posted by Alon Blich on 12-Feb-2016 01:20

> yes, absolutely.

as a challenge definitely!  but not fiscally :)

from my experience with docxfactory, it would be hard for me to justify a progress only project.

i think it would be very helpful if progress supported in one way or another innovations from the community.

Posted by Marian Edu on 12-Feb-2016 01:48

don't know what have you been smoking lately Alon but I sure want some of that... nice friday joke though :)

Posted by Alon Blich on 12-Feb-2016 01:52

what was i thinking :)

Posted by Marian Edu on 12-Feb-2016 01:54

hehehe, definitively a nice try... wish you best of luck

Posted by Alon Blich on 12-Feb-2016 02:00

thanks! you too :)

Posted by Marian Edu on 12-Feb-2016 02:13

now I only have to blame you for getting too excited to write a post about the ERD thingy :)

back to the cost base optimiser I can even see a new start-up parameter in place for backward compatibility... `-no-cbo-please` so we can keep on using compile time index selection (query plan) and hunt down those whole-index pests

damn, good thing it's friday

This thread is closed