Wherefore art thou NO-INDEX?

Posted by Tim Kuehn on 28-Dec-2006 15:00

Recently Gus posted a list of advantages to Type II areas on the PEG (http://www.peg.com/lists/dba/web/msg23180.html).

Among the list of advantages was:

Table scans

Because the clusters of a data object are linked together, it is possible to read an entire table without using an index. This is useful for a variety of operations and will be exploited more in the future. In OpenEdge 10.0A and later, the SQL query processor makes use of table scans when the optimiser decides it is worthwhile.

First, WHY is the SQL language getting this functionality before the ABL does?

Second, when is the ABL going to get this no-index capability as well?

All Replies

Posted by Alon Blich on 28-Dec-2006 15:28

I hope so. Tom Bascom will shortly rant about this, and he should.

Gus,

Please, please, don't take this the wrong way.

The other day I was sitting watching an interview with a leading government minister. And the minister went on to say that the government is wrong about this and should not be doing that ... I had a feeling someone should tap him on the shoulder and tell him he is the government.

Gus, you're in charge, right ? you're the man.

Posted by Thomas Mercer-Hursh on 28-Dec-2006 15:41

In this case, Gus is not in charge. He is not in the appropriate reporting hierarchy, so he has to lobby to get things done. Anyone who doesn't pay a lot of attention to what he says is an idiot, but that doesn't mean that he is in control of this area.

Posted by Alon Blich on 28-Dec-2006 16:03

Gus is usually on our side over the company's but he's a top executive/techie.

Well, I thought it was ironic.

Posted by ChUIMonster on 28-Dec-2006 16:05

Type II storage areas were released with 10.0A on December 4 2003. The SQL engine has been able to do no-index scans for 3 years now.

The 4GL team has known that this feature was coming since at least early 2003. It was discussed at Exchange 2003 and suggestions made for the 4GL syntax to support it at the 4GL info exchange that year. In 2004 it was again discussed at the 4GL info exchange. The audience heartily endorsed the need. In 2005 it was discussed at the 4GL info exchange. The presenter seemed surprised and confused that anyone would want such a thing. Gus was there and explained it in some detail and with a certain amount of vigor to the 4GL marketing team. The audience again strongly supported the need.

I'm told that it was brought up again last year. I'm also told that the audience was rather subdued and discouraged -- probably from banging their heads against the wall for so many years -- I don't know, I wasn't there; someone cleverly scheduled a conflict for me

I think that it is long past time that this feature was taken seriously and implemented.

Posted by Tim Kuehn on 28-Dec-2006 16:14

I'm told that it was brought up again last year. I'm also told that the audience was rather subdued and discouraged -- probably from banging their heads against the wall for so many years -- I don't know, I wasn't there; someone cleverly scheduled a conflict for me

I was there, and bright-eyed Exchange newbie that I was I probably made the most noise about the things I thought would make a good addition to the language. As I recall, the response that the NO-INDEX request got from PSC was, well, not exactly encouraging.

Posted by Admin on 29-Dec-2006 01:59

First, WHY is the SQL language getting this

functionality before the ABL does?

I think one of the fundamental differences between ABL and SQL is the location where the query is handled. The ABL basically is a database manager on it's own. Five years ago or so I examined the network traffic from 4GL-client to database server. The DELETE-handling was kind of an eye opener to me: the network traffic learned me that the client is deleting index-values one by one as well as the record buffer. So the 4GL-client is in control.

Now you can imagine that changing this 4GL-client means changing the data access protocol. And that would destabilize existing applications (any change will, right)? There are several old requests in ERS that discuss the chatty interface between 4GL client and database server, which is noticable on a bad network connection.

The SQL-implementation is different, since the SQL-client is a lightweight client. The actual query processing is done in the SQL-engine on the server, close to the database.

Posted by Alon Blich on 29-Dec-2006 05:58

discuss the chatty interface between 4GL client and database server

Absolutely ! it is one the biggest past design flaws in network connections IMVHO it is worse then RAID5.

Most noticeably is in the case of multi-table queries. The 4GL client/agent breaks down multi-table queries into single-table queries which are the only type of queries the server can handle. So every iteration, for every join would at the very least require a single network round-trip.

For example -

FOR EACH orderline NO-LOCK, FIRST item OF orderline NO-LOCK.

The item join would require a single round-trip for every record in the orderline table, that on average can take 1 millisecond. Multi-table queries with hundreds of thusands of rows and unique joins are an every day thing.

Here's an example from PEG, although, there's a few more examples of happy users I found @ProgressTalk -

http://www.peg.com/lists/peg/history/200408/msg01138.html

In query 1 the necessary time is appr 100 ms. In query 2 then time 2 s (20x

situation1). Adding another join takes 2 seconds more.

Can anyone explain to me why query 2 takes 20x more time while only twice

the number of records of query 1 are retrieved? Is there any possibility to

tune this behaviour?

And the Knowledge Base -

KB-18342: How to improve Client Server Performance in Progress

The "cost" of roundtrip can be one 1 millisecond. The records can be

retrieved from disk much faster.

It even goes further to recommend against using sorted queries with a network connection, which sounds to me like a pretty serious handicap and even ridiculous.

At least in the case of all NO-LOCK, read-only queries it is essential that a multi-table query is performed on the server side, in a network connection.

This type of performance problem is something that's, in my experience, pretty common and worse there are some very problematic and strange attempts to deal with the problem. Because it's mostly unknown, and maybe even worse, it's unexpected.

Another implication is if you ever plan on running heavy queries it pretty much limit's your deployment options to having the client/agent on the same machine as the database server and a self-serving connection. IMHO all those deployment figures they talk about in the documentation are just unrealistic.

I'm sorry to say but even if we finally get NO-INDEX it will only be a small step in the right direction and only one in a long list of common, real world needed features. A much bigger, fundamental change is required.

I think the bigger issue is that the 4GL has outsourced, so to speak, the OLAP related features to SQL.

It always seems to me that when ever the guys at Progress design something, like, ADM, Dynamics and now AutoEdge, and don't get me wrong those are all extremely impressive pieces of code, they're only designing for half an application. Reporting and OLAP never get the attention they deserve and they're just as big or maybe even a bigger part of the application.

When it comes to 4GL almost all we hear are maintenance applications and transactions, transactions, transactions, which I don't believe is or should be the case.

Posted by Admin on 29-Dec-2006 06:29

Most noticeably is in the case of multi-table

queries. The 4GL client/agent breaks down multi-table

queries into single-table queries which are the only

type of queries the server can handle.

Or maybe it's the client that has to deal with whole row buffers and can't deal with joined queries. There is a lot of buffer-management going on n the client: is a row dirty, is it locked, etc. SQL-clients are a lot more relaxed about their resultsets.

There are some optimization hints you can pass on to 4GL statements when dealing with DataServer databases. So there is some flexibility in the client (somewhere).

It even goes even further to recommend against using

sorted queries with a network connection, which

sounds to me like a pretty serious handicap and even

ridiculous.

I'm sorry to say but even if we finally get NO-INDEX

it will only be a small step in the right direction

and only one in a long list of common, real world

needed features. A much bigger, fundamental change is

required.

Exactly!

When it comes to 4GL almost all we hear are

maintenance applications and transactions,

transactions, transactions, which I don't believe is

or should be the case.

I think the dilemma here is uncertainty: you either invest a lot of money in the 4GL, potentially destabilizing the installed codebase, or you bet your money on another horse. Don't underestimate the influance of Progress customers that start complaining when a feature breaks in a new ABL-release. Maybe one of the other problems is the number of supported deployment platforms. It must be a nightmare to deliver a new version of the product with all the variables (platform version, product version, product patch). Being able to run the client/database/appserver on *nix has always been one of the strengths of Progress. But freedom of choice has it's limit (I know, I'm from the Netherlands

Posted by Alon Blich on 29-Dec-2006 07:05

Or maybe it's the client that has to deal with whole row buffers and can't deal

with joined queries. There is a lot of buffer-management going on n the client:

is a row dirty, is it locked, etc. SQL-clients are a lot more relaxed about their

resultsets

I need to check the KB but, I think, in queries multiple records are sent in a message buffer. Why not multiple records from different tables.

IMHO it would solve a big part of the network connection problem, it would make those deployment options they talk about realistic and would even make remote connection over a WAN workable, although, I doubt anyone will use it.

There are some optimization hints you can pass on to 4GL statements when

dealing with DataServer databases. So there is some flexibility in the client

(somewhere).

You could also send a SQL statement to a stored procedure and return the resultset into a temp-table, with a DataServer.

I think the dilemma here is uncertainty: you either invest a lot of money in the

4GL, potentially destabilizing the installed codebase, or you bet your money on

another horse. Don't underestimate the influance of Progress customers that

start complaining when a feature breaks in a new ABL-release. Maybe one of the

other problems is the number of supported deployment platforms. It must be a

nightmare to deliver a new version of the product with all the variables (platform

version, product version, product patch). Being able to run the

client/database/appserver on *nix has always been one of the strengths of \

Progress. But freedom of choice has it's limit (I know, I'm from the Netherlands

But it will improve performance

I don't think it will or should change query behavior in anyway. And incase it does change in someway a startup parameter could be used, similar to single index V6 queries.

IMVHO it's that Reporting and OLAP features just haven't gotten the attention they deserve in 4GL, their frameworks, reference implementations etc. I would say they've been ignored to a degree.

Other alternatives, like, adding access to the SQL engine from 4GL have been raised. At least we'll be able to join the fun everytime theses new features come out.

I have to visit the Netherlands one day

Posted by Tim Kuehn on 29-Dec-2006 09:20

I think the dilemma here is uncertainty: you either invest a lot of money in the 4GL, potentially destabilizing the installed codebase, or you bet your money on another horse. Don't underestimate the influance of Progress customers that start complaining when a feature breaks in a new ABL-release.

You should check the 10.1B release notes review I posted earlier - it specifically states a number of things 10.1B breaks wrt prior versions. So, while PSC is very very good about backward compatibility, they're not above breaking things either.

Maybe one of the other problems is the number of supported deployment platforms. It must be a nightmare to deliver a new version of the product with all the variables (platform version, product version, product patch). Being able to run the client/database/appserver on *nix has always been one of the strengths of Progress. But freedom of choice has it's limit (I know, I'm from the Netherlands ;-)

I imagine with enough capable resources that it's possible to effectively do this. I image most of the platform-specific-isms are isolated to library API's while the core code is the same for all the platforms.

Posted by Tim Kuehn on 29-Dec-2006 09:24

IMHO it would solve a big part of the network connection problem, it would make those deployment options they talk about realistic and would even make remote connection over a WAN workable, although, I doubt anyone will use it.

This is what Appserver's for. Fat Client/Server's so not the way to go.

You could also send a SQL statement to a stored procedure and return the resultset into a temp-table, with a DataServer.

Again, this is what Appserver's for. No need to do the stored-procedure thing.

Posted by Alon Blich on 29-Dec-2006 10:26

This is what Appserver's for. Fat Client/Server's so not the way to go.

of course, everyone here is aware of that. but it has nothing to do with what we were saying.

in regards to the appserver it would mean for performance reasons the appserver and database would need to be on the same machine.

Again, this is what Appserver's for. No need to do the stored-procedure thing.

nope. what we meant is that sql (the progress sql engine and in general) is much better equiped for these high end read operations and has all those common, widely used performance features, like, of course a query optimizer, full table scans (no-index), index only queries, merge and hash joins, index skip scans etc. etc. etc. and poweful query features, like, sub queries, unions etc. etc. etc.

and additionally doesn't have these handicaps, for example, the 4gl implementation of network connections.

Posted by ChUIMonster on 29-Dec-2006 10:38

None the less NO-INDEX is a feature that the 4GL engine should support.

The persons who would need to add that support do not seem to think it difficult to add. They just need Sal to say "do it".

Your various points about query execution should perhaps be split off into a thread of their own. It is an interesting topic and worthy of independent discussion.

Posted by Alon Blich on 29-Dec-2006 10:50

Just to be clear, I'm very much for Tom Bascom's idea for NO-INDEX. Just a few days ago there was a sample use case here in the DBA forum.

It would be really nice to get some responses, good or bad, from the Progress guys. At least acknowledge we exist

Posted by Tim Kuehn on 29-Dec-2006 11:05

Just to be clear, I'm very much for Tom Bascom's idea

for NO-INDEX. Just a few days ago there was a sample

here in the DBA forum.

This has come from a number of people, not just Tom.

It would be really nice to get some responses, good

or bad, from the Progress guys. At least acknowledge

we exist

This topic was discussed last August, and at least one PSC person weighed in on this issue. They may not want to comment publicly and get into what - for them - may be a no-win situation.

This is one of the reasons I think PSC needs a liason that the external developer can authoritatively talk to about issues like this.

Posted by Tim Kuehn on 29-Dec-2006 11:46

Just to be clear, I'm very much for Tom Bascom's idea

for NO-INDEX. Just a few days ago there was a sample

here in the DBA forum.

This has come from a number of people, not just Tom.

It would be really nice to get some responses, good

or bad, from the Progress guys. At least acknowledge

we exist

This topic was discussed last August, and at least one person from PSC weighed in on this issue.

That this - and other - topics have come up before, and we seem to be getting the same requests for 'use cases' while not recalling the prior discussion is one of the reasons I think PSC needs a liason that the external developer can authoritatively talk to about issues like this.

Posted by Alon Blich on 29-Dec-2006 12:38

I think PSC needs a liason that the external developer can authoritatively talk to

about issues like this.

That would be great ! though, I can't imagine anyone volunteering for this suicide mission Sadly, I think, we the masses of lonely developers aren't really part of the equation.

If we were, there would at least be an affordable consumer end version so we can keep up and participate. Besides the, I think, generally pointless 2 month evaluation version.

I really hope someone from Progress will jump in and it won't have to be another discussion between ourselves.

Posted by Tim Kuehn on 29-Dec-2006 23:07

>> I think PSC needs a liason that the external

>> developer can authoritatively talk to

>> about issues like this.

That would be great ! though, I can't imagine anyone

volunteering for this suicide mission

It would be challenging, but not necessarily suicidal. It would take more tact and willingness to listen than what most techies tend to have.

Sadly, I

think, we the masses of lonely developers aren't

really part of the equation.

I'm sure that's not the case, but it does seem that way at times...

This thread is closed