Slow indexes

Posted by ojfoggin on 03-Sep-2009 03:05

Hi again,

Just wondered if there is anything to look out for when creating indexes on a table?

We have a program searching through an index (I have checked the XREF and it is definitely using the index), although the query only supplies some of the index's fields.  It still sometimes takes 30-40 seconds to return 3,000-4,000 records.

Are there any combinations of data etc... that can cause an index to still be slow?

As a reference, this table has well over 1 million records so without using the index it would take 30-40 minutes to do the same thing.

Thanks for any help/pointers to reference text.

Oliver

All Replies

Posted by miperkin on 03-Sep-2009 04:26

The ABL Handbook has a bit of info about the use of multi-component indexes. If your index comprises columns A, B and C in that order, and your query uses columns A and C, you will only get the benefit of column A when the index is bracketed.
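For example, a rough sketch of the idea (the table, field and index names here are made up purely for illustration):

    /* assume an index ix-abc defined on cust.colA, cust.colB, cust.colC */
    FOR EACH cust NO-LOCK
        WHERE cust.colA = 1          /* equality on the leading component - bracketed      */
          AND cust.colC = "x":       /* colB is skipped, so colC cannot extend the bracket */
        /* colC ends up applied as a filter on every record read via the colA bracket */
    END.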

Posted by ojfoggin on 03-Sep-2009 04:32

Thanks,

The index uses column A, B, C and D (for example).

The "for each" that we have uses columns A, B and C (but not D) in fact we actually have another variable on the end of A, B and C that is separate from the index altogether.

A, B and C usually return about 3000 records.  Using the final variable takes them down to 5 or 6.

Posted by rbf on 03-Sep-2009 04:44

A much better way than studying the xref to determine what is going on is studying the VSTs, in order to find out how many reads there are on which tables and which indexes.

Posted by miperkin on 03-Sep-2009 04:53

Is it any faster returning the 5 or 6 than the 3000 (assuming they aren't all in memory already due to -B)? If the final variable is not a component of the index, then you will only be bracketing on A,B and C (assuming the check on A and B is for equality).

Posted by ojfoggin on 03-Sep-2009 05:13

No, the query seems to be slow even without the last variable.

We've added a log for this particular search to put out some params and the time it takes (on the live database), and it seems to be run all the time (i.e. keeping it in memory).  At the moment each run takes about 300-400 ms (according to ETIME).
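The timing is along these lines (a simplified sketch; the log file name is just a placeholder):

    DEFINE VARIABLE queryTime AS INTEGER NO-UNDO.

    ETIME(TRUE).                      /* reset the millisecond timer */

    /* ... the FOR EACH being measured goes here ... */

    queryTime = ETIME.                /* ms elapsed since the reset */

    OUTPUT TO VALUE("query-timing.log") APPEND.
    PUT UNFORMATTED STRING(NOW) " took " queryTime " ms" SKIP.
    OUTPUT CLOSE.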

Next time we get a complaint that it is going slowly we'll be able to see exactly how long the query took (hopefully).

The query is (formatted for readability):

    /* input: a list of depot numbers                */
    /* loop over each number in the list:            */

    FOR EACH blah NO-LOCK
        WHERE blah.int1  = 1
          AND blah.char  = "value"
          AND blah.depot = <current entry in list>
          AND blah.dec   = 1.00:


The blah.dec is the extra variable not included in the index.

Going through some of the different depots and checking the number of records returned by the index part of the query I get the following...

3754

6028

3237

5224

3319

and so on.

with the final variable I get 3 at the most returned.

Posted by ojfoggin on 03-Sep-2009 05:21

A few results from the log show that there is a definite problem.

Because the search is run fairly often it seems to be in the memory a lot.

The query generally takes around half a second to run.

But if it hasn't been run for around 10 or 11 minutes, then the next time it runs it takes 8 minutes, before going back to half a second.

This is for exactly the same query with exactly the same parameters.

Posted by rbf on 03-Sep-2009 05:42

Seems to me that -B is too small then.

Posted by ojfoggin on 03-Sep-2009 06:08

That would be one fix but it's kind of ignoring the problem at hand.

It just means that there will be a longer period between queries when it suddenly takes 8 minutes again instead of half a second.

TBH I think we're going to have to rework how this query works completely.

Posted by ChUIMonster on 03-Sep-2009 07:42

The problems are:

1) A, B and C describe a set of several thousand records, so your query will return all of these records.  The "extra variable" is then used to filter that result set.  Adding an index on A, B, C and the extra variable would be the obvious way to make it so that the query only returns the 5 or 6 records that you need (see the sketch after this list).

2) Because you are processing thousands of records you need a substantial number of IO ops when those records are not present in the buffer pool.  You might be able to mitigate this by increasing the size of the buffer pool (small increases are pointless, effectiveness of -B follows an inverse square law which means that you must (roughly) double it in size to cut IO in half) or by decreasing the demand for IO ops (see #1).  Depending on your storage area configuration you might also be able to reorganize the data to improve the effectiveness of the IO ops that you are doing and thus reduce the number needed (IOW if you are using type 1 areas the data may be scattered 1 record per block all over the area whereas a well designed type 2 area could pack as many as 256 records into a block and thus reduce IO substantially...)
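To illustrate point 1, a hypothetical incremental .df fragment adding such an index (the index name and area name are made up; you would normally create it through the Data Dictionary):

    ADD INDEX "ix-depot-dec" ON "blah"
      AREA "Index Area"
      INDEX-FIELD "int1" ASCENDING
      INDEX-FIELD "char" ASCENDING
      INDEX-FIELD "depot" ASCENDING
      INDEX-FIELD "dec" ASCENDING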

As Peter said, using the VSTs to see how many requests you are really making would shed more light on the problem than etime() and an examination of XREF alone can.  Tools like ProTop - http://www.dbappraise.com/protop.html  are very useful in such an endeavour.
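For instance, a rough sketch of pulling table read counts out of the VSTs (this assumes the default -basetable/-tablerangesize settings so that _TableStat-id lines up with _File-Number; the counts are cumulative since the database started, so take a snapshot before and after the query to see what one run costs):

    FOR EACH _TableStat NO-LOCK,
        FIRST _File NO-LOCK
        WHERE _File._File-Number = _TableStat._TableStat-id
        BY _TableStat._TableStat-read DESCENDING:

        DISPLAY _File._File-Name           FORMAT "x(24)"
                _TableStat._TableStat-read.
    END.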

Posted by Peter Judge on 03-Sep-2009 08:02

rbf wrote:

A much better way than studying the xref for determining what is going on is studying the vst table in order to find out how many reads there are on what tables and what indexes.

Peter, have you tried using the -logentrytypes QryInfo ? If so, how does it compare with the VST data?
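For reference, the runtime equivalent of that startup parameter is roughly this (a sketch; the log file name and logging level are illustrative):

    LOG-MANAGER:LOGFILE-NAME    = "qryinfo.lg".
    LOG-MANAGER:LOGGING-LEVEL   = 3.
    LOG-MANAGER:LOG-ENTRY-TYPES = "QryInfo".

    /* ... run the suspect FOR EACH here ... */

    LOG-MANAGER:CLOSE-LOG().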

-- peter

Posted by ChUIMonster on 03-Sep-2009 10:18

It's interesting information but it is not very handy to work with.

1) You have to plan ahead and specify a log file.

2) Managing individual log files for each session on large systems promises to be "unwieldy".

3) Having to scan through the log file and parse the output is decidedly unpleasant.  Particularly at any scale.

4) The best bit of info is "4GL records" as opposed to "records from server".  So far as I know this is the only way to get that precious nugget.  (It needs to find its way into a VST.)

So, in my opinion, it's almost usable, albeit ugly, for single-user development issues when you already know that a particular bit of code might be a problem, but almost completely useless for a deployed system.

Posted by Thomas Mercer-Hursh on 03-Sep-2009 11:23

When you say "current entry in list", are you talking about a single variable which contains the extracted value of the current entry or are you talking about entry(n, list)?

Posted by rbf on 03-Sep-2009 14:24

pjudge wrote:

Peter, have you tried using the -logentrytypes QryInfo ? If so, how does it compare with the VST data?

Hi Peter,

Tom beat me to it. Indeed QryInfo provides interesting information but it is not easy to process, although it all depends on the tools at hand. VSTs certainly are not easy to process either, but there are some good tools available to make them visible such as Protop, the VST viewer and the Query Tester and that makes all the difference. I suppose one could develop such a tool for processing the contents of the log file, but it would not be as easy and you have to deal with filtering the data, cleaning up the file etc. Personally I have never been tempted to undertake that endeavor.

Posted by Peter Judge on 03-Sep-2009 14:37

rbf wrote:

Tom beat me to it. Indeed QryInfo provides interesting information but it is not easy to process, although it all depends on the tools at hand. VSTs ...

Thanks to you both for your replies.

rbf wrote:

I suppose one could develop such a tool for processing the contents of the log file, but it would not be as easy and you have to deal with filtering the data, cleaning up the file etc. Personally I have never been tempted to undertake that endeavor.

I should note that a tool to help read the logs exists already; it's in the code share section at http://communities.progress.com/pcom/docs/DOC-3498 *

-- peter

* I've not figured out how to 'properly' link in email replies

Posted by ChUIMonster on 03-Sep-2009 14:58

Reading the logs is just one obvious part of the problem.  What we really want to be able to do is to simply and easily integrate the data into a tool at will.  That's where VSTs and VST-like thinking shine.  It's just another record that you manipulate with the 4GL.  All diagnostic, logging and analysis data should work that way.

Posted by rbf on 03-Sep-2009 15:18

I had read about such a tool but forgot about it. I will have a look at it. Thanks Peter!

Posted by ojfoggin on 04-Sep-2009 02:40

I meant entry(n, list) taken from the loop just outside the for each.

This can just be a single number but will often contain a list of numbers ... "1,2,4,8,12,16,25,31,56,etc..."

Posted by Thomas Mercer-Hursh on 04-Sep-2009 10:45

So, the actual where clause does not contain a function?  That's what I was trying to determine.
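In other words, the distinction is between these two forms (a sketch using the field names from the earlier pseudo-code; i, iDepot and depot-list are made-up names):

    DEFINE VARIABLE depot-list AS CHARACTER NO-UNDO INITIAL "1,2,4,8".
    DEFINE VARIABLE i          AS INTEGER   NO-UNDO.
    DEFINE VARIABLE iDepot     AS INTEGER   NO-UNDO.

    DO i = 1 TO NUM-ENTRIES(depot-list):

        /* form 1: the entry is extracted into a plain variable first */
        iDepot = INTEGER(ENTRY(i, depot-list)).
        FOR EACH blah NO-LOCK
            WHERE blah.int1  = 1
              AND blah.char  = "value"
              AND blah.depot = iDepot:
        END.

        /* form 2: the function call sits inside the WHERE clause itself */
        FOR EACH blah NO-LOCK
            WHERE blah.int1  = 1
              AND blah.char  = "value"
              AND blah.depot = INTEGER(ENTRY(i, depot-list)):
        END.
    END.

Whether the second form can still bracket as cleanly as the first is exactly what the question is getting at.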

This thread is closed