Indexes and retrieval order

Posted by Admin on 17-Mar-2007 13:27

I have a table called tabX with e.g. 10 fields: fieldA, fieldB, fieldC, fieldD, fieldE etc

I need to retrieve records that have specific values in fieldC and fieldD, but I need to do it in order of fieldA.

This is todays code:

for each tabX where fieldC='test' and fieldD="testmore' use-index idxfieldA:

...

end.

idxfieldA consists just of fieldA.

This is a huge table, so this is extremely slow. I am using idxfieldA because of the sort order.

How can I define a useful index for this?

If I e.g. create an index consisting of fieldA, fieldC and fieldD, would this not be sorted in fieldA-order then and not be especially useful? Can I create a index consisting of fieldC, fieldD and fieldA (in that order). Will it then sort on fieldC then fieldD and then fieldA?

What if I define an index consisting just of fieldC and fieldD. Will Progress automatically use this if I don't specify "use-index" in for each? And then I might just use an order by to retrieve the rows in fieldA-order like this:

for each tabX where fieldC='test' and fieldD="testmore' order by fieldA:

...

end.

I hope anyone have a good suggestion on how to handle this case...

Message was edited by:

Stein Rune Risa

All Replies

Posted by Thomas Mercer-Hursh on 17-Mar-2007 13:56

The best answer to this depends on not just this one query, but the full set of queries that you need to use against this table.

If the characteristic pattern of use is what we see here, i.e., selection on single values for each field, then you probably will want one index per field since that will allow multi-index query resolution to most rapidly identify the subset of records corresponding to your selection criteria.

As to the sort order, you should determine that by the use of a BY clause.

If this were the only query you ever did on this table, you could define an index composed of C, D, and A and it would bracket on the first two and give you the sort on the third ... but one guesses that this isn't the only query.

Posted by Admin on 17-Mar-2007 14:05

OK, this might seem as a good idea. There is already an index consisting of just fieldC.

What you're saying is that if I add an index consisting of just fieldD, I would see a performance increase, because I already have fieldC "covered"? How well will this perform compared with having a composite index of fieldC, fieldD and fieldA?

What about an index consisting of fieldC and fieldD (not A) and using BY? Might this pose problems?

Posted by Thomas Mercer-Hursh on 17-Mar-2007 14:19

Like I said, it depends on the totality of your queries. If you have very few queries on this table and or one or more of them is very important, then adding an index to support that query alone is a perfectly good idea. But, not only is this unlikely, but Progress is very good at using multiple indexes to resolve a query. In general, indexes with a single field are very powerful, especially if one is typically looking for single values, because they allow very rapid identification of a result set without having to read any records. If there are two or more selection criteria which are single values like this, it will identify the result set for each index and then combine those result sets to find only the records fitting all criteria ... and it still hasn't read any records. If there are a million records in that result set, then I suppose the three field index would save sorting the results, but if there is only a moderate number, then the sort on A is not a big deal.

Posted by Admin on 17-Mar-2007 14:32

OK, than you very much for your help.

I had a look at my code, and found out that the reason I was was using "use-index", was because I use "find first" and not "for each". A "by" clause on "find first" will have no meaning because the data is retrieved before sort...

After your very good answers I will do the following:

1. Add a single field index on fieldD

2. Rewrite my "find first" to:

for each tabX where fieldC = "test" and fieldD ="testmore" by fieldA:

display tabX.

leave.

end for.

In this way I avoid creating the composite index. My only worry is sorting with "by" here. I run this code as a part of a batch job, and the procedure will run 200-500 times in a row. Even if number of records matching these criteria on fieldC and fieldD are low (

Any thoughts on this?

Message was edited by:

Stein Rune Risa

Posted by Admin on 17-Mar-2007 14:53

One additional question. If I use a for each-statement such:

for each tabX where fieldC = "test" and fieldD = "testmore":

display tabX.

end.

and have two singe column indexes, one on fieldC and one on fieldD.

How will this for each be sorted (having no BY)? Will it sort by one of the indexes even if it has to use both of them to retrieve the correct result set?

Posted by Thomas Mercer-Hursh on 17-Mar-2007 15:11

Find first is suspect because it often doesn't mean what you think it means. Use this only when you want to determine that there is at least one member of a set ... in which case you should be using can-find anyway. For finding the first record in a set in a predictable order, use for each and exit the loop after the first record. Then you can use a BY clause to insure the correct order.

Use-index is also highly suspect. Better is to use the BY clause and let Progress work its magic. E.g., Use-index will never allow the multi-index query resolution.

Tell me more about why you are running this 200-500 times in the same procedure.

It is possible that you have defined one of those *** places where a three level index and a find is the best answer, but it is also possible that there is a different solution to the problem.

What is the nature of field A?

Posted by Thomas Mercer-Hursh on 17-Mar-2007 15:12

If you don't specify BY, I think in this case you are likely to get RECID order, but it is best to treat it as unknown ... since you didn't tell it what to do.

Posted by Admin on 17-Mar-2007 15:33

I don't run it 200-500 times in the same procedure, but the procedure is part of a batch processing system, so if there are e.g. 300 orders, it will run the procedure 300 times.

Each order is processed independent of the others, so this must unfortunately work like this.

It seems, based on your answers, that I need to create a composite index consisting of fieldC, fieldB and fieldA and change from using "find first" to using "for each" with "leave" in.

Just one thing to be 100% sure, the order of the fields in the index does matter? If i put fieldD, fieldC and fieldA in an index it will sort the index on those fields in that order, not depending on the initial table definition?

Thank you for spending time helping me with this.

Message was edited by:

Stein Rune Risa

Posted by Thomas Mercer-Hursh on 17-Mar-2007 15:55

If you could explain a bit more about the domain issues, we might have some other suggestions, but based on just what I think you have said ...

If you really only want the first record where C and D are known and A is the lowest value in the set and you are doing this hundreds of times a day, then it seems like you have a case where the three part index would be the right thing. I would specify all three fields in the BY clause.

Yes, the sort order will be determined by the index if you read all of the matching values. Each index is independent of every other index and determines the sort for records retrieved by that index. For multiple records with the same value, they will be in RECID order.

Posted by Admin on 17-Mar-2007 16:03

It will be a huge task explaining the whole solution, and I can not do it either because of security measures, but redesigning it will not be possible. I have to fix this within the given limits.

I think the solution with a composite index will be the best performing solution, so I will stick with that. Do you think Progress will pick this index automatically, or should i use use-index to be sure?

Are you sure about using all three fields in the BY-statement? Will this not force a unnecessary sort on the data (the index already ensures the correct sorting)?

Message was edited by:

Stein Rune Risa

Posted by Thomas Mercer-Hursh on 17-Mar-2007 18:16

Probably time for some experimentation and testing. Use compile XREF to see what index it picks. I would be surprised to see it pick anything except the new composite one. You probably can get by without the BY, but I am also sure that it won't hurt and it helps to document the code. You shouldn't need the use-index.

If you have been dong table scans up until now, the performance boost should be spectacular!

This thread is closed