Performance problem on a temp-table with indexed decimal fields

Posted by tbergman on 20-Apr-2016 20:22

I’ve run into an interesting performance problem with a temp-table (TT1). The two fields of interest are decimal fields, StartNum and EndNum. These both participate in the only index on the table. The length of the values in these fields varies between 8 and 10 digits and the values represent whole numbers. EndNum is always GE StartNum. It’s only a decimal because I needed to store values GT 2^31 and a lot of this was written before int64. There are about 55k records in TT1.

 

The program logic goes through another temp-table (TT2) with a similar decimal field (foo) that has values varying in length between 8 and 10 digits. It has about 7 million records.  I executed a query like this.

 

for each TT2:

  Find first TT1 where TT1.StartNum LE TT2.foo and TT1.EndNum ge TT2.foo no-error.

end.
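
For reference, the definitions look roughly like this (a sketch - the index names are mine, but the field types and index column order match the behavior discussed below):

DEFINE TEMP-TABLE TT1 NO-UNDO
    FIELD StartNum AS DECIMAL
    FIELD EndNum   AS DECIMAL
    INDEX ixRange StartNum EndNum. /* the only index on the table */

DEFINE TEMP-TABLE TT2 NO-UNDO
    FIELD foo AS DECIMAL
    INDEX ixFoo foo.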

 

Early tests seemed fine. Going through a thousand or so records in TT2 performed very well and I was able to extrapolate the likely time for the entire set. It's when I began running this for more records that things began to go downhill.

 

After tearing my hair out for a while, I put an ETIME around the FIND and output the measurement to a text file. I found that the FIND would execute in < 1 ms when the value of foo was 8 or 9 digits in length; as soon as it was 10 digits, it would take around 20 ms. Since the FOR EACH was walking TT2 in the order of the foo field, the problem started to show up only after about 20k records, as that's when the value of foo grew to 10 digits.
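
The timing harness was roughly this (a sketch of what I described; the file name is arbitrary):

DEFINE VARIABLE iStart AS INTEGER NO-UNDO.

OUTPUT TO "findtimes.txt".
FOR EACH TT2:
    iStart = ETIME.
    FIND FIRST TT1
        WHERE TT1.StartNum LE TT2.foo AND TT1.EndNum GE TT2.foo NO-ERROR.
    PUT UNFORMATTED TT2.foo " " ETIME - iStart SKIP.
END.
OUTPUT CLOSE.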

 

I switched to INT64 and had exactly the same problem. Changing everything to CHARACTER performed well, with all finds taking < 1 ms.

 

Is this expected? What causes this?

 

Thanks,

 

Tom

All Replies

Posted by Rob Fitzpatrick on 20-Apr-2016 20:40

Hi Tom,

One possibility is that your temp-tables have grown sufficiently large that they are no longer entirely memory-resident, so you are seeing a performance decrease due to disk I/O to your client's DBI file.

Check the size of your DBI file during the query; if you're not on Windows you can see it with lsof -p <client PID> or similar, or start the client with -t to make the temp files visible. If it grows then you're doing disk writes, so your temp-table buffers are too small for your data.

Do you specify values for the -tmpbsize and -Bt client startup parameters, or use the defaults? If the latter, you have about 1 MB of temp-table buffers (4 KB * 255), which may well be insufficient for 55K records and is definitely insufficient for millions. Try bumping up these parameters to give you more buffer space than your max DBI size and see if performance changes. HTH.
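
For example (values purely illustrative): starting the client with -t -tmpbsize 8 -Bt 8192 would give about 8 KB * 8192 = 64 MB of temp-table buffer space, with the temp files left visible so you can watch the DBI file as the query runs.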

Rob

Posted by Rob Fitzpatrick on 20-Apr-2016 20:43

In your tests with character fields, how many records were in your temp-tables?

Posted by Tim Kuehn on 20-Apr-2016 21:44

I'm thinking that the FIND FIRST / WHERE structure is running into cases where there are multiple records in TT1 that match part of the WHERE clause, so it has to scan many records to find one that matches both parts of the condition - though that wouldn't explain why CHAR versions of the same fields perform as expected.

Replacing the FF with a FOR EACH / LEAVE may go faster - can you give that a try and see what happens?
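
Something like this (a sketch of the substitution I mean):

FOR EACH TT2:
    FOR EACH TT1
        WHERE TT1.StartNum LE TT2.foo AND TT1.EndNum GE TT2.foo:
        /* first match found - do the work, then stop scanning */
        LEAVE.
    END.
END.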

Posted by tbergman on 21-Apr-2016 08:09

Thanks to everyone who responded. Answers to a few questions first.

The DBI file was not active and stayed at 0 bytes. The SRT file was in use but seemingly not related to the problem. Upping -Bt had no effect.

FOR EACH / LEAVE was not an improvement.

Even with character fields, I wasn't pleased with the performance.

Each record in TT1 represents a range of values (they're actually IP address ranges). The ranges never overlap.

I was able to get great performance by having 2 indexes on TT1, one for start, one for end. Then I added a buffer for TT1 and queried like this.

Find Last TT1 where TT1.StartNum LE TT2.foo no-error.

Find First bTT1 where bTT1.EndNum GE TT2.foo no-error.

If both finds succeed and the two buffers' ROWIDs are the same, then I've found the appropriate record.

Very fast, problem resolved.
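
In outline, the final version looks roughly like this (a sketch; the index names are mine):

DEFINE TEMP-TABLE TT1 NO-UNDO
    FIELD StartNum AS DECIMAL
    FIELD EndNum   AS DECIMAL
    INDEX ixStart StartNum  /* two single-field indexes now */
    INDEX ixEnd   EndNum.

DEFINE BUFFER bTT1 FOR TT1.

FOR EACH TT2:
    FIND LAST TT1   WHERE TT1.StartNum LE TT2.foo NO-ERROR.
    FIND FIRST bTT1 WHERE bTT1.EndNum  GE TT2.foo NO-ERROR.
    IF AVAILABLE TT1 AND AVAILABLE bTT1
       AND ROWID(TT1) = ROWID(bTT1) THEN DO:
        /* TT1 is the range containing TT2.foo */
    END.
END.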

Thanks

Posted by Keith Sudbury on 21-Apr-2016 09:24

For future reference, the problem is that you were only using the leading column of the compound index (StartNum), because you were using a range match on that column.

So your query was reading all TT1 records that were LE TT2.foo... and then brute-force evaluating EndNum. You only get to use both columns in the index if the first is an equality match.

As the StartNum value increased you were reading more and more TT1 records.
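
To illustrate (values made up), the bracket can only use both index columns when the leading column is an equality match:

/* brackets on both columns of INDEX (StartNum, EndNum):
   equality on StartNum, then a range on EndNum */
FIND FIRST TT1
    WHERE TT1.StartNum EQ 1234567890 AND TT1.EndNum GE 1234567890 NO-ERROR.

/* a range match on the leading column ends the bracket there,
   so EndNum has to be evaluated record by record */
FIND FIRST TT1
    WHERE TT1.StartNum LE 1234567890 AND TT1.EndNum GE 1234567890 NO-ERROR.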

Posted by Jeff Ledbetter on 21-Apr-2016 09:48

Aside from Tom's solution, what would be another approach?

Posted by Tim Kuehn on 21-Apr-2016 10:36

Another approach - get all the unique "StartNum" values into another TT and then do a join between that TT and the original TT, like so:

/* tt1Start has all the unique "Startnum" values    */
/* tt1End has all the values in the original tt1    */

DEFINE TEMP-TABLE tt1Start NO-UNDO
    FIELD StartNum  AS INT64    /* INT64 rather than INTEGER - the values exceed 2^31 */
    INDEX ixSt      IS UNIQUE StartNum.

DEFINE TEMP-TABLE tt1End NO-UNDO
    FIELD StartNum  AS INT64
    FIELD EndNum    AS INT64
    INDEX ixSt  StartNum EndNum.

DEFINE TEMP-TABLE tt2   NO-UNDO
    FIELD foo   AS INT64
    INDEX ixFoo foo.    /* an index definition needs at least one field */

FOR EACH tt2
    NO-LOCK:

    FOR EACH tt1Start
        WHERE tt1Start.StartNum <= tt2.foo
        NO-LOCK,

        EACH tt1End
            WHERE tt1End.StartNum = tt1Start.StartNum AND
                  tt1End.EndNum     >= tt2.foo
            NO-LOCK:

        /* do stuff */
        LEAVE.

    END.

END.

One could also accomplish a similar result using just the original TT and some variables if there was a need for it.
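
One reading of that idea (my interpretation, not tested): since TT2 is processed in foo order and the TT1 ranges never overlap, keep the current range in a couple of variables and only re-FIND when foo moves past it.

DEFINE VARIABLE dStart AS DECIMAL NO-UNDO INITIAL ?.
DEFINE VARIABLE dEnd   AS DECIMAL NO-UNDO INITIAL ?.

FOR EACH TT2 BY TT2.foo:
    IF dEnd = ? OR TT2.foo GT dEnd THEN DO:
        FIND LAST TT1 WHERE TT1.StartNum LE TT2.foo NO-ERROR.
        ASSIGN
            dStart = (IF AVAILABLE TT1 THEN TT1.StartNum ELSE ?)
            dEnd   = (IF AVAILABLE TT1 THEN TT1.EndNum   ELSE ?).
    END.
    IF dStart NE ? AND TT2.foo GE dStart AND TT2.foo LE dEnd THEN DO:
        /* TT2.foo falls inside the current TT1 range - do stuff */
    END.
END.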

Posted by Keith Sudbury on 21-Apr-2016 10:54

Depending on the data skew you could change the order of the index to EndNum,StartNum and get a more consistent number of records per query. In some cases this works great because you quickly find a record with just a few extra reads. In other cases it is just as bad or worse.

Other solutions would be something similar to what Tim suggested. Or changing the design of the TT to store the data in a more index friendly way.
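
E.g., something like this (a sketch):

DEFINE TEMP-TABLE TT1 NO-UNDO
    FIELD StartNum AS DECIMAL
    FIELD EndNum   AS DECIMAL
    INDEX ixRange EndNum StartNum. /* EndNum now leads */

/* inside the FOR EACH TT2 loop: brackets on EndNum GE foo; with
   favorable skew a record satisfying StartNum LE foo turns up after
   only a few reads - with unfavorable skew it scans just as much */
FIND FIRST TT1
    WHERE TT1.EndNum GE TT2.foo AND TT1.StartNum LE TT2.foo NO-ERROR.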

Posted by Jeff Ledbetter on 21-Apr-2016 13:38

Interesting. Thanks for the additional feedback to the original post. I know it's sometimes easy to get lazy with temp-table indexing ("hey.. it's all in memory!"). It's good to keep in mind that it's just as important as indexing database tables. :)
