Got asked a Newbie Question: Nested -vs- Join

Posted by S33 on 04-Mar-2015 08:58

I got asked a basic question and realized I didn't know the answer:

Readability aside, is there generally a performance difference between nested "for each" loops -vs- a single join?

/* one way */
for each table1 break by somefield:
    /* do something */
    for each table2 of table1:
        /* do something more */
    end.
end.

/* other way */
for each table1, 
    each table2 of table1 break by table1.somefield:
    if first-of(table1.somefield) then
        /* do something */ .
    /* do something more */
end.


Thanks in advance --

Jim Shepherd
Level 7

All Replies

Posted by rugadillo on 04-Mar-2015 09:29

I just ran a test on some data using ETIME:

Nested: 312ms

Joined: 268ms

Posted by James Palmer on 04-Mar-2015 09:32

There was a discussion about this recently and I believe it completely depends on the query you are running!

Posted by Lieven De Foor on 04-Mar-2015 09:38

It certainly is when running against a database (using DataServer) that supports joins on the server (I know the Oracle database + dataserver supports this). Server-side joins can be a multitude faster than client side joins.

As far as I know the OpenEdge database doesn't support server side joins (yet?), but apparently this is still a performance benefit in joining vs nested loops...

Posted by ke@iap.de on 04-Mar-2015 09:38

When you test, you must at minimum restart DB or better reboot between tests, otherwise OS (and may be DB caching) will have big impact.

Second: Take a bigger test where test time is about a few seconds, the time measurement in OE is not too exact.

Third: The nest is not the biggest issue. Much bigger is the break by, where OE needs to hold two records to compare. When you remove the break by and use variables to detect the change, then that should make a difference.

my 2 cent :)

Klaus

Posted by Tim Kuehn on 04-Mar-2015 09:46

The AVM doesn't do server-side joins - the upper level of the WHERE phrase is resolved by a db engine, and then each record of the subordinate portions of the WHERE clause are evaluated for a match with the higher level phrase a record at a time.

This all happens "under the covers" when the FOR EACH / WHERE is all in one statement. There's some additional overhead when the WHERE conditions are split into two FOR EACH / WHERE statements, and that's where the performance hit comes in.

Posted by rugadillo on 04-Mar-2015 10:12

Klaus,

Excellent points on all regards!  I was trying to find a quick answer, and forgot about the maxim that quick answers are rarely the most accurate ones.

Posted by Marko Myllymäki on 05-Mar-2015 03:41

Here's some related KB articles:

http://knowledgebase.progress.com/articles/Article/000053086

http://knowledgebase.progress.com/articles/Article/18342

http://knowledgebase.progress.com/articles/Article/000012195

It's sad that OpenEdge does not support server-side joins. I have made an enhancement request for that but I haven't got much hope from the Progress people to get it implemented. "Use AppServer" they always say...

This thread is closed