When writing a query, is there a performance difference between using
table.active = YES vs table.active
table.active = NO vs NOT table.active
In my opinion, the biggest performance impact here is related to index selection. Given that there is an index with table.active as a leading component, what index selection does the COMPILE XREF show?
The XREF for each of the following shows the same index selection of an index with table.active as a leading component:
FOR EACH table WHERE table.active = YES NO-LOCK:
FOR EACH table WHERE table.active NO-LOCK:
FOR EACH table WHERE table.active = NO NO-LOCK:
The XREF for this query shows WHOLE-INDEX:
FOR EACH table WHERE NOT table.active NO-LOCK:
According to Dan Foreman, there is no performance difference between these constructs: pugchallenge.org/.../230_Indexing.ppt (slide 19)
FOR EACH table WHERE table.active = YES NO-LOCK:
FOR EACH table WHERE table.active NO-LOCK:
If there is a difference, I would attribute it to evaluating the equality expression as Torben mentioned.
For the NOT operator, it may have to do with NOT being treated more like a function than an operator as Tom pointed out. I can't find an Order of Operations reference, but that may be the reason the index engine won't deal with the NOT table.active construct.
So I think the main recommendations are to make sure there is an appropriate index for the query that includes the logical field; and to not use the NOT operator on logical fields in queries. After that, test performance of the other constructs against the table you are working on. Not sure what impact the presence of null values may have on performance.
Yes, it can be significant. Always try to use:
table.active = YES
table.active = NO
Particularly true if table.active is part of an index; otherwise the clause will not be considered when the index is chosen.
Although this is true, it makes one wonder, why the compiler cannot solve this for us ....
In the scenario given, I'd worry more about the functional difference between "table.active = table.active" (always true for all records) vs. "table.active = YES" (not always true) / "table.active = NOT table.active" (never true for any record) vs. "table.active = NO" (not always true).
Comparing a field to itself doesn't often make a lot of sense.
Also, comparisons on variables or table fields *do* get locked in when the WHERE clause is evaluated, and *are* picked up when indexes are selected. COMPILE XREF this:
DEFINE TEMP-TABLE stuff NO-UNDO FIELD thing AS CHARACTER FIELD active AS LOGICAL INDEX thing IS UNIQUE thing INDEX active active. DEFINE TEMP-TABLE morestuff NO-UNDO FIELD active AS LOGICAL. DEFINE VARIABLE icount AS INT NO-UNDO. DEFINE VARIABLE lfindme AS LOGICAL NO-UNDO. PAUSE 0 BEFORE-HIDE. DO icount = 1 TO 100000: CREATE stuff. ASSIGN thing = GUID active = (RANDOM (20,200) MOD 2 = 0). END. CREATE morestuff. ETIME(TRUE). FOR EACH stuff WHERE stuff.active = lfindme: END. MESSAGE ETIME. ETIME(TRUE). FOR EACH stuff WHERE stuff.active = FALSE: END. MESSAGE ETIME. ETIME(TRUE). FOR EACH stuff WHERE stuff.active = morestuff.active: END. MESSAGE ETIME.
and see that all 3 FOR EACH'es use the same index.
User-defined functions and class methods are a different matter.
If there is an index on active then my small test show:
Best: table.active
Equal: table.active = YES; table.active = NO (~ 10%)
Worst: NOT table.active (> 200%)
If there is not an index on active then:
Best: table.active; NOT table.active
Worst: table.active = YES; table.active = NO (~ 10%)
Frank, I believe the OP was asking about 1) comparing the field to a logical vs 2) just testing the field, not comparing the field to itself. One might note that some of these tests could produce different result sets were it possible for table.active to take the unknown value.
As I recall... the difference between NOT and field = NO comes down to NOT being treated as a function rather than as an operator.
It could be that I misread, yes. Hopefully @jquerijero can elaborate ?
[quote user="Torben"]
If there is an index on active then my small test show:
Best: table.active
Equal: table.active = YES; table.active = NO (~ 10%)
Worst: NOT table.active (> 200%)
If there is not an index on active then:
Best: table.active; NOT table.active
Worst: table.active = YES; table.active = NO (~ 10%)
[/quote]
If this is true and since checking for negative is rare, it sounds like direct inspection of the field is better.
My test was on version 12.1 windows gui 64 bit version. And the difference between 'table.active' and 'table.active = YES' seems to be the overhead for evaluating equality expression.
> "table.active = NOT table.active" (never true for any record)
But this expression CAN be true, if the field contains the unknown value.
(And I suppose that must have some implications for index searches too.)
[quote user="frank.meulblok"]
It could be that I misread, yes. Hopefully @jquerijero can elaborate ?
[/quote]
It's writing a query involving a logical field. Which of the following constructs are better?
FOR EACH table WHERE table.active = YES NO-LOCK:
or
FOR EACH table WHERE table.active NO-LOCK:
FOR EACH table WHERE table.active = NO NO-LOCK:
or
FOR EACH table WHERE NOT table.active NO-LOCK:
In my opinion, the biggest performance impact here is related to index selection. Given that there is an index with table.active as a leading component, what index selection does the COMPILE XREF show?
The XREF for each of the following shows the same index selection of an index with table.active as a leading component:
FOR EACH table WHERE table.active = YES NO-LOCK:
FOR EACH table WHERE table.active NO-LOCK:
FOR EACH table WHERE table.active = NO NO-LOCK:
The XREF for this query shows WHOLE-INDEX:
FOR EACH table WHERE NOT table.active NO-LOCK:
According to Dan Foreman, there is no performance difference between these constructs: pugchallenge.org/.../230_Indexing.ppt (slide 19)
FOR EACH table WHERE table.active = YES NO-LOCK:
FOR EACH table WHERE table.active NO-LOCK:
If there is a difference, I would attribute it to evaluating the equality expression as Torben mentioned.
For the NOT operator, it may have to do with NOT being treated more like a function than an operator as Tom pointed out. I can't find an Order of Operations reference, but that may be the reason the index engine won't deal with the NOT table.active construct.
So I think the main recommendations are to make sure there is an appropriate index for the query that includes the logical field; and to not use the NOT operator on logical fields in queries. After that, test performance of the other constructs against the table you are working on. Not sure what impact the presence of null values may have on performance.
If I recall correctly, Gus weighed in on this about 20 years ago. Part of it depends on whether it is a mandatory logical field. If not, the unknown value has to be taken into consideration and thus NOT means = TRUE OR = ?. I am unsure how much the function makes a difference. I understand if the logical is made mandatory, the performance gap narrows as ? cannot be a possibility (I am unsure of that though). Either way = TRUE and = FALSE are the best as they are equality matches and NOT is potentially a multi-equality match.