Find records that has all items in the list

Posted by Joey Samoy on 07-Jun-2019 22:32

Hi,

This is seemingly simple, but somehow I cannot get the right code.

I have a list value, "A,B".

I want to find all table that has both "A,B". This list is dynamic, and may have hundreds. I cannot use lookup, because if "A" is found, "B" may not.

I have that loops thru this list and find each record, if at least one is not found, go to next record. It works, but it looks ugly. :-(

TIA!

Posted by jonathan.wilson on 12-Jun-2019 10:38

So the following is the MATCHES stuff in a quick example... again this only works if you sort your "ABC" data in the string on both and the identifiers are unique. 

DEF TEMP-TABLE ttData NO-UNDO 
  FIELD c_code AS CHAR 
 INDEX idx1 c_code.

FUNCTION CreateTT RETURN LOG (INPUT c_data AS CHAR):
    CREATE ttData.
    ASSIGN ttData.c_code = c_data.
END FUNCTION.

CreateTT ("A,B,C").
CreateTT ("A,C").
CreateTT ("A,B,C,D").
CreateTT ("C,A").

FOR EACH ttData WHERE c_code MATCHES "*A*C*":
    DISP ttData.
END.

All Replies

Posted by David Abdala on 10-Jun-2019 10:27

Well, your question is not clear enough for me..

What I understands is that you want to find all the records that match with all the values in a list. Someting like:

FOR EACH table WHERE LOOKUP(field,list) GT 0:

ENTRY(LOOKUP(field,list),list) = STRING(ROWID).

END.

/* at this point, list has the rowids of the records, instead of the values, you may need to check that all the valures have been replaced */

.. probably is not what you  are really asking about..

Posted by Joey Samoy on 10-Jun-2019 14:35

Hi David,

For example, my list would have A,B

Record 1 - has A,B,C

Record 2 - has A,D

The routine should only find Record 1.

TIA!

Posted by onnodehaan on 10-Jun-2019 14:44

Hi,

You could build up a dynamic query, perhaps?

every item in the list would be a seperate part of the query

A,B,D,E

query = 'for each table where '

do i = 1 to num-entries(list)

 query = query + .....

end.

run the query.

Now you only have one query to run.

But I admin... still pretty ugly :-)

Posted by George Potemkin on 10-Jun-2019 15:23

Can you use a word index?

Posted by Patrick Tingen on 10-Jun-2019 19:49

Could you give the relevant details for the table structure for this problem? Are the details ypu are looking for in a comma separated list in the table or are they records in a child table? This is not quite clear enough to give an answer

Posted by jonathan.wilson on 11-Jun-2019 07:20

Now this post won't be helpful answer... my DBA hat on, the data structure is wrong here.  You shouldn't have a comma separated list of entities in a field as it's not good "normal form".  You want something that's elegant to solve a problem; but it's your data structure that's the real issue.

If you always place entries in an order on the string and don't just add randomly to the end you should try MATCHES based off that sort order "*A*B*" but this is slow.

Posted by Joey Samoy on 11-Jun-2019 15:00

Hi All,

George, can't do word index.

Jonathan, the list is built at run time, perhaps I can use array, or temp-table. But still have to do the same logic of validating that all of the items in this list exists in one record.

Patrick, let me try again with another example.

Record 1

Id

A US

B CA

C UK

Record 2

Id

A US

C UK

Record 3

Id Country

A US

B CA

C UK

D AU

MyList = "A,B" - matches with 1 and 3

MyList = "A,B,C" - matches all

MyList = "D" - matches 3 only

TIA!

Posted by LarryDusch on 11-Jun-2019 15:21

Based on what you've said the data structure is in the table, I'm assuming you have a comma separated list in the table and you have a comma separated list for your query.  Try this:

def var has_all as logical no-undo.
def var i as int no-undo.

... other defs ....

readloop:
for each mytable no-lock:
    has_all = true.
    chkloop:
     repeat i = 1 to num-entries(myquerylist):
         if lookup(entry(i,myquerylist),mytable.fieldtocheck) = 0
         then do:
             has_all = false.
             leave chkloop.
          end.
      end.   
      if has_all = false then next readloop.
          ....  do it has all of them stuff ....
end.   


Posted by Thomas Mercer-Hursh on 11-Jun-2019 20:02

As Jonathan says , the real problem here is the data structure.  Those country codes should really be in a separate table with 0 to N records per base entity depending on the number of countries for each base entry.  That makes a simple query as a side-effect.

Posted by jonathan.wilson on 12-Jun-2019 10:38

So the following is the MATCHES stuff in a quick example... again this only works if you sort your "ABC" data in the string on both and the identifiers are unique. 

DEF TEMP-TABLE ttData NO-UNDO 
  FIELD c_code AS CHAR 
 INDEX idx1 c_code.

FUNCTION CreateTT RETURN LOG (INPUT c_data AS CHAR):
    CREATE ttData.
    ASSIGN ttData.c_code = c_data.
END FUNCTION.

CreateTT ("A,B,C").
CreateTT ("A,C").
CreateTT ("A,B,C,D").
CreateTT ("C,A").

FOR EACH ttData WHERE c_code MATCHES "*A*C*":
    DISP ttData.
END.

Posted by Joey Samoy on 13-Jun-2019 14:28

Thank you Jonathan. This is exactly how concise the code I wanted it to be.

Thank you all for chipping in.

Posted by ChUIMonster on 13-Jun-2019 14:58

Good luck.

As Jonathon says above "this is slow".  Using MATCHES in your WHERE clause is going to result in a table scan.  If the target table is large performance is going to be somewhere between "poor", "abysmal" and "broken".

Keep in mind that the table may not start life as a large table.  But if it becomes large somewhere down the road you are going to have a problem on your hands.

Posted by George Potemkin on 13-Jun-2019 17:03

Joey, another possible solution is to use a dynamic query adding to a WHERE clause 'OR  c_code EQ "item" ' for each item in the list.

This thread is closed