Indexing of Multiple (lots) fields in tables

Posted by kingsway on 01-Mar-2011 11:50

OK:

I'm a little stumped by this one, so thought I would ask the Progress community for some input.

I have a client who has run a package for about 15 years. There is now a request to index (or at least have a quick search process) for an almost unlimited number of data elements in a table (eg reference#1, #2, .....#n), some currently inside extents.

Of course, I can create 10, 15 100 separate indexes against this table for all of these tracing data, but that is bordering on pointless, involves a lot of rewriting, and is an immense resource grab. Not a big deal for 1000 records (say a customer table), but in an order table with maybe several million and Gigs of raw data?

My initial reaction is just to create an indexed cross referencing table with all of the fields and the reference relating back to the order (in this case), but that requires a lot of manual coding as I would need to maintain all the references on every update.

I cannot think of any easy way around this, so anyone do this? I'm sure it's not an isolated request, so maybe the concepts are already in production somewhere.

Hope this makes sense, and thanks for your time folks.

Jim

All Replies

Posted by Admin on 01-Mar-2011 12:21

I have a client who has run a package for about 15 years. There is now a request to index (or at least have a quick search process) for an almost unlimited number of data elements in a table (eg reference#1, #2, .....#n), some currently inside extents.

Can you provide samples of required new searches and existing indexes?

Requests like this are often successfully solved with word indexes, sometimes with custom word-break tables to support a tokenized search. In a write trigger you could build a super-index, like

"SalesRep_BBB Country_USA State_NH Name_Lift$Line$Skiing" etc.

That will allow searching for any combination of the tokens using the Contains phrase:

CONTAINS "SalesRep_BBB State_N"

Posted by kingsway on 01-Mar-2011 12:51

Sure:

Some indexes below:

I would need to make accessible (typically text) for a selection of disparate data entered, so a word index based around a trigger could do this fairly easily.

As there are multiple companies (entity below in indexes), this could fall apart unless I write something intelligent relating to each company scan as that could be duplicated in the word search. Maybe a bit dangerous.

Thanks,

Jim

ADD INDEX "ndx_ord-pro" ON "order"             

AREA "Index Area"                            

UNIQUE                                       

PRIMARY                                      

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "pro#" ASCENDING                 

INDEX-FIELD "pro-sub" ASCENDING 

ADD INDEX "ndx_ord-barc" ON "order"            

AREA "Index Area"                            

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "bar-code" ASCENDING   

ADD INDEX "ndx_ord-bill" ON "order"            

AREA "Index Area"                            

UNIQUE                                       

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "cust-num" ASCENDING             

INDEX-FIELD "o_servdt" DESCENDING            

INDEX-FIELD "pro#" DESCENDING                

INDEX-FIELD "pro-sub" DESCENDING    

ADD INDEX "ndx_ord-con#" ON "order"            

AREA "Index Area"                            

UNIQUE                                       

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "o_cnum" ASCENDING               

INDEX-FIELD "o_servdt" DESCENDING            

INDEX-FIELD "pro#" DESCENDING                

INDEX-FIELD "pro-sub" DESCENDING     

ADD INDEX "ndx_ord-cons" ON "order"            

AREA "Index Area"                            

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "o_cname" ASCENDING                I

NDEX-FIELD "o_servdt" ASCENDING             

INDEX-FIELD "pro#" ASCENDING                 

INDEX-FIELD "pro-sub" ASCENDING       

ADD INDEX "ndx_ord-cref" ON "order"            

AREA "Index Area"                            

INDEX-FIELD "entity" ASCENDING               

INDEX-FIELD "o_cref#" ASCENDING              

INDEX-FIELD "o_servdt" DESCENDING            

INDEX-FIELD "pro#" ASCENDING

Posted by Admin on 01-Mar-2011 12:59

As there are multiple companies (entity below in indexes)

Assuming Entity is a 4 digit integer, you could add that to the indexed string like this:

Entity_0001

Entity_0002

Posted by kingsway on 01-Mar-2011 13:12

so you would need a double "contains"

(Contains Entity__x

     AND contains Reference IDy) in an element that you load with all the references you need.

How big a hit is this on resources?

Thanks

Posted by Admin on 01-Mar-2011 13:44

so you would need a double "contains"

That would just be a single contains:

Contains "Entity__x Reference_IDy"

The word index is able to resolve this very effectively. AND is the default, OR is optional. The word index can dynamically use the required number of brackets.

Posted by Admin on 01-Mar-2011 13:46

ID: P12969
Title: "4GL Query concepts (FOR EACH, FIND, GET, INDEX)"

...


QUERY BY WORDS

Queries by words are supported through the 4GL CONTAINS clause. They use word-indexes, which have the same structure as regular indexes, but contain an entry for each word in a character field rather than one entry for the whole field, as regular indexes do.

Evaluating CONTAINS clauses differs from evaluating other index expressions in two ways:

- a CONTAINS clause is viewed as a single index bracket by the
client, but may actually use multiple brackets, depending on the
expression. For example, the query

for each claim
where (description contains "lawyer | attorney"):
end.

requires two brackets. The decision of how many brackets and how to use them is done by the server, at run-time. This allows the 4GL program to not only change the words in the clause, but also the operators between the words.

- the CONTAINS clause cannot be evaluated during selection - a
word-index must always be used. As a result, the client cannot
deal with it, and the server must evaluate it using the word-index.

...

Posted by kingsway on 01-Mar-2011 14:06

Could work. Not sure about scanning an entire set of records for one entry.

How efficient is the word indexing? How much overhead would this take up in a fairly large scan, say 2 million records looking for 2 elelments in a fairly large word key, maybe 15 - 30 individual elements?

Posted by Admin on 01-Mar-2011 14:37

I have no data at hand.

But I've used that technique in the past with very large tables and many different search tokens. It did perform very good. Only specialized indexes should perform better, because the server wouldn't need to merge the result set of multiple brackets.

But the disadvantage of specialized indexes is, that you'll need a lot of them - for any possible combination - as you mentioned in your initail post.

Posted by kingsway on 01-Mar-2011 18:55

OK.

Sounds like its a worthwhile option.

I'll give it a try.

As the system is already slow during peak activity, this sounds like a great way to leverage a new server.

Thanks for your help.

Jim

This thread is closed