Indexing of Multiple (lots) fields in tables

Posted by Admin on 01-Mar-2011 12:21

I have a client who has run a package for about 15 years. There is now a request to index (or at least have a quick search process) for an almost unlimited number of data elements in a table (eg reference#1, #2, .....#n), some currently inside extents.

Can you provide samples of required new searches and existing indexes?

Requests like this are often successfully solved with word indexes, sometimes with custom word-break tables to support a tokenized search. In a write trigger you could build a super-index, like

"SalesRep_BBB Country_USA State_NH Name_Lift$Line$Skiing" etc.

That will allow searching for any combination of the tokens using the Contains phrase:

CONTAINS "SalesRep_BBB State_N"

Posted by kingsway on 01-Mar-2011 12:51

Sure:

Some indexes below:

I would need to make accessible (typically text) for a selection of disparate data entered, so a word index based around a trigger could do this fairly easily.

As there are multiple companies (entity below in indexes), this could fall apart unless I write something intelligent relating to each company scan as that could be duplicated in the word search. Maybe a bit dangerous.

Thanks,

Jim

ADD INDEX "ndx_ord-pro" ON "order"

AREA "Index Area"

UNIQUE

PRIMARY

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "pro#" ASCENDING

INDEX-FIELD "pro-sub" ASCENDING

ADD INDEX "ndx_ord-barc" ON "order"

AREA "Index Area"

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "bar-code" ASCENDING

ADD INDEX "ndx_ord-bill" ON "order"

AREA "Index Area"

UNIQUE

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "cust-num" ASCENDING

INDEX-FIELD "o_servdt" DESCENDING

INDEX-FIELD "pro#" DESCENDING

INDEX-FIELD "pro-sub" DESCENDING

ADD INDEX "ndx_ord-con#" ON "order"

AREA "Index Area"

UNIQUE

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "o_cnum" ASCENDING

INDEX-FIELD "o_servdt" DESCENDING

INDEX-FIELD "pro#" DESCENDING

INDEX-FIELD "pro-sub" DESCENDING

ADD INDEX "ndx_ord-cons" ON "order"

AREA "Index Area"

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "o_cname" ASCENDING I

NDEX-FIELD "o_servdt" ASCENDING

INDEX-FIELD "pro#" ASCENDING

INDEX-FIELD "pro-sub" ASCENDING

ADD INDEX "ndx_ord-cref" ON "order"

AREA "Index Area"

INDEX-FIELD "entity" ASCENDING

INDEX-FIELD "o_cref#" ASCENDING

INDEX-FIELD "o_servdt" DESCENDING

INDEX-FIELD "pro#" ASCENDING

Posted by Admin on 01-Mar-2011 12:59

As there are multiple companies (entity below in indexes)

Assuming Entity is a 4 digit integer, you could add that to the indexed string like this:

Entity_0001

Entity_0002

Posted by kingsway on 01-Mar-2011 13:12

so you would need a double "contains"

(Contains Entity__x

AND contains Reference IDy) in an element that you load with all the references you need.

How big a hit is this on resources?

Thanks

Posted by Admin on 01-Mar-2011 13:44

so you would need a double "contains"

That would just be a single contains:

Contains "Entity__x Reference_IDy"

The word index is able to resolve this very effectively. AND is the default, OR is optional. The word index can dynamically use the required number of brackets.

Posted by Admin on 01-Mar-2011 13:46

ID: P12969
Title: "4GL Query concepts (FOR EACH, FIND, GET, INDEX)"

...

QUERY BY WORDS

Queries by words are supported through the 4GL CONTAINS clause. They use word-indexes, which have the same structure as regular indexes, but contain an entry for each word in a character field rather than one entry for the whole field, as regular indexes do.

Evaluating CONTAINS clauses differs from evaluating other index expressions in two ways:

- a CONTAINS clause is viewed as a single index bracket by the
client, but may actually use multiple brackets, depending on the
expression. For example, the query

for each claim
where (description contains "lawyer | attorney"):
end.

requires two brackets. The decision of how many brackets and how to use them is done by the server, at run-time. This allows the 4GL program to not only change the words in the clause, but also the operators between the words.

- the CONTAINS clause cannot be evaluated during selection - a
word-index must always be used. As a result, the client cannot
deal with it, and the server must evaluate it using the word-index.

...

Posted by kingsway on 01-Mar-2011 14:06

Could work. Not sure about scanning an entire set of records for one entry.

How efficient is the word indexing? How much overhead would this take up in a fairly large scan, say 2 million records looking for 2 elelments in a fairly large word key, maybe 15 - 30 individual elements?

Posted by Admin on 01-Mar-2011 14:37

I have no data at hand.

But I've used that technique in the past with very large tables and many different search tokens. It did perform very good. Only specialized indexes should perform better, because the server wouldn't need to merge the result set of multiple brackets.

But the disadvantage of specialized indexes is, that you'll need a lot of them - for any possible combination - as you mentioned in your initail post.

Posted by kingsway on 01-Mar-2011 18:55

OK.

Sounds like its a worthwhile option.

I'll give it a try.

As the system is already slow during peak activity, this sounds like a great way to leverage a new server.

Thanks for your help.

Jim

This thread is closed

All Replies