How to search *word* in a word index

Posted by cverbiest on 03-Feb-2016 09:20

During one of the Exchange sessions I asked for an enhancement for the contains condition to allow for * at the start of the pattern.

Gus mentioned that no enhancement was necessary because there is a trick to achieve that with the current word-indexes.

Unfortunately I forgot to ask what that trick is.

Example, I want to find William but I'm not sure it is Bill or William in the database (https://www.quora.com/Why-is-Bill-a-nickname-for-William) so I'd like to search for

 where name contains "*ill*"

This gives a "QBW syntax error - an asterisk (*) is allowed only at the end of a word. (4686)".

All Replies

Posted by George Potemkin on 03-Feb-2016 09:53

I guess a trick is to create a shadow field with the reverse order of the letters. But it will not let you to search for the pattern in the middle of the words.

Posted by richt on 03-Feb-2016 13:06

You can't use a prefix wild card to search a word index.  If you're Bill and William example is really representative of what you need then use an or operator.  "Bill | William" and if you are not sure if its Will or William then use:

"Bill | Will*

Posted by Simon L. Prinsloo on 04-Feb-2016 06:34

You can use MATCHES, but it will always do a table scan.

If I hand you a telephone book and ask you for the numbers of people that have surnames starting with "Will" or "Bill", you will most likely do an instinctive binary search for two ranges, from Bill to just before Bilm and from Will to just before Wilm. You will not only easily locate the page(s) that you need, you can actually take a pen and draw a line above the first and under the last applicable entries. You will easily bracket two sections of the book, (From Bill to just before Bilm and from Will to just before Wilm), although the brackets may be empty if nobody match the criteria.

On the other hand, if I ask you to find my the telephone numbers of every body with "ill" in the surname, how would you approach it? Sure it is indexed alphabetically on the surname, but how to you find the page to look in? You can't bracket your search, you have to read all the surnames. If there are no matches, you will not be able to know it until you've read all of them.

With the b-tree the db engine van locate "Bill* | Will*" even faster than you and me, but it has the same problem with "*ill*" in the index as as we have in the telephone book.

Posted by chrisrichardson on 03-Dec-2018 12:25

Hi - did you ever figure out what the trick is?

I'd like to search for *345* in the string 12345, using the contains operator.  Matches is no good as it performs a table-scan.

Posted by gus bjorklund on 04-Dec-2018 23:34

the contains operator works well only for leading character string searches in word-indexed queries. to make use of it for trailing string searchess, you would have to add a second character field in which all the values are in reverse order character-wise and then word index that.

This thread is closed