Fast(est) way of doing a CONTAINS in ABL

Posted by Peter Judge on 31-May-2017 09:30

I'm trying to figure out the fastest way of checking whether an input string has at least one of another set of characters in it, in order to figure out whether I need to QUOTER() the input string or not.

I tried a vanilla loop, a (slightly) optimised loop and a CASE TRUE statement. I get some (to me) surprising results.

Note that the CASE is faster if there's work to do, but it's much slower if there's not, while the loops are more-or-less consistent.

Any thoughts on why this is? And whether I can make this faster still? My test code is attached.

[Attachment: test_index_speed.p]

"input","iters","case","loop 1","loop 2"
"value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted",100000,545,275,277
"isofafafanl",100000,567,217,213
" -asf-wrw",100000,107,215,216
"value quoted and aother is that = is a value to be quoted",100000,126,238,230
"utf-8",100000,490,210,213
"""valuequoted""",100000,144,205,207
" spaced""value",100000,108,214,219
"spaced value",100000,115,221,220

All Replies

Posted by Patrick Tingen on 31-May-2017 14:17

Is this 11.7 specific? I cannot compile this in 11.6 or am I missing something?

Posted by Peter Judge on 31-May-2017 14:34

Replace StringConstant:SPACE with a space and the DOUBLE_QUOTE with ~" .

Also, I’ve found bugs in that test code, related to the fact that I don’t reset lQuote to FALSE between tests.

Posted by Peter Judge on 31-May-2017 14:58

Oh-kay. After removing some bugs in my code, we get something that seems more aligned with my expectations (which is that CASE is faster than any loop). New test code below.

"input","iters","case 1","case 2","loop 1","loop 2","mix"
"value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted",100000,412,397,4078,2333,6723
"isofafafanl",100000,548,604,1421,2100,2276
" -asf-wrw",100000,90,80,229,244,301
"value quoted and aother is that = is a value to be quoted",100000,96,87,770,222,1299
"utf-8",100000,479,476,660,1902,1046
"""valuequoted""",100000,118,113,238,334,244
" spaced""value",100000,89,82,234,219,222
"spaced value ",100000,87,82,851,226,1444
"AVERAGE",,239.875,240.125,1060.125,947.5,1694.375
"MAX",,548,604,4078,2333,6723
"MIN",,87,80,229,219,222

New test code: [Attachment: 1447.test_index_speed.p]

Posted by Lars Neumeier on 31-May-2017 16:09

I would guess that something like this could be faster

lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))). 


Posted by Stefan Drissen on 31-May-2017 16:12

Case 1 can be improved by putting ParamValue[paramloop] in a non-extent character and using that in the whens instead of having to get the extent for every when.

This does lose out slightly on the last two inputs since they both match on the first part of the when so only get the overhead of the intermediate assign.
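
A minimal sketch of that idea, assuming the CASE in the attached test looks roughly like a CASE TRUE with one INDEX check per WHEN (the attachment is not reproduced here). The names cValue and the two sample inputs are hypothetical, and only three of the characters are shown:

```abl
/* Sketch only: cValue is a hypothetical scalar copy of the array element. */
define variable ParamValue as character no-undo extent 2
    initial ['spaced value', 'isofafafanl'].
define variable paramloop  as integer   no-undo.
define variable cValue     as character no-undo.
define variable lQuote     as logical   no-undo.

do paramloop = 1 to extent(ParamValue):
    /* One extent lookup per input instead of one per WHEN;
       lQuote is reset so results do not leak between inputs. */
    assign cValue = ParamValue[paramloop]
           lQuote = false.

    case true:
        when index(cValue, ' ')  > 0 then lQuote = true.
        when index(cValue, '~"') > 0 then lQuote = true.
        when index(cValue, '=')  > 0 then lQuote = true.
        /* the remaining characters would follow the same pattern */
    end case.

    message cValue lQuote.
end.
```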

Posted by Stefan Drissen on 31-May-2017 16:13

[quote user="Lars Neumeier"]

I would guess that something like this could be faster

lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))).

[/quote]

That will fail with:

hello"there

Since trim works from outside in.
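
A small sketch that shows the failure, assuming the same character list as in the quoted line (the backslash is written as ~\ here so the string means the same on Windows and UNIX). The variable names are hypothetical:

```abl
define variable cInput as character no-undo initial 'hello~"there'.
define variable cChars as character no-undo initial ' ~"()<>@,;:~\/[]?='.
define variable lHasToBeQuoted as logical no-undo.

/* TRIM only strips matching characters from the two ends of the string.
   The quote sits in the middle, so nothing is removed, both lengths are
   equal and the check evaluates to FALSE although quoting is needed. */
lHasToBeQuoted = (length(cInput) > length(trim(cInput, cChars))).

message lHasToBeQuoted.
```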

Posted by onnodehaan on 31-May-2017 16:17

Peter,

One way to do it, without a loop. I know it's not the fastest way; but more fun :-)


startTime[ 6 ] = mtime.
define variable cReplace as character.

do outerLoop = 1 to outerMax:
    /* Each special character is replaced by two characters ('12'),
       so any hit makes cReplace longer than the original value. */
    assign cReplace = ParamValue[paramloop]
           cReplace = replace(cReplace, ' ', '12')
           cReplace = replace(cReplace, '~"', '12')
           cReplace = replace(cReplace, '(', '12')
           cReplace = replace(cReplace, ')', '12')
           cReplace = replace(cReplace, '<', '12')
           cReplace = replace(cReplace, '>', '12')
           cReplace = replace(cReplace, '@', '12')
           cReplace = replace(cReplace, ',', '12')
           cReplace = replace(cReplace, ';', '12')
           cReplace = replace(cReplace, '=', '12')
           cReplace = replace(cReplace, ':', '12')
           cReplace = replace(cReplace, '~\', '12')
           cReplace = replace(cReplace, '/', '12')
           cReplace = replace(cReplace, '[', '12')
           cReplace = replace(cReplace, ']', '12')
           cReplace = replace(cReplace, '?', '12')
           lQuote   = length(cReplace) <> length(ParamValue[paramloop]).
end.

endTime[ 6 ] = mtime.


And on one occasion it was faster than looping


"input","iters","case 1","case 2","loop 1","loop 2","mix","replace"
"value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted",100000,435,427,4012,2091,3274,602
"isofafafanl",100000,510,511,1401,1967,2379,458
" -asf-wrw",100000,95,96,274,263,257,470
"value quoted and aother is that = is a value to be quoted",100000,96,96,874,266,1259,681
"utf-8",100000,502,511,732,1956,1225,441
"""valuequoted""",100000,126,124,272,383,261,490
" spaced""value",100000,98,97,274,262,261,496
"spaced value ",100000,98,98,973,266,1429,488

The last column is my replace option :-)

Posted by Stefan Drissen on 31-May-2017 16:18

And I hate to break it to you, but have you measured how long the QUOTER function takes? It's faster than any of the checks on quoter requirement...

I do have one trick up my sleeve that will be faster, but that requires another custom convmap :-)

Posted by Peter Judge on 31-May-2017 18:56

The array is just for the test, and in the real code is a TT field.
 
The QUOTER is (in my real code) called just once per ‘param value’. You’re right – I should manually add " to the ends of the string. I’m not sure how/whether " is escaped in HTTP headers. Thanks for that.
 
And you know my constraints on convmaps :)
 
My thinking is that if I can get the AVM’s C code to do the most work (as opposed to the ABL itself) then that’ll be fastest.
 

Posted by doa on 01-Jun-2017 02:31

Are you on Windows?

If so, you could use the .NET regex functions, which are very fast.
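
A rough sketch of what that could look like, assuming a Windows ABL client with .NET support. The character class mirrors the list used elsewhere in this thread, the variable names are hypothetical, and nothing here has been timed against the other variants:

```abl
/* Windows only: relies on the AVM's .NET bridge. */
define variable cInput as character no-undo initial 'hello~"there'.
define variable lQuote as logical   no-undo.

/* On Windows a backslash is a literal character in an ABL string, so the
   regex engine receives \\ (a literal backslash) and \[ \] (literal
   square brackets) inside the character class. */
lQuote = System.Text.RegularExpressions.Regex:IsMatch(
             cInput,
             '[ "()<>@,;:\\/\[\]?=]').

message lQuote.
```

Each call crosses from the AVM into the CLR, so it would be worth measuring this inside the same 100000-iteration loop before relying on it.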

Posted by marian.edu on 01-Jun-2017 02:46

Agreed with Stefan on this one, clearly a waste of time... both the compiler's and yours, Peter ;)

Posted by Lars Neumeier on 01-Jun-2017 03:20

I would go with the following one:

lQuote = (INDEX(ParamValue[paramloop], ' ') > 0
         OR INDEX(ParamValue[paramloop], '~"') > 0
         OR INDEX(ParamValue[paramloop], '(')  > 0
         OR INDEX(ParamValue[paramloop], ')')  > 0
         OR INDEX(ParamValue[paramloop], '<')  > 0
         OR INDEX(ParamValue[paramloop], '>')  > 0
         OR INDEX(ParamValue[paramloop], '@')  > 0
         OR INDEX(ParamValue[paramloop], ',')  > 0
         OR INDEX(ParamValue[paramloop], ';')  > 0
         OR INDEX(ParamValue[paramloop], '=')  > 0
         OR INDEX(ParamValue[paramloop], ':')  > 0
         OR INDEX(ParamValue[paramloop], '~\') > 0
         OR INDEX(ParamValue[paramloop], '/')  > 0
         OR INDEX(ParamValue[paramloop], '[')  > 0
         OR INDEX(ParamValue[paramloop], ']')  > 0
         OR INDEX(ParamValue[paramloop], '?')  > 0).

"input","iters","case 1","case 2","loop 1","loop 2","mix","index"
"value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted",100000,550,564,6730,3395,5039,501
"isofafafanl",100000,704,723,2411,3339,3700,705
" -asf-wrw",100000,176,187,470,454,439,173
"value quoted and aother is that = is a value to be quoted",100000,174,171,1473,470,2005,189
"utf-8",100000,719,736,1243,3334,1909,664
"""valuequoted""",100000,204,235,468,639,437,218
" spaced""value",100000,187,189,470,439,440,171
"spaced value ",100000,157,203,1628,439,2216,172

Posted by ske on 01-Jun-2017 05:21

@Lars Neumeier:

> I would go with the following one

> ... OR INDEX(ParamValue[paramloop], '(')  > 0 ...

In that case you can still shave off some more time by reducing repetitive array indexing and comparisons this way:

t = ParamValue[paramloop].

lQuote = INDEX(t, ' ') + INDEX(t, '~"') + ... > 0
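
Written out in full with the sixteen characters from Lars's INDEX version, this could look like the sketch below. The sample value assigned to t is only there to make the snippet self-contained. INDEX returns 0 when the character is not found and a positive position otherwise, so the sum is greater than 0 exactly when at least one character is present:

```abl
define variable t      as character no-undo initial 'hello~"there'.
define variable lQuote as logical   no-undo.

/* In the test, t would be assigned from ParamValue[paramloop] once,
   so the extent lookup happens a single time per input. */
lQuote = index(t, ' ')  + index(t, '~"') + index(t, '(') + index(t, ')')
       + index(t, '<')  + index(t, '>')  + index(t, '@') + index(t, ',')
       + index(t, ';')  + index(t, '=')  + index(t, ':') + index(t, '~\')
       + index(t, '/')  + index(t, '[')  + index(t, ']') + index(t, '?')
       > 0.

message lQuote.
```

Note that this form always evaluates all sixteen INDEX calls, so whether it beats the OR version depends on the input and is something to measure rather than assume.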

Posted by ske on 01-Jun-2017 06:09

Lars Neumeier:

> I would guess that something like this could be faster

> lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))

Stefan Drissen:

> That will fail ... Since trim works from outside in

Reverse it. TRIM away all the characters that do NOT need quoting, and see if the result is a non-empty string. Then there are characters that need quoting.
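
A minimal sketch of the reversed check. The list of "safe" characters below is an assumption for illustration (ASCII letters, digits and a few punctuation marks); any character not in that list, such as an accented letter, would also trigger quoting:

```abl
define variable cSafe  as character no-undo.
define variable cInput as character no-undo.
define variable lQuote as logical   no-undo.

/* Hypothetical list of characters that never need quoting. Both cases
   are listed so the result does not depend on case-sensitivity rules. */
cSafe  = 'abcdefghijklmnopqrstuvwxyz'
       + 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
       + '0123456789-_.'.
cInput = 'hello~"there'.

/* TRIM eats safe characters from both ends and stops at the first
   character that is not in cSafe. If anything is left, the input holds
   at least one character that needs quoting. LENGTH is used instead of
   comparing with "" because ABL string comparison ignores trailing
   spaces, so a leftover single space would compare equal to "". */
lQuote = length(trim(cInput, cSafe)) > 0.

message lQuote.
```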
