I'm trying to figure out the fastest way of checking whether an input string contains at least one of a set of characters, in order to decide whether I need to QUOTER() the input string or not.
I tried a vanilla loop, a (slightly) optimised loop and a CASE TRUE statement. I get some (to me) surprising results.
Note that the CASE is faster if there's work to do, but it's much slower if there's not, while the loops are more-or-less consistent.
Any thoughts on why this is? And whether I can make this faster still? My test code is attached.
[View:/cfs-file/__key/communityserver-discussions-components-files/19/test_5F00_index_5F00_speed.p:320:240]
input | iters | case | loop 1 | loop 2 |
value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted | 100000 | 545 | 275 | 277 |
isofafafanl | 100000 | 567 | 217 | 213 |
-asf-wrw | 100000 | 107 | 215 | 216 |
value quoted and aother is that = is a value to be quoted | 100000 | 126 | 238 | 230 |
utf-8 | 100000 | 490 | 210 | 213 |
"valuequoted" | 100000 | 144 | 205 | 207 |
spaced"value | 100000 | 108 | 214 | 219 |
spaced value | 100000 | 115 | 221 | 220 |
Is this 11.7 specific? I cannot compile this in 11.6 or am I missing something?
Oh-kay. After removing some bugs in my code, we get something that seems more aligned with my expectations (which is that CASE is faster than any loop). New test code below.
input | iters | case 1 | case 2 | loop 1 | loop 2 | mix |
value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted | 100000 | 412 | 397 | 4078 | 2333 | 6723 |
isofafafanl | 100000 | 548 | 604 | 1421 | 2100 | 2276 |
-asf-wrw | 100000 | 90 | 80 | 229 | 244 | 301 |
value quoted and aother is that = is a value to be quoted | 100000 | 96 | 87 | 770 | 222 | 1299 |
utf-8 | 100000 | 479 | 476 | 660 | 1902 | 1046 |
"valuequoted" | 100000 | 118 | 113 | 238 | 334 | 244 |
spaced"value | 100000 | 89 | 82 | 234 | 219 | 222 |
spaced value | 100000 | 87 | 82 | 851 | 226 | 1444 |
AVERAGE | | 239.875 | 240.125 | 1060.125 | 947.5 | 1694.375 |
MAX | | 548 | 604 | 4078 | 2333 | 6723 |
MIN | | 87 | 80 | 229 | 219 | 222 |
new test code [View:/cfs-file/__key/communityserver-discussions-components-files/19/1447.test_5F00_index_5F00_speed.p:320:240]
I would guess that something like this could be faster
lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))).
Case 1 can be improved by copying ParamValue[paramloop] into a non-extent character variable and using that in the WHENs, instead of having to index the extent for every WHEN.
This does lose out slightly on the last two inputs: they both match on the first part of the WHEN, so they only pay the overhead of the intermediate assign without saving any lookups.
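A minimal sketch of that suggestion, not the code from the attachment. The two-element ParamValue array is a hypothetical stand-in for the test data, cValue is an illustrative name, and the character list is taken from the INDEX and REPLACE examples elsewhere in this thread.

```abl
/* Hypothetical stand-in for the test data; the real array is in the attachment. */
DEFINE VARIABLE ParamValue AS CHARACTER NO-UNDO EXTENT 2
    INITIAL ['utf-8', 'spaced value'].
DEFINE VARIABLE paramloop  AS INTEGER   NO-UNDO.
DEFINE VARIABLE cValue     AS CHARACTER NO-UNDO.
DEFINE VARIABLE lQuote     AS LOGICAL   NO-UNDO.

DO paramloop = 1 TO EXTENT(ParamValue):
    /* Index the array once, then test the scalar copy in every WHEN. */
    cValue = ParamValue[paramloop].

    CASE TRUE:
        WHEN INDEX(cValue, ' ')  > 0 OR
        WHEN INDEX(cValue, '~"') > 0 OR
        WHEN INDEX(cValue, '(')  > 0 OR
        WHEN INDEX(cValue, ')')  > 0 OR
        WHEN INDEX(cValue, '<')  > 0 OR
        WHEN INDEX(cValue, '>')  > 0 OR
        WHEN INDEX(cValue, '@')  > 0 OR
        WHEN INDEX(cValue, ',')  > 0 OR
        WHEN INDEX(cValue, ';')  > 0 OR
        WHEN INDEX(cValue, '=')  > 0 OR
        WHEN INDEX(cValue, ':')  > 0 OR
        WHEN INDEX(cValue, '~\') > 0 OR
        WHEN INDEX(cValue, '/')  > 0 OR
        WHEN INDEX(cValue, '[')  > 0 OR
        WHEN INDEX(cValue, ']')  > 0 OR
        WHEN INDEX(cValue, '?')  > 0 THEN
            lQuote = TRUE.
        OTHERWISE
            lQuote = FALSE.
    END CASE.

    MESSAGE cValue lQuote VIEW-AS ALERT-BOX.
END.
```

Whether the extra assign pays off depends on how many WHENs are evaluated before a match, which is why inputs that match on the first WHEN come out slightly worse.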
[quote user="Lars Neumeier"]
I would guess that something like this could be faster
lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))).
[/quote]
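That will fail with an input such as:
hello"there
since TRIM works from the outside in: it only strips characters from the two ends of the string, so it never reaches the quote in the middle.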
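To make the failure concrete, here is a small sketch using that input. The variable names are illustrative only, and the trim list is written with tilde escapes so it reads the same on every platform.

```abl
DEFINE VARIABLE cInput AS CHARACTER NO-UNDO INITIAL 'hello~"there'.
DEFINE VARIABLE cTrim  AS CHARACTER NO-UNDO.
DEFINE VARIABLE lQuote AS LOGICAL   NO-UNDO.

/* Characters that should trigger quoting (tilde escapes the quote and the backslash). */
cTrim = ' ~"()<>@,;:~\/[]?='.

/* TRIM stops at "h" on the left and "e" on the right, so nothing is removed:
   both lengths are 11 and the comparison is FALSE, although the value
   contains a double quote. */
lQuote = LENGTH(cInput) > LENGTH(TRIM(cInput, cTrim)).

MESSAGE cInput lQuote VIEW-AS ALERT-BOX.
```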
Peter,
One way to do it, without a loop. I know it's not the fastest way; but more fun :-)
startTime[ 6 ] = mtime.
define variable cReplace as character.
do outerLoop = 1 to outerMax.
    /* every special character becomes two characters, so any hit changes the length */
    assign cReplace = ParamValue[paramloop]
           cReplace = replace(cReplace, ' ', '12')
           cReplace = replace(cReplace, '~"', '12')
           cReplace = replace(cReplace, '(', '12')
           cReplace = replace(cReplace, ')', '12')
           cReplace = replace(cReplace, '<', '12')
           cReplace = replace(cReplace, '>', '12')
           cReplace = replace(cReplace, '@', '12')
           cReplace = replace(cReplace, ',', '12')
           cReplace = replace(cReplace, ';', '12')
           cReplace = replace(cReplace, '=', '12')
           cReplace = replace(cReplace, ':', '12')
           cReplace = replace(cReplace, '~\', '12')
           cReplace = replace(cReplace, '/', '12')
           cReplace = replace(cReplace, '[', '12')
           cReplace = replace(cReplace, ']', '12')
           cReplace = replace(cReplace, '?', '12')
           lQuote = length(cReplace) <> length(ParamValue[paramloop]).
end.
endTime[ 6 ] = mtime.
And on one occasion it was faster than looping (the last column is my replace option :-)
"input","iters","case 1","case 2","loop 1","loop 2","mix","replace"
"value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted",100000,435,427,4012,2091,3274,602
"isofafafanl",100000,510,511,1401,1967,2379,458
" -asf-wrw",100000,95,96,274,263,257,470
"value quoted and aother is that = is a value to be quoted",100000,96,96,874,266,1259,681
"utf-8",100000,502,511,732,1956,1225,441
"""valuequoted""",100000,126,124,272,383,261,490
" spaced""value",100000,98,97,274,262,261,496
"spaced value ",100000,98,98,973,266,1429,488
And I hate to break it to you, but have you measured how long the QUOTER function itself takes? It's faster than any of the checks on whether quoting is required...
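For anyone who wants to check that claim on their own machine, a minimal timing sketch in the same style as the test code could look like this. The iteration count mirrors the tables above; the variable names and the sample input are illustrative, and no timings are implied here.

```abl
DEFINE VARIABLE iStart  AS INTEGER   NO-UNDO.
DEFINE VARIABLE iLoop   AS INTEGER   NO-UNDO.
DEFINE VARIABLE cInput  AS CHARACTER NO-UNDO INITIAL 'spaced value'.
DEFINE VARIABLE cQuoted AS CHARACTER NO-UNDO.

/* MTIME is milliseconds since midnight, so avoid running this across midnight. */
iStart = MTIME.

DO iLoop = 1 TO 100000:
    /* Always quote, without checking first whether it is needed. */
    cQuoted = QUOTER(cInput).
END.

MESSAGE "QUOTER x 100000:" MTIME - iStart "ms" VIEW-AS ALERT-BOX.
```

If the number comes out below the "case" columns for the same input, quoting unconditionally is cheaper than checking first, provided always-quoted output is acceptable to whatever consumes it.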
I do have one trick up my sleeve that will be faster, but that requires another custom convmap :-)
Are you on Windows?
If yes, you could use the .NET regex functions, which are very fast.
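A rough sketch of what that could look like, assuming a Windows client with the ABL .NET bridge available. The pattern is my own translation of the character list used elsewhere in this thread, and it has not been timed against the other options.

```abl
/* Windows only: relies on the ABL bridge to .NET. */
DEFINE VARIABLE cInput   AS CHARACTER NO-UNDO INITIAL 'spaced value'.
DEFINE VARIABLE cPattern AS CHARACTER NO-UNDO.
DEFINE VARIABLE lQuote   AS LOGICAL   NO-UNDO.

/* One character class holding every character that forces quoting.
   On Windows the backslash is not an ABL escape character, so \\ , \[ and \]
   reach the regex engine unchanged. */
cPattern = '[ "()<>@,;:\\/\[\]?=]'.

lQuote = System.Text.RegularExpressions.Regex:IsMatch(cInput, cPattern).

MESSAGE cInput lQuote VIEW-AS ALERT-BOX.
```

Each call crosses from the AVM into the CLR, so it is worth measuring in the same loop as the other variants before assuming it wins.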
Agreed with Stefan on this one, clearly a waste of time... both the compiler's and yours, Peter ;)
I would go with the following one:
lQuote = (INDEX(ParamValue[paramloop], ' ') > 0
       OR INDEX(ParamValue[paramloop], '~"') > 0
       OR INDEX(ParamValue[paramloop], '(') > 0
       OR INDEX(ParamValue[paramloop], ')') > 0
       OR INDEX(ParamValue[paramloop], '<') > 0
       OR INDEX(ParamValue[paramloop], '>') > 0
       OR INDEX(ParamValue[paramloop], '@') > 0
       OR INDEX(ParamValue[paramloop], ',') > 0
       OR INDEX(ParamValue[paramloop], ';') > 0
       OR INDEX(ParamValue[paramloop], '=') > 0
       OR INDEX(ParamValue[paramloop], ':') > 0
       OR INDEX(ParamValue[paramloop], '~\') > 0
       OR INDEX(ParamValue[paramloop], '/') > 0
       OR INDEX(ParamValue[paramloop], '[') > 0
       OR INDEX(ParamValue[paramloop], ']') > 0
       OR INDEX(ParamValue[paramloop], '?') > 0).
input | iters | case 1 | case 2 | loop 1 | loop 2 | mix | index |
value-quoted-and-aother-is-that-=-is-a-value-to-be-quoted | 100000 | 550 | 564 | 6730 | 3395 | 5039 | 501 |
isofafafanl | 100000 | 704 | 723 | 2411 | 3339 | 3700 | 705 |
-asf-wrw | 100000 | 176 | 187 | 470 | 454 | 439 | 173 |
value quoted and aother is that = is a value to be quoted | 100000 | 174 | 171 | 1473 | 470 | 2005 | 189 |
utf-8 | 100000 | 719 | 736 | 1243 | 3334 | 1909 | 664 |
"valuequoted" | 100000 | 204 | 235 | 468 | 639 | 437 | 218 |
spaced"value | 100000 | 187 | 189 | 470 | 439 | 440 | 171 |
spaced value | 100000 | 157 | 203 | 1628 | 439 | 2216 | 172 |
@Lars Neumeier:
> I would go with the following one
> ... OR INDEX(ParamValue[paramloop], '(') > 0 ...
In that case you can still shave off some more time by reducing the repetitive array indexing and comparisons, like this:
t = ParamValue[paramloop].
lQuote = INDEX(t, ' ') + INDEX(t, '~"') + ... > 0
Lars Neumeier:
> I would guess that something like this could be faster
> lHasToBeQuoted = (LENGTH(inputString) > LENGTH(TRIM(inputString, ' ~"()<>@,;:\/[]?='))).
Stefan Drissen:
> That will fail ... Since trim works from outside in
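Reverse it. TRIM away all the characters that do NOT need quoting, and see whether the result is a non-empty string. If it is, trimming stopped at a character that needs quoting; if the string contains no such character, everything is trimmed away and the result is empty.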
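A minimal sketch of the reversed check. The list of safe characters in cSafe is an assumption for illustration and would have to cover every character that can legitimately appear unquoted; anything not in the list then counts as needing quotes, which is stricter than the original character set. Both letter cases are listed so the result does not depend on how TRIM treats case.

```abl
DEFINE VARIABLE cInput AS CHARACTER NO-UNDO INITIAL 'hello~"there'.
DEFINE VARIABLE cSafe  AS CHARACTER NO-UNDO.
DEFINE VARIABLE lQuote AS LOGICAL   NO-UNDO.

/* Assumption: only these characters are allowed without quoting. */
cSafe = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.'.

/* TRIM eats safe characters from both ends and stops at the first unsafe one.
   LENGTH is used instead of comparing with "" because ABL string comparison
   ignores trailing blanks, so a leftover space would otherwise look empty. */
lQuote = LENGTH(TRIM(cInput, cSafe)) > 0.

MESSAGE cInput lQuote VIEW-AS ALERT-BOX.
```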