What's the fastest way to figure out whether a given letter is upper-case or not in ABL?
I want to turn something like OpenEdge.Net.DataObject.DataObjectHandler into a string like
OE.N.DO.DataObjectHandler
It has to be as fast as possible, so, um, creative code will be considered.
I'd expect something like a range check of
ASC(charvar) between ASC("A") to ASC("Z").
Can you try
System.Text.RegularExpressions.Regex:Replace(mySubstring, "[^A-Z]", "").
DEF VAR xxx AS CHAR CASE-SENSITIVE.
ASSIGN xxx = "Test".
IF xxx = LC(xxx)
THEN DISP "only lower".
ELSE DISP "Upper case detected".
DEFINE VARIABLE cString AS CHARACTER NO-UNDO INITIAL "OpenEdge.Net.DataObject.DataObjectHandler".
DEFINE VARIABLE cChar AS CHARACTER NO-UNDO.
DEFINE VARIABLE cFinished AS CHARACTER NO-UNDO.
DEFINE VARIABLE iChar AS INTEGER NO-UNDO.
DEFINE VARIABLE iAsc AS INTEGER NO-UNDO.
DO iChar = 1 TO LENGTH(cString):
    ASSIGN cChar = SUBSTRING(cString,iChar,1)
           iAsc  = ASC(cChar).
    IF (iAsc GT 64 AND iAsc LT 91) OR
       cChar EQ "." THEN
        cFinished = cFinished + cChar.
END.
MESSAGE cFinished
VIEW-AS ALERT-BOX INFO BUTTONS OK.
Ken -
For DO ... TO structures like this:
DO iChar = 1 TO LENGTH(cString):
the LENGTH() function would be computed on every iteration of the DO loop - you need to put the value in a variable and use that instead.
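A minimal sketch of that suggestion, assuming the same cString test value used earlier in the thread; iLength is a hypothetical variable name introduced here only for illustration:

```abl
DEFINE VARIABLE cString AS CHARACTER NO-UNDO INITIAL "OpenEdge.Net.DataObject.DataObjectHandler".
DEFINE VARIABLE iChar   AS INTEGER   NO-UNDO.
DEFINE VARIABLE iLength AS INTEGER   NO-UNDO. /* hypothetical name, not from the original post */

/* Evaluate LENGTH() once, before the loop starts */
iLength = LENGTH(cString).

DO iChar = 1 TO iLength:
    /* per-character work goes here */
END.
```

The loop body is unchanged; only the TO expression is replaced by a plain variable, so nothing has to be recomputed on each pass.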
I've ended up using ASC(<char>, '1252') <= 90. It's decently fast (less than half a millisecond for that test string).
I don't check a lower bound for the ASC values because I expect mostly class/type names in the string, and they can legally have A-Z, 0-9 and #, $, %, and _ characters.
- this takes about 0.052 milliseconds per iteration
- using a CASE ASC() is slower (0.088 ms per)
- using a case-sensitive variable is almost as fast (~0.058 ms per)
- using COMPARE is a little slower (~0.060 ms per)
- I can't use .NET 'cos this needs to run on Unix too.
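The COMPARE variant is not shown in the post, so the following is only a guess at what such a check could look like, not the code that was timed. It assumes a single character held in a hypothetical variable cChar and uses the "CASE-SENSITIVE" strength of the COMPARE function:

```abl
DEFINE VARIABLE cChar    AS CHARACTER NO-UNDO INITIAL "D". /* hypothetical test value */
DEFINE VARIABLE lIsUpper AS LOGICAL   NO-UNDO.

/* TRUE when CAPS() leaves the character unchanged under a case-sensitive
   comparison. Digits and punctuation also pass this test. */
lIsUpper = COMPARE(cChar, "EQ":U, CAPS(cChar), "CASE-SENSITIVE":U).

MESSAGE cChar lIsUpper VIEW-AS ALERT-BOX.
```

Because characters without a case (digits, ".", "_") also compare equal to their CAPS() value, this behaves like the ASC-based test in keeping them rather than dropping them.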
Test code
define variable numKeep as integer no-undo.
define variable numChars as integer no-undo.
define variable charLoop as integer no-undo.
define variable numEntries as integer no-undo.
define variable entryLoop as integer no-undo.
define variable singleEntry as character no-undo.
define variable singleChar as character no-undo.
define variable charAsc as integer no-undo.
define variable delim as character no-undo.
define variable typeName as character no-undo.
define variable startTime as integer no-undo.
def var tokenValue as char.
def var tokenArg as char.
def var i as int.
def var i2 as dec.

startTime = mtime.
assign typeName = 'OpenEdge.Net.DataObject.DataObjectHandler'
       numEntries = num-entries (typeName, '.':u)
       delim = '':u
       tokenValue = '':u.

do i = 1 to 1000000:
    tokenvalue = ''.
    delim = ''.
    do entryLoop = 1 to numEntries:
        assign singleEntry = entry(entryLoop, typeName, '.':u).
        /* Loop through entire input string */
        assign numChars = length(singleEntry, 'raw':u)
               tokenValue = tokenValue + delim
               .
        do charLoop = 1 to numChars:
            assign /* ASCII value of character using single byte codepage */
                   singleChar = substring(singleEntry, charLoop, 1, 'RAW':u)
                   charAsc = asc(singleChar, '1252':u).
            if charAsc le 90 /* Z */
            or charAsc eq 95 /* _ */ then
                assign tokenValue = tokenValue + singleChar.
        end.
        delim = '.'.
    end.
end.
i2 = mtime - startTime.
message tokenvalue skip i2 skip i2 / (i - 1) view-as alert-box.
[quote user="Tim Kuehn"]
I'd expect something like a range check of
ASC(charvar) between ASC("A") to ASC("Z").
[/quote]
This will not help you in Europe at all!
Well, pardon my anglocentrism. :)
Is code written using an extra-ASCII character set?
This may or may not work with non-English, non-ASCII characters, but I kind of like it anyway:
define variable c as character no-undo format "x".

do while true:
    update c.
    display ( chr( asc( c ) + 32 ) = c ).
end.
You do realize that your test code does not meet your requirements? The last entry should be added completely.
do entryLoop = 1 to numEntries:
    assign singleEntry = entry(entryLoop, typeName, '.':u).
    if entryLoop < numEntries then do:
        /* Loop through entire input string */
        ...
    end.
    else
        tokenvalue = tokenValue + delim + singleEntry.
And obviously the fastest way is to take the work out of the ABL and put it into the AVM and create a custom codepage and then use the codepage-convert function.
You are also cheating by taking the num-entries count outside of the loop - the 'function' requires the count. Here's my version with a quickly cobbled together 'upperonly' codepage. I first tried mapping all non-uppers to 000 - but that just terminated the string, so I put 032 in and replace that afterwards. It is still way faster (on my i7 your code, adjusted to not convert last entry, is taking 0.2305 per iteration, codepage-convert is taking 0.00148 per iteration):
def var typeName as char no-undo.
def var tokenValue as char no-undo.
def var numEntries as int no-undo.
def var startTime as integer no-undo.
def var ii as int no-undo.
def var i2 as int no-undo.

startTime = mtime.
typeName = 'OpenEdge.Net.DataObject.DataObjectHandler'.

do ii = 1 to 1000000:
    assign
        tokenvalue = typeName
        numEntries = num-entries( tokenvalue, "." )
        .
    entry( numEntries, tokenvalue, "." ) = "".
    tokenValue = replace( codepage-convert( tokenvalue, "upperonly" ), " ", "" ) + entry( numEntries, typeName, "." ).
end.

i2 = mtime - startTime.
message tokenvalue skip i2 skip i2 / ii view-as alert-box.
#---------------------------------------------------------------------------
# This contains the data needed to convert from
# iso8859-1 to upper-only encoding.
#
# This is NOT a full transformation.
#
#
CONVERT
NOINVERSE
SOURCE-NAME "ISO8859-1"
TARGET-NAME "UPPERONLY"
TYPE "1"
 /*000-015*/ 032 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015
 /*016-031*/ 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031
 /*032-047*/ 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047
 /*048-063*/ 048 049 050 051 052 053 054 055 056 057 058 059 060 061 062 063
 /*064-079*/ 064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079
 /*080-095*/ 080 081 082 083 084 085 086 087 088 089 090 091 092 093 094 095
 /*096-111*/ 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
 /*112-127*/ 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
 /*128-143*/ 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
 /*144-159*/ 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
 /*160-175*/ 160 161 162 163 164 165 166 064 168 169 170 171 172 173 174 175
 /*176-191*/ 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
 /*192-207*/ 192 193 194 195 091 197 198 199 200 201 202 203 204 205 206 207
 /*208-223*/ 208 209 210 211 212 213 092 215 216 217 218 219 093 221 222 126
 /*224-239*/ 224 225 226 227 123 229 230 231 232 233 234 235 236 237 238 239
 /*240-255*/ 240 241 242 243 244 245 124 247 248 249 250 251 125 253 254 255
ENDTABLE
ENDCONVERT
Then add the reference to the upperonly.dat file to convmap.dat at the end:
# Testing
INCLUDE
INCLUDE-FILE upperonly.dat
And then compile the codepage with:
proutil -C codepage-compiler convmap.dat convmap2.cp
Start the AVM with -convmap convmap2.cp
This may be too simplistic, but can you use the LC and/or CAPS functions to test it?
if vChar <> LC(vChar) then /* it's uppercase */
I would hope that would take things like code pages, locality, and language into account.
Unfortunately LC() does not work in systems set to case-insensitive. Give it a try, a
if LC("A") = "A" then message "doesn't work".
will show the error.
When insensitive, it looks like the "if" in C (the base language of the ABL?) is something like:
char *convertToUpperCase(char *sPtr)
{
    char *start = sPtr;   /* keep the start so the whole string is returned */
    while (*sPtr != '\0') {
        *sPtr = toupper((unsigned char)*sPtr);
        sPtr++;           /* advance, otherwise the loop never ends */
    }
    return start;
}
// Very rudimentary, as the real parser will be more involved; just pointing out that the VM does this, while your source does the above
Result = (strcmp(convertToUpperCase(LeftSideLogicalOperator), convertToUpperCase(RightSideLogicalOperator)) == 0)
    ? 1   // Equal
    : 0;  // not equal
IOW, the underlying code will convert to a uniform case before doing the compare since it is case-insensitive.
@scott_auge
Perhaps, you would need to make the variable case-sensitive.
define variable vChar as character case-sensitive no-undo.
vChar = "A".
/* Test if string is lowercase */
if LC(vChar) = vChar then
message "Equal".
else
message "Not equal".
Peter, I noticed that you used the ":u" attribute on every literal. Does this attribute have a big performance impact?
@Creative
DEFINE VARIABLE cClassName  AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cBeforeTrim AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cAfterTrim  AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cResult     AS CHARACTER NO-UNDO.
DEFINE VARIABLE iLength     AS INTEGER   NO-UNDO.
DEFINE VARIABLE iCount      AS INTEGER   NO-UNDO.

ASSIGN
    cClassName = "OpenEdge.Net.DataObject.DataObjectHandler"
    iLength    = LENGTH(cClassName)
    iCount     = 1
    .

DO WHILE TRUE:
    IF iCount >= iLength THEN LEAVE.
    ASSIGN
        cBeforeTrim = SUBSTRING(cClassName, iCount)
        cAfterTrim  = LEFT-TRIM(cBeforeTrim, "abcdefghijklmnopqrstuvwxyz")
        iCount      = iCount + LENGTH(cBeforeTrim) - LENGTH(cAfterTrim) + 1
        cResult     = cResult + SUBSTRING(cAfterTrim, 1, 1)
        .
END.
/* cResult = OE.N.DO.DOH - ASSIGN Statement: 12times called 0.000075 avg. per call 0.000006 */
:U marks the string as not translatable. This has the compile-time effect of letting the compiler concatenate a series of strings together. Without :U, the string is considered translatable, which means it can't be concatenated at compile time.
"string1":U + "string2":U <- concatenated once at compile time
"string1" + "string2" <- concatenated at run time each time this line of code is executed