Fastest way to figure out whether a letter is upper case

Posted by Peter Judge on 06-Jan-2017 13:11

What's the fastest way to figure out whether a given letter is upper-case or not in ABL?

 I want to turn something like OpenEdge.Net.DataObject.DataObjectHandler  into a string like

              OE.N.DO.DataObjectHandler    

It has to be as fast as possible, so, um, creative code will be considered.

All Replies

Posted by Tim Kuehn on 06-Jan-2017 13:14

I'd expect something like a range check of

ASC(charvar) between ASC("A") to ASC("Z").

Posted by Brian K. Maher on 06-Jan-2017 13:16

Use the COMPARE function
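
As a rough sketch of how COMPARE could be applied to a single character (the variable names cChar and lIsUpper are made up for illustration, and this is only one way to read the suggestion):

```abl
/* Sketch only: cChar and lIsUpper are hypothetical names used for illustration */
DEFINE VARIABLE cChar    AS CHARACTER NO-UNDO INITIAL "D".
DEFINE VARIABLE lIsUpper AS LOGICAL   NO-UNDO.

/* Case-sensitive comparison of the character with its upper-cased form.
   Digits and punctuation also equal their CAPS() value, so they pass too. */
ASSIGN lIsUpper = COMPARE(cChar, "EQ":U, CAPS(cChar), "CASE-SENSITIVE":U).

MESSAGE lIsUpper VIEW-AS ALERT-BOX.
```

The "CASE-SENSITIVE" strength is what makes this work without declaring the variable itself as case-sensitive.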
 

Posted by jquerijero on 06-Jan-2017 13:17

Can you try

System.Text.RegularExpressions.Regex:Replace(mySubstring, "[^A-Z]", "").

Posted by Adriano Correa on 06-Jan-2017 13:42

DEF VAR xxx AS CHAR CASE-SENSITIVE.

ASSIGN xxx = "Test".

IF xxx = LC(xxx)

   THEN DISP "only lower".

   ELSE DISP "Upper case detected".

Posted by Ken McIntosh on 06-Jan-2017 13:50

DEFINE VARIABLE cString   AS CHARACTER   NO-UNDO INITIAL "OpenEdge.Net.DataObject.DataObjectHandler".

DEFINE VARIABLE cChar     AS CHARACTER   NO-UNDO.

DEFINE VARIABLE cFinished AS CHARACTER   NO-UNDO.

DEFINE VARIABLE iChar AS INTEGER     NO-UNDO.

DEFINE VARIABLE iAsc  AS INTEGER     NO-UNDO.

DO iChar = 1 TO LENGTH(cString):

   ASSIGN cChar = SUBSTRING(cString,iChar,1)

          iAsc = ASC(cChar).

   IF (iAsc GT 64 AND iAsc LT 91) OR

      cChar EQ "." THEN

       cFinished = cFinished + cChar.

END.

MESSAGE cFinished

   VIEW-AS ALERT-BOX INFO BUTTONS OK.

Posted by Tim Kuehn on 06-Jan-2017 13:53

Ken -

For DO ... TO structures like this:

DO iChar = 1 TO LENGTH(cString):

the LENGTH() function would be computed on every iteration of the DO loop - you need to put the value in a variable and use that instead.
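
A minimal sketch of that change, reusing the variable names from Ken's post (iLength is a name introduced here for illustration only):

```abl
/* Sketch only: iLength is a hypothetical helper variable, not part of Ken's post */
DEFINE VARIABLE cString AS CHARACTER NO-UNDO INITIAL "OpenEdge.Net.DataObject.DataObjectHandler".
DEFINE VARIABLE iChar   AS INTEGER   NO-UNDO.
DEFINE VARIABLE iLength AS INTEGER   NO-UNDO.

/* Evaluate LENGTH() once, before the loop starts */
ASSIGN iLength = LENGTH(cString).

DO iChar = 1 TO iLength:
    /* per-character work goes here */
END.
```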

Posted by Peter Judge on 06-Jan-2017 13:58

I've ended up using ASC(<char>, '1252') <= 90. It's decently fast (less than half a millisecond for that test string).

I don't check a lower bound for the ASC values because I expect mostly class/type names in the string, and they can legally have A-Z, 0-9 and #, $, %, and _ characters.

- this takes about 0.052 milliseconds per iteration

- Using a CASE ASC() is slower (0.088ms per)

- using a case-sensitive variable is almost as fast  (~0.058ms per)

- using COMPARE is a little slower (~0.060 ms per)

- I can't use .NET 'cos this needs to run on Unix too.

Test code

define variable numKeep as integer no-undo.
define variable numChars as integer no-undo.
define variable charLoop as integer no-undo.
define variable numEntries as integer no-undo.
define variable entryLoop as integer no-undo.
define variable singleEntry as character no-undo.
define variable singleChar as character no-undo.
define variable charAsc as integer no-undo.
define variable delim as character no-undo.
define variable typeName as character no-undo.
define variable startTime as integer no-undo.

def var tokenValue as char.  
def var tokenArg as char.
def var i as int.
def var i2 as dec.

startTime = mtime.

assign typeName   = 'OpenEdge.Net.DataObject.DataObjectHandler'
       numEntries = num-entries (typeName, '.':u)
       delim      = '':u
       tokenValue = '':u.
do i = 1 to 1000000:
    tokenvalue = ''.
    delim = ''.
    do entryLoop = 1 to numEntries:
        assign singleEntry = entry(entryLoop, typeName, '.':u).

        /* Loop through entire input string */
        assign numChars   = length(singleEntry, 'raw':u)
               tokenValue = tokenValue + delim
               .
        do charLoop = 1 to numChars:
            assign /* ASCII value of character using single byte codepage */
                   singleChar = substring(singleEntry, charLoop, 1, 'RAW':u)
                   charAsc    = asc(singleChar, '1252':u).

            if charAsc le 90 /* Z */
            or charAsc eq 95 /* _ */ then
                assign tokenValue = tokenValue + singleChar.
        end.
        delim = '.'.
    end.
end.

i2 = mtime - startTime.

message tokenvalue skip i2 skip i2 / (i - 1) view-as alert-box.

Posted by Mike Fechner on 06-Jan-2017 14:09

[quote user="Tim Kuehn"]

I'd expect something like a range check of

ASC(charvar) between ASC("A") to ASC("Z").

[/quote]

This will not help you in Europe at all!

Posted by Tim Kuehn on 06-Jan-2017 14:25

Well pardon my anglocentrism. :)

Is code written using an extra-ASCII character set?

Posted by ChUIMonster on 06-Jan-2017 15:02

This may, or may not work with non-English, non-ASCII characters but I kind of like it anyway:

define variable c as character no-undo format "x".

do while true:

  update c.

  display ( chr( asc( c ) + 32 ) = c ).

end.

Posted by Stefan Drissen on 06-Jan-2017 16:36

You do realize that your test code does not meet your requirements? The last entry should be added completely.

do entryLoop = 1 to numEntries:

   assign singleEntry = entry(entryLoop, typeName, '.':u).

   if entryLoop < numEntries then do:

       /* Loop through entire input string */

...

   end.

   else

      tokenvalue = tokenValue + delim + singleEntry.

And obviously the fastest way is to take the work out of the ABL and put it into the AVM and create a custom codepage and then use the codepage-convert function.

Posted by Stefan Drissen on 06-Jan-2017 17:04

You are also cheating by taking the num-entries count outside of the loop - the 'function' requires the count. Here's my version with a quickly cobbled together 'upperonly' codepage. I first tried mapping all non-uppers to 000 - but that just terminated the string, so I put 032 in and replace that afterwards. It is still way faster (on my i7 your code, adjusted to not convert last entry, is taking 0.2305 per iteration, codepage-convert is taking 0.00148 per iteration):

def var typeName as char no-undo.
def var tokenValue as char no-undo. 
def var numEntries as int no-undo.

def var startTime as integer no-undo.
def var ii as int no-undo.
def var i2 as int no-undo.


startTime = mtime.

typeName = 'OpenEdge.Net.DataObject.DataObjectHandler'.

do ii = 1 to 1000000:
   
   assign
      tokenvalue = typeName
      numEntries = num-entries( tokenvalue, "." )
      .   
   entry( numEntries, tokenvalue, "." ) = "".
   tokenValue = replace( codepage-convert( tokenvalue, "upperonly" ), " ", "" )
              + entry( numEntries, typeName, "." ).
end.

i2 = mtime - startTime.

message
   tokenvalue skip
   i2 skip
   i2 / ii
view-as alert-box.

#---------------------------------------------------------------------------
# This contains the data needed to convert from
# iso8859-1 to upper-only encoding.
#
# This is NOT a full transformation.
#
#
CONVERT NOINVERSE
SOURCE-NAME "ISO8859-1"
TARGET-NAME "UPPERONLY"
TYPE "1"
  /*000-015*/  032 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015
  /*016-031*/  016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031
  /*032-047*/  032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047
  /*048-063*/  048 049 050 051 052 053 054 055 056 057 058 059 060 061 062 063
  /*064-079*/  064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079
  /*080-095*/  080 081 082 083 084 085 086 087 088 089 090 091 092 093 094 095
  /*096-111*/  032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
  /*112-127*/  032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
  /*128-143*/  032 032 032 032 032 032 032 032 032 032 032 032 032 032 032 032
  /*144-159*/  144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
  /*160-175*/  160 161 162 163 164 165 166 064 168 169 170 171 172 173 174 175
  /*176-191*/  176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
  /*192-207*/  192 193 194 195 091 197 198 199 200 201 202 203 204 205 206 207
  /*208-223*/  208 209 210 211 212 213 092 215 216 217 218 219 093 221 222 126
  /*224-239*/  224 225 226 227 123 229 230 231 232 233 234 235 236 237 238 239
  /*240-255*/  240 241 242 243 244 245 124 247 248 249 250 251 125 253 254 255
ENDTABLE
ENDCONVERT

Then add the reference to the upperonly.dat file to convmap.dat at the end:

# Testing
INCLUDE
INCLUDE-FILE upperonly.dat

And then compile the codepage with:

proutil -C codepage-compiler convmap.dat convmap2.cp

Start the AVM with -convmap convmap2.cp


Posted by Peter Judge on 09-Jan-2017 10:10

I don’t have control over the code pages used, so sadly I cannot use your approach.
 
I was also chatting to our resident codepage expert (Hi Garry!) who reckons that while ASC() is fastest, a case-sensitive variable is most robust (since it’ll work on other codepages too - UTF-8 in particular).
 
My test for using a case-sensitive variable runs in around 0.059ms per iteration which is good enough, I think/hope.
 
Well-spotted on the last entry :) In my real code the numEntries – numKeep number of entries are kept as-is (per below in case you care).
 
define variable numKeep as integer no-undo.
define variable numChars as integer no-undo.
define variable charLoop as integer no-undo.
define variable numEntries as integer no-undo.
define variable entryLoop as integer no-undo.
define variable singleEntry as character no-undo.
define variable newEntry as character no-undo.
define variable sensitiveChar as character no-undo case-sensitive.
define variable charAsc as integer no-undo.
define variable delim as character no-undo.
define variable typeName as character no-undo.

assign typeName   = trim(entry(1, tokenArg, ':':u))
       numEntries = num-entries (typeName, '.':u)
       delim      = '':u
       tokenValue = '':u.
if num-entries(tokenArg, ':':u) gt 1 then
    assign formatString = trim(entry(2, tokenArg, ':':u)).

if formatString eq '':u then
    assign formatString = '1K':u.

assign numKeep = ?
       numKeep = integer(substring(formatString, 1, 1))
       no-error.
if numKeep eq ? then
    assign numKeep = 1.

do entryLoop = 1 to numEntries:
    assign singleEntry = entry(entryLoop, typeName, '.':u).

    if entryLoop gt numEntries - numKeep then
        assign tokenValue = tokenValue
                          + delim
                          + singleEntry.
    else
    case substring(formatString, 2, 1):
        when 'U':u then
            assign tokenValue = tokenValue
                              + delim
                              + caps(substring(singleEntry, 1, 1)).

        when 'L':u then
            assign tokenValue = tokenValue
                              + delim
                              + lc(substring(singleEntry, 1, 1)).

        when 'C':u then
        do:
            /* Loop through entire input string */
            assign numChars = length(singleEntry)
                   newEntry = '':u.
            do charLoop = 1 to numChars:
                assign sensitiveChar = substring(singleEntry, charLoop, 1).
                if sensitiveChar eq caps(sensitiveChar) then
                    assign newEntry = newEntry + sensitiveChar.
            end.
            // if there are no CAPS then use the first char, as-is
            if newEntry eq '':u then
                assign newEntry = substring(singleEntry, 1, 1).

            assign tokenValue = tokenValue
                              + delim
                              + newEntry.
        end.    // CamelCase

        otherwise   // K
            assign tokenValue = tokenValue
                              + delim
                              + substring(singleEntry, 1, 1).
    end case.
    assign delim = '.':u.
end.
 
 

Posted by Ken Ward on 09-Jan-2017 10:29

This may be too simplistic, but can you use the LC and/or CAPS functions to test it?

if vChar <> LC(vChar) then /* it's uppercase */

I would hope that would take things like code pages, locality, and language into account.

Posted by scott_auge on 09-Jan-2017 12:30

Unfortunately LC() does not work when the comparison is case-insensitive (the ABL default). Give it a try:

if LC("A") = "A" then message "doesn't work".

This displays the message, because the comparison ignores case.

When insensitive, it looks like the "if" in C (the base language of the ABL?) is something like:

   char *convertToUpperCase(char *sPtr)

   {

     char *p = sPtr;

     while (*p != '\0')

     {

       *p = toupper((unsigned char)*p);

       p++;  /* advance, otherwise the loop never ends */

     }

     return sPtr;  /* return the start of the converted string */

   }

// Very rudimentary, as the real parser will be more involved; just pointing out that the VM does this, while your source does the above

Result = (strcmp(convertToUpperCase(LeftSideLogicalOperator), convertToUpperCase(RightSideLogicalOperator)) == 0)

? 1   // equal

: 0;  // not equal

IOW, the underlying code will convert to a uniform case before doing the compare since it is case-insensitive.

Posted by egarcia on 09-Jan-2017 12:59

@scott_auge

Perhaps, you would need to make the variable case-sensitive.

define variable vChar as character case-sensitive no-undo.

vChar = "A".

/* Test if string is lowercase */

if LC(vChar) = vChar then

   message "Equal".

else

   message "Not equal".

Posted by Lars Neumeier on 11-Jan-2017 15:46

Peter, I noticed that you use the ":u" attribute on every literal. Does this attribute have a significant performance impact?

@Creative

DEFINE VARIABLE cClassName  AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cBeforeTrim AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cAfterTrim  AS CHARACTER NO-UNDO CASE-SENSITIVE.
DEFINE VARIABLE cResult     AS CHARACTER NO-UNDO.

DEFINE VARIABLE iLength AS INTEGER NO-UNDO.
DEFINE VARIABLE iCount  AS INTEGER NO-UNDO.

ASSIGN
  cClassName = "OpenEdge.Net.DataObject.DataObjectHandler"
  iLength    = LENGTH(cClassName)
  iCount     = 1
  .

DO WHILE TRUE:
  IF iCount >= iLength THEN
    LEAVE.

  ASSIGN
    cBeforeTrim = SUBSTRING(cClassName, iCount)
    cAfterTrim  = LEFT-TRIM(cBeforeTrim, "abcdefghijklmnopqrstuvwxyz")
    iCount      = iCount + LENGTH(cBeforeTrim) - LENGTH(cAfterTrim) + 1
    cResult     = cResult + SUBSTRING(cAfterTrim, 1, 1)
    .
END.
/* cResult = OE.N.DO.DOH - ASSIGN Statement: 12times called 0.000075 avg. per call 0.000006 */

Posted by Mike Fechner on 11-Jan-2017 23:30

I would hope that :U only makes difference at compile time. At runtime that _has_ to be transparent.

Posted by Tim Kuehn on 11-Jan-2017 23:38

:U marks the string as not translatable. This has the compile-time effect of enabling the compiler to concatenate a series of strings together. Without :U, the string is considered translatable, which means it can't be concatenated at compile time.

"string1":U + "string2":U <- concatenated once at compile time

"string1"     + "string2"    <- concatenated at run time each time this line of code is executed

Posted by Mike Fechner on 11-Jan-2017 23:48

That aside, in this thread there are no concatenated strings. But I guess we agree that :U should have no runtime impact on the code samples discussed in this thread.
 

This thread is closed