Use regular expressions to replace multiple instances in a s

Posted by Johan Vergeer on 02-Feb-2017 04:46

I would like to use a regular expression in OpenEdge to replace parts of a string.

In this case we want to replace SQL expressions with a symbol (now we use ?).

So what I have so far is this:

    USING Progress.Lang.*.
    USING System.Text.RegularExpressions.*.

    BLOCK-LEVEL ON ERROR UNDO, THROW.

    CLASS TOOLS.externals.TextSanitizer:

        CONSTRUCTOR PUBLIC TextSanitizer ( ):
            SUPER ().
        END CONSTRUCTOR.

        METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):

            DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.
            DEFINE VARIABLE regexp AS Regex NO-UNDO.

            regexp = NEW Regex("^((INSERT)\s+(INTO).*((VALUES)|(SELECT))|(SELECT).*(FROM)|(UPDATE).*(SET)|(DELETE)\s+(FROM)|((CREATE)|(DROP)|(TRUNCATE)|(ALTER)|(MODIFY))\s+((INDEX)|(TABLE)|(DATABASE)|(COLUMN)|(VIEW)))$").

            RESULT = regexp:Replace(pInputText, "?").

            DELETE OBJECT regexp.

            RETURN RESULT.

        END METHOD.

    END CLASS.

What I have tried to get mixed casing work:

    regexp = NEW Regex("/^rest of the regex$/ig").

And I wrote a procedure to run some tests:

    BLOCK-LEVEL ON ERROR UNDO, THROW.

    USING TOOLS.externals.TextSanitizer.

    DEFINE VARIABLE textSanitizer AS TextSanitizer NO-UNDO.

    textSanitizer = NEW TextSanitizer().
    MESSAGE textSanitizer:SanitizeSql("SELECT * FROM") VIEW-AS ALERT-BOX.

    RETURN.


The tests I have run so far:

  •  `SELECT * FROM` --> This returns `?` as expected
  •  `select * from` --> This returns the original string, so mixed casing does not work, not even when I added `/i`
  •  `SELECT * FROM SELCT * FROM` --> This also returned `?`, while I would expect `? SELCT * FROM`
  •  `SELECT * FROM SELECT * FROM` --> Returns `?`, what I would expect is `? ?`

Could someone help me out here?

Posted by Johan Vergeer on 02-Feb-2017 05:45

It looks like I got it to work:

METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):

DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.

DEFINE VARIABLE regexp AS Regex NO-UNDO.

       regexp = NEW Regex("(?i)(INSERT\s+INTO.*?VALUES|SELECT.*?FROM|UPDATE.*?SET|DELETE\s+FROM|(CREATE|DROP|TRUNCATE|ALTER|MODIFY)\s+(INDEX|TABLE|DATABASE|COLUMN|VIEW))").

       DO WHILE regexp:IsMatch(pInputText) = TRUE:

           pInputText = regexp:Replace(pInputText, "%%").

       END.

       RESULT = pInputText.

       DELETE OBJECT regexp.

RETURN RESULT.

END METHOD.

So what I did here is check if the regex can find a match, and as long as it can, keep replacing.

All Replies

Posted by David Abdala on 02-Feb-2017 05:00

Can't really help you, but just explain the results:

Regular expressions search for the LONGEST match, not the shortest.

In both "unexpected" cases, you have two FROM, and between (SELECT) and (FROM) you have .*, then the longest match for the .* is everything between the first SELECT and the last FROM.

How to ignore case? No idea.

how to make it pick the shortest match? No idea.

I prefer string handling to regular expressions, except when the regular expression is really simple (maybe one or two times in the last 20 years).

I don't know what exactly are yo doing, but we implemented an SQL parser class (that makes replacements, transformations to ABL, etc) where the first step is to split the SQL into a temp-table, using mostly ENTRY.

Posted by Johan Vergeer on 02-Feb-2017 05:45

It looks like I got it to work:

METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):

DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.

DEFINE VARIABLE regexp AS Regex NO-UNDO.

       regexp = NEW Regex("(?i)(INSERT\s+INTO.*?VALUES|SELECT.*?FROM|UPDATE.*?SET|DELETE\s+FROM|(CREATE|DROP|TRUNCATE|ALTER|MODIFY)\s+(INDEX|TABLE|DATABASE|COLUMN|VIEW))").

       DO WHILE regexp:IsMatch(pInputText) = TRUE:

           pInputText = regexp:Replace(pInputText, "%%").

       END.

       RESULT = pInputText.

       DELETE OBJECT regexp.

RETURN RESULT.

END METHOD.

So what I did here is check if the regex can find a match, and as long as it can, keep replacing.

This thread is closed