Use regular expressions to replace multiple instances in a s

Posted by Johan Vergeer on 02-Feb-2017 04:46

I would like to use a regular expression in OpenEdge to replace parts of a string.

In this case we want to replace SQL expressions with a symbol (now we use ?).

So what I have so far is this:

USING Progress.Lang.*.
USING System.Text.RegularExpressions.*.

BLOCK-LEVEL ON ERROR UNDO, THROW.

CLASS TOOLS.externals.TextSanitizer:

CONSTRUCTOR PUBLIC TextSanitizer ( ):
SUPER ().
END CONSTRUCTOR.

METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):

DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.
DEFINE VARIABLE regexp AS Regex NO-UNDO.

regexp = NEW Regex("^((INSERT)\s+(INTO).*((VALUES)|(SELECT))|(SELECT).*(FROM)|(UPDATE).*(SET)|(DELETE)\s+(FROM)|((CREATE)|(DROP)|(TRUNCATE)|(ALTER)|(MODIFY))\s+((INDEX)|(TABLE)|(DATABASE)|(COLUMN)|(VIEW)))$").

RESULT = regexp:Replace(pInputText, "?").

DELETE OBJECT regexp.

RETURN RESULT.

END METHOD.

END CLASS.

What I have tried to get mixed casing work:

regexp = NEW Regex("/^rest of the regex$/ig").

And I wrote a procedure to run some tests:

BLOCK-LEVEL ON ERROR UNDO, THROW.

USING TOOLS.externals.TextSanitizer.

DEFINE VARIABLE textSanitizer AS TextSanitizer NO-UNDO.

textSanitizer = NEW TextSanitizer().
MESSAGE textSanitizer:SanitizeSql("SELECT * FROM") VIEW-AS ALERT-BOX.

RETURN.

The tests I have run so far:

`SELECT * FROM` --> This returns `?` as expected
`select * from` --> This returns the original string, so mixed casing does not work, not even when I added `/i`
`SELECT * FROM SELCT * FROM` --> This also returned `?`, while I would expect `? SELCT * FROM`
`SELECT * FROM SELECT * FROM` --> Returns `?`, what I would expect is `? ?`

Could someone help me out here?

OpenEdge Development - Forum

Posted by Johan Vergeer on 02-Feb-2017 05:45

It looks like I got it to work:

METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):

DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.

DEFINE VARIABLE regexp AS Regex NO-UNDO.

DO WHILE regexp:IsMatch(pInputText) = TRUE:

pInputText = regexp:Replace(pInputText, "%%").

END.

RESULT = pInputText.

DELETE OBJECT regexp.

RETURN RESULT.

END METHOD.

So what I did here is check if the regex can find a match, and as long as it can, keep replacing.

All Replies

Posted by David Abdala on 02-Feb-2017 05:00

Can't really help you, but just explain the results:

Regular expressions search for the LONGEST match, not the shortest.

In both "unexpected" cases, you have two FROM, and between (SELECT) and (FROM) you have .*, then the longest match for the .* is everything between the first SELECT and the last FROM.

How to ignore case? No idea.

how to make it pick the shortest match? No idea.

I prefer string handling to regular expressions, except when the regular expression is really simple (maybe one or two times in the last 20 years).

I don't know what exactly are yo doing, but we implemented an SQL parser class (that makes replacements, transformations to ABL, etc) where the first step is to split the SQL into a temp-table, using mostly ENTRY.