I would like to use a regular expression in OpenEdge to replace parts of a string.
In this case we want to replace SQL expressions with a symbol (now we use ?).
So what I have so far is this:
USING Progress.Lang.*.
USING System.Text.RegularExpressions.*.
BLOCK-LEVEL ON ERROR UNDO, THROW.
CLASS TOOLS.externals.TextSanitizer:
CONSTRUCTOR PUBLIC TextSanitizer ( ):
SUPER ().
END CONSTRUCTOR.
METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):
DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.
DEFINE VARIABLE regexp AS Regex NO-UNDO.
regexp = NEW Regex("^((INSERT)\s+(INTO).*((VALUES)|(SELECT))|(SELECT).*(FROM)|(UPDATE).*(SET)|(DELETE)\s+(FROM)|((CREATE)|(DROP)|(TRUNCATE)|(ALTER)|(MODIFY))\s+((INDEX)|(TABLE)|(DATABASE)|(COLUMN)|(VIEW)))$").
RESULT = regexp:Replace(pInputText, "?").
DELETE OBJECT regexp.
RETURN RESULT.
END METHOD.
END CLASS.
What I have tried to get mixed casing work:
regexp = NEW Regex("/^rest of the regex$/ig").
And I wrote a procedure to run some tests:
BLOCK-LEVEL ON ERROR UNDO, THROW.
USING TOOLS.externals.TextSanitizer.
DEFINE VARIABLE textSanitizer AS TextSanitizer NO-UNDO.
textSanitizer = NEW TextSanitizer().
MESSAGE textSanitizer:SanitizeSql("SELECT * FROM") VIEW-AS ALERT-BOX.
RETURN.
The tests I have run so far:
Could someone help me out here?
It looks like I got it to work:
METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):
DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.
DEFINE VARIABLE regexp AS Regex NO-UNDO.
regexp = NEW Regex("(?i)(INSERT\s+INTO.*?VALUES|SELECT.*?FROM|UPDATE.*?SET|DELETE\s+FROM|(CREATE|DROP|TRUNCATE|ALTER|MODIFY)\s+(INDEX|TABLE|DATABASE|COLUMN|VIEW))").
DO WHILE regexp:IsMatch(pInputText) = TRUE:
pInputText = regexp:Replace(pInputText, "%%").
END.
RESULT = pInputText.
DELETE OBJECT regexp.
RETURN RESULT.
END METHOD.
So what I did here is check if the regex can find a match, and as long as it can, keep replacing.
Can't really help you, but just explain the results:
Regular expressions search for the LONGEST match, not the shortest.
In both "unexpected" cases, you have two FROM, and between (SELECT) and (FROM) you have .*, then the longest match for the .* is everything between the first SELECT and the last FROM.
How to ignore case? No idea.
how to make it pick the shortest match? No idea.
I prefer string handling to regular expressions, except when the regular expression is really simple (maybe one or two times in the last 20 years).
I don't know what exactly are yo doing, but we implemented an SQL parser class (that makes replacements, transformations to ABL, etc) where the first step is to split the SQL into a temp-table, using mostly ENTRY.
It looks like I got it to work:
METHOD PUBLIC CHARACTER SanitizeSql( INPUT pInputText AS CHARACTER ):
DEFINE VARIABLE RESULT AS CHARACTER NO-UNDO.
DEFINE VARIABLE regexp AS Regex NO-UNDO.
regexp = NEW Regex("(?i)(INSERT\s+INTO.*?VALUES|SELECT.*?FROM|UPDATE.*?SET|DELETE\s+FROM|(CREATE|DROP|TRUNCATE|ALTER|MODIFY)\s+(INDEX|TABLE|DATABASE|COLUMN|VIEW))").
DO WHILE regexp:IsMatch(pInputText) = TRUE:
pInputText = regexp:Replace(pInputText, "%%").
END.
RESULT = pInputText.
DELETE OBJECT regexp.
RETURN RESULT.
END METHOD.
So what I did here is check if the regex can find a match, and as long as it can, keep replacing.