OpenEdge 11.1
Given the following scenario: OpenEdge GUI for .NET client and AppServer, CPINTERNAL UTF-8, database code page also UTF-8.
In parallel, there are TTY clients on UNIX running single-byte code pages.
In areas of the application that are relevant for compatibility with the legacy TTY application, we need to ensure (validate) that character data entered on some of the GUI screens does not cause code page issues in the legacy application (even if it is just that characters get lost in terminals or other output).
So for some tables/fields we need to prevent users of the GUI for .NET application from entering characters that do not fit into the TTY application's code page. As an example, I may need to keep certain text fields free of Turkish special characters, because a TTY application running iso8859-1 could not handle them.
What's the best way to validate this? I'm not keen on parsing strings myself and validating the ASC value of each character in question.
One solution would be to try to assign the CHARACTER in question to a LONGCHAR fixed to iso8859-1 and see if that throws an error.
ROUTINE-LEVEL ON ERROR UNDO, THROW.

/* *************************** Main Block *************************** */

DEFINE VARIABLE cTest  AS CHARACTER NO-UNDO.
DEFINE VARIABLE lcTest AS LONGCHAR  NO-UNDO.

cTest = CHR(50591, "utf-8").
/*cTest = "ä".*/

MESSAGE cTest SKIP
        ASC(cTest) SKIP
    VIEW-AS ALERT-BOX.

FIX-CODEPAGE(lcTest) = "iso8859-1".
/*FIX-CODEPAGE(lcTest) = "1254".*/

DO ON ERROR UNDO, THROW:
    /* Attempt to assign cTest to a LONGCHAR fixed to iso8859-1 */
    lcTest = cTest.

    CATCH err AS Progress.Lang.Error:
        IF err:GetMessageNum(1) = 142 THEN
            MESSAGE "Codepage problem...." VIEW-AS ALERT-BOX.
        ELSE
            UNDO, THROW err.
    END CATCH.
END.

MESSAGE STRING(lcTest) SKIP
        cTest ASC(cTest)
    VIEW-AS ALERT-BOX.
Are there other solutions? Solutions that work for a complete temp-table record or ProDataset at once?
Hi Mike,
I am guessing you can compare the character length vs. the byte length of the string that you read on your character client, using the ABL LENGTH function. If the two lengths are not equal, that tells you there are characters in the string that are not compatible; otherwise they will be equal.
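A minimal sketch of that idea (assuming a session running with -cpinternal UTF-8; the variable name is made up):
/* In a UTF-8 session every non-ASCII character occupies more than one
   byte, so the character count and the raw byte count diverge. */
DEFINE VARIABLE cInput AS CHARACTER NO-UNDO.

cInput = "abc" + CHR(50353). /* Turkish dotless i, two bytes in UTF-8 */

IF LENGTH(cInput, "CHARACTER") NE LENGTH(cInput, "RAW") THEN
    MESSAGE "String contains multi-byte characters." VIEW-AS ALERT-BOX.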
Thanks,
Sachin
Hi Sachin,
I am not sure this will work. Validation needs to be made on the AppServer.
And we need to allow those characters that do fit into the TTY client's code page. German umlauts, for instance, are also double-byte in UTF-8 but are fine for iso8859-1 clients.
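A quick illustration of that point (hypothetical code, assuming a UTF-8 session):
DEFINE VARIABLE cUmlaut  AS CHARACTER NO-UNDO.
DEFINE VARIABLE lcLatin1 AS LONGCHAR  NO-UNDO.

cUmlaut = "ä".

/* Two bytes in UTF-8, so a pure length comparison would flag it... */
MESSAGE LENGTH(cUmlaut, "CHARACTER") SKIP  /* 1 */
        LENGTH(cUmlaut, "RAW")             /* 2 */
    VIEW-AS ALERT-BOX.

/* ...yet the assignment succeeds, because iso8859-1 can represent it. */
FIX-CODEPAGE(lcLatin1) = "iso8859-1".
lcLatin1 = cUmlaut.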
Hi Mike,
Here's a method we use to make sure the characters can be converted to Windows 1252.
METHOD PUBLIC STATIC LOGICAL is1252( pChar AS CHARACTER ):

    /* No need to check if our session is 1252. This is done so this code
       will never fail during our transition to the UTF-8 client. */
    IF SESSION:CPINTERNAL EQ "1252" THEN RETURN TRUE.

    /* The logic here uses the fact that CODEPAGE-CONVERT will turn
       characters that can't be converted into question marks.
       If the number of question marks changes, then something was
       not convertible. */
    IF NUM-ENTRIES(pChar, "?") NE
       NUM-ENTRIES(CODEPAGE-CONVERT(pChar, "1252"), "?") THEN
        RETURN FALSE.
    ELSE
        RETURN TRUE.

END METHOD.
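Hypothetical usage on the AppServer (the class name CharUtil and the variable are made up):
IF NOT CharUtil:is1252(cUserInput) THEN
    RETURN ERROR NEW Progress.Lang.AppError(
        "Input contains characters the TTY client cannot display.", 1).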
Hi Thomas,
that's also a valid approach, similar to mine. It requires checking string by string.
I will try which one runs faster.
Does nobody know a solution that works on full records or ProDatasets without iterating all fields/records/tables?
For a dataset or temp-table, maybe try converting to XML or JSON. Some quick test code (run with -cpinternal UTF-8):
define temp-table tt1 no-undo
field f1 as char
index ix1 f1.
define temp-table tt2 no-undo
field f1 as char
field f2 as char
index ix2 f1 f2.
define dataset ds1 for tt1, tt2
data-relation dr1 for tt1, tt2
relation-fields(f1,f1).
DEFINE VARIABLE lcds AS LONGCHAR NO-UNDO.
DO transaction:
create tt1.
assign tt1.f1 = "A".
create tt2.
assign
tt2.f1 = tt1.f1
/* Turkish lowercase dotless i
* U+0131 => UTF-8 hex 0xc4b1 UTF-8 dec 50353 */
tt2.f2 = CHR(50353).
END.
/* write to XML encoded with 1252 */
fix-codepage(lcds) = "1252".
dataset ds1:handle:write-xml(
"LONGCHAR",
lcds,
false /* formatted */,
"1252" /* encoding */).
This gives me an error message:
Invalid encoding for WRITE-XML. (13515)
The code should read "windows-1252" for the WRITE-XML encoding.
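Presumably the corrected call then looks like this (the LONGCHAR stays fixed to the Progress code page name "1252"; only the IANA encoding name passed to WRITE-XML changes):
fix-codepage(lcds) = "1252".
dataset ds1:handle:write-xml(
    "LONGCHAR",
    lcds,
    false /* formatted */,
    "windows-1252" /* encoding, IANA name */).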
But sadly, when the code is correct, there is no error. Instead the character is written escaped:
<?xml version="1.0" encoding="windows-1252"?>
<ds1 xmlns:xsi="www.w3.org/.../XMLSchema-instance">
<tt1>
<f1>A</f1>
</tt1>
<tt2>
<f1>A</f1>
<f2>&#305;</f2>
</tt2>
</ds1>
Not sure if there is much you can do with that.
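One hypothetical way to still exploit that behavior: since a literal "&" in the data is itself escaped to "&amp;", any "&#" left in the output should be a numeric character reference for a character the encoding could not represent natively, so a simple scan would at least detect the condition:
IF INDEX(lcds, "&#") GT 0 THEN
    MESSAGE "Dataset contains characters outside the target code page."
        VIEW-AS ALERT-BOX.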
My last attempt, I promise. Instead of WRITE-XML() to a LONGCHAR of the target codepage, write to a UTF-8 LONGCHAR, then COPY-LOB it to a LONGCHAR of the target codepage. e.g.
define temp-table tt1 no-undo
field f1 as char
index ix1 f1.
define temp-table tt2 no-undo
field f1 as char
field f2 as char
index ix2 f1 f2.
define dataset ds1 for tt1, tt2
data-relation dr1 for tt1, tt2
relation-fields(f1,f1).
DEFINE VARIABLE lcds AS LONGCHAR NO-UNDO.
DEFINE VARIABLE lcds2 AS LONGCHAR NO-UNDO.
DO transaction:
create tt1.
assign tt1.f1 = "A".
create tt2.
assign
tt2.f1 = tt1.f1
/* Turkish lowercase dotless i
* U+0131 => UTF-8 hex 0xc4b1 UTF-8 dec 50353 */
tt2.f2 = CHR(50353).
END.
fix-codepage(lcds) = "UTF-8".
dataset ds1:handle:write-xml(
"LONGCHAR",
lcds,
true /* formatted */,
"UTF-8" /* encoding */).
fix-codepage(lcds2) = "1252".
copy-lob lcds to lcds2.
This gives me the following error:
Large object assign or copy failed. (11395)
It is a vague error message; it doesn't explain exactly what the problem is, but it does at least flag that further investigation is warranted. I believe it will be faster than a char-by-char comparison written in ABL. Depending on the size of your dataset, though, the memory consumption of the LONGCHARs could be significant.
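To turn the probe into something callable, the COPY-LOB could be wrapped in the same CATCH pattern as the single-string test at the top of the thread. A sketch (function name and parameters are made up):
FUNCTION fitsCodePage RETURNS LOGICAL
    (INPUT phDataset AS HANDLE, INPUT pcCodePage AS CHARACTER):

    DEFINE VARIABLE lcUtf8   AS LONGCHAR NO-UNDO.
    DEFINE VARIABLE lcTarget AS LONGCHAR NO-UNDO.

    FIX-CODEPAGE(lcUtf8)   = "UTF-8".
    FIX-CODEPAGE(lcTarget) = pcCodePage.

    /* Serialize the whole ProDataset, then let COPY-LOB attempt the
       code page conversion; failure means at least one character
       does not fit into the target code page. */
    phDataset:WRITE-XML("LONGCHAR", lcUtf8, FALSE, "UTF-8").

    DO ON ERROR UNDO, THROW:
        COPY-LOB lcUtf8 TO lcTarget.

        CATCH err AS Progress.Lang.Error:
            RETURN FALSE. /* e.g. error 11395 */
        END CATCH.
    END.

    RETURN TRUE.
END FUNCTION.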
Hi Garry, thanks for everything you tried! :-)
The last one looks a lot like the one I used for a single string: assign to a LONGCHAR fixed to the target CP and see if it errors out. I guess I'll go with that one for the ProDatasets.
Cheers,
Mike