How do you handle a UTF-8 encoded string in ABL?

Posted by jquerijero on 10-Oct-2011 13:19

-cpinternal is 1252 Basic, which I believe is a superset of UTF-8.

I'm calling a .NET method that returns a long UTF-8 encoded string and storing it in a LONGCHAR. I'm getting a "cannot convert CLOB/LONGCHAR" error when using COPY-LOB FROM varLongChar TO FILE "c:\stuff.xml". Adding NO-CONVERT works.

Any ideas?

All Replies

Posted by maximmonin on 11-Oct-2011 03:24

Add:

convert source codepage "utf-8"

when reading from the file "stuff.txt".
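A minimal sketch of that suggestion as a complete statement, assuming stuff.txt holds UTF-8 bytes. The variable name lcContent is illustrative. SOURCE CODEPAGE describes the bytes in the file; because lcContent's code page is not fixed, my understanding is that the AVM converts the text to -cpinternal, so characters that code page cannot represent may still raise a conversion error. Not tested against a specific OpenEdge release.

```abl
/* Sketch: load a UTF-8 encoded file into a LONGCHAR.
   lcContent is an illustrative name; "stuff.txt" is the file from the post. */
DEFINE VARIABLE lcContent AS LONGCHAR NO-UNDO.

/* SOURCE CODEPAGE names the encoding of the bytes on disk. lcContent's
   code page is not fixed, so the text is converted to -cpinternal. */
COPY-LOB FROM FILE "stuff.txt" TO lcContent
    CONVERT SOURCE CODEPAGE "UTF-8".

MESSAGE "Characters read:"    LENGTH(lcContent)       SKIP
        "LONGCHAR code page:" GET-CODEPAGE(lcContent)
    VIEW-AS ALERT-BOX INFORMATION.
```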

Posted by jquerijero on 11-Oct-2011 09:12

I'm trying to write to stuff.txt, and I believe a LONGCHAR always uses the -cpinternal setting.

Posted by gus on 11-Oct-2011 09:25

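I think Windows-1252 is a SUBSET of UTF-8: every character it can represent can also be encoded in UTF-8, although the characters above 127 become multi-byte sequences in UTF-8. It is a legacy Microsoft character set for Latin alphabets and uses 8-bit characters. It is similar to ISO-8859-15, but several of the characters have different code points.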

When you define a LONGCHAR variable, by default, it uses the same character encoding as that specified for -cpinternal. You can change that by using the FIX-CODEPAGE statement before assigning a value to the variable.
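A minimal sketch of the difference, with illustrative variable names. FIX-CODEPAGE is applied while the variable is still empty, since my understanding is that it cannot be applied once the LONGCHAR holds data. GET-CODEPAGE then reports the code page each variable ended up with.

```abl
/* Sketch: compare a default LONGCHAR with one fixed to UTF-8.
   Variable names are illustrative. */
DEFINE VARIABLE lcDefault AS LONGCHAR NO-UNDO.
DEFINE VARIABLE lcUtf8    AS LONGCHAR NO-UNDO.

/* Fix the code page before the first assignment. */
FIX-CODEPAGE(lcUtf8) = "UTF-8".

lcDefault = "<root>stuff</root>".
lcUtf8    = lcDefault.   /* converted from -cpinternal to UTF-8 */

/* Expected: lcDefault reports -cpinternal (1252 in this thread),
   lcUtf8 reports UTF-8. */
MESSAGE "lcDefault:"   GET-CODEPAGE(lcDefault) SKIP
        "lcUtf8:"      GET-CODEPAGE(lcUtf8)    SKIP
        "-cpinternal:" SESSION:CPINTERNAL
    VIEW-AS ALERT-BOX INFORMATION.
```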

Posted by jquerijero on 11-Oct-2011 15:19

I added FIX-CODEPAGE(cLongChar) = "UTF-8", but I'm still getting an "Error converting CLOB/LONGCHAR" when using COPY-LOB from a LONGCHAR to a file. I don't believe you can specify which code page to use in the COPY-LOB statement. I think that's where the problem lies.

Posted by gus on 11-Oct-2011 15:36

The output file has a code page too. The default is specified by the value of -cpstream. However, in the COPY-LOB statement you can specify the source code page, the target code page, or both, or specify that no conversion be performed.

copy-lob from lcvar to file "foo.dat" convert source codepage "utf-8" target codepage "utf-8".

or

copy-lob from lcvar to file "foo.dat" convert target codepage "utf-8".

or

copy-lob from lcvar to file "foo.dat" no-convert.

Posted by jquerijero on 11-Oct-2011 16:28

copy-lob from lcvar to file "foo.dat" convert source codepage "utf-8" target codepage "utf-8".

>>> This will not work. You will get a run-time error saying that you cannot specify a source codepage for a LONGCHAR.

copy-lob from lcvar to file "foo.dat" convert target codepage "utf-8".

>>> This doesn't work.

copy-lob from lcvar to file "foo.dat" no-convert.

>>> This is what I'm using now. I would still like proper code page conversion, though, in case I ever need to inspect the LONGCHAR first.

Some interesting things I noticed:

cLongChar = SomeClass:SomeNETMethod(). /* returns a UTF-8 encoded string */

This results in cLongChar having a code page of UTF-16: not UTF-8, and not even the -cpinternal 1252 Basic. I'm not really sure where it is getting that code page.

So, in a nutshell:

- the value returned by SomeClass:SomeNETMethod() is being interpreted as UTF-16 instead of UTF-8

- a UTF-16 string is stored in the LONGCHAR variable

- that UTF-16 LONGCHAR is being interpreted as 1252 Basic during COPY-LOB

Major confusion.
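One possible workaround, sketched under assumptions rather than confirmed in this thread: .NET strings are UTF-16 internally, which may explain the UTF-16 code page. Copying the value into a LONGCHAR fixed to UTF-8 should make the assignment convert it, after which that copy can be inspected in ABL and written with NO-CONVERT, the option that already works above. The literal stands in for SomeClass:SomeNETMethod() so the sketch runs without .NET; lcFromNet, lcUtf8 and the relative path "stuff.xml" are illustrative.

```abl
/* Sketch: write a LONGCHAR to disk as UTF-8 bytes.
   lcFromNet, lcUtf8 and "stuff.xml" are illustrative names. */
DEFINE VARIABLE lcFromNet AS LONGCHAR NO-UNDO.
DEFINE VARIABLE lcUtf8    AS LONGCHAR NO-UNDO.

/* Stand-in for: lcFromNet = SomeClass:SomeNETMethod(). */
lcFromNet = "<root>stuff</root>".

/* Fix the target first; the assignment then converts from
   lcFromNet's code page to UTF-8. */
FIX-CODEPAGE(lcUtf8) = "UTF-8".
lcUtf8 = lcFromNet.

MESSAGE "Source code page:" GET-CODEPAGE(lcFromNet) SKIP
        "Copy code page:"   GET-CODEPAGE(lcUtf8)
    VIEW-AS ALERT-BOX INFORMATION.

/* lcUtf8 already holds UTF-8, so write its bytes unchanged. */
COPY-LOB FROM lcUtf8 TO FILE "stuff.xml" NO-CONVERT.
```

The design choice is to do the conversion on assignment, where the target code page is explicit, rather than relying on COPY-LOB's defaults (-cpstream for the file).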
