Hi All,
We are facing an issue while reading an XML file that has a supplier name in Chinese: the program reads the Chinese characters as "...".
The XML file we are loading has encoding UTF-8.
The session we are loading it in has code page “ISO8859-1”.
We load the XML using the LOAD method:
- Tried passing the XML file directly to the LOAD method:
hXML:LOAD(cDocType, cFileName, NO) NO-ERROR.
- Tried reading the content of the file into a MEMPTR variable, but no luck:
INPUT FROM value (cFileName) BINARY NO-MAP CONVERT SOURCE "UTF-8".
IMPORT unformatted mPointer.
INPUT CLOSE.
hXML:LOAD("memptr",mPointer,FALSE) NO-ERROR.
- While reading the value from a node, tried converting it with the CODEPAGE-CONVERT function, but no luck.
Can anyone guide me on what else we can try?
Regards,
Sachin
Hi gaursaab,
ISO8859-1 is a single-byte character set: en.wikipedia.org/.../IEC_8859-1
UTF-8 is a variable-width encoding that uses one to four bytes per character: en.wikipedia.org/.../UTF-8
There is no way to represent Chinese multi-byte characters in a single-byte code page.
You need to change the internal code page for the session to use a multi-byte code page in order to work with the characters in that file properly.
Also, be careful not to accidentally store any of those strings in a database that isn't configured with a code page capable of handling them.
In an international world, you cannot get away with using ISO 8859-1. It is too limited. Your default selection should be UTF-8.
For a fuller explanation of why, read on:
www.w3.org/.../qa-choosing-encodings
The full manifesto:
Hi Mat,
Hi Peter,
The default character encoding used to encode the contents of an XML document for X-document object handle is "UTF-8". So, the LOAD() method should read the multibyte characters correctly. Could you share the XML file that you are loading? And, how are you reading the values from node after loading the XML file?
Like it or not, you need to set the session's internal code page (SESSION:CPINTERNAL, which is set with the -cpinternal startup parameter) to UTF-8. Otherwise your session will not be able to handle those Chinese characters correctly.
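As a rough sketch of what that looks like in practice, the snippet below checks the session code page and then loads the document and reads the node values. It assumes the session was started with -cpinternal UTF-8, and it assumes a flat document in which the children of the root element each hold a single text node. The file name suppliers.xml is hypothetical, so adjust it to your own layout.

```
/* Sketch only: the file name and the flat document layout are assumptions. */
DEFINE VARIABLE cFileName AS CHARACTER NO-UNDO INITIAL "suppliers.xml".
DEFINE VARIABLE hXML      AS HANDLE    NO-UNDO.
DEFINE VARIABLE hRoot     AS HANDLE    NO-UNDO.
DEFINE VARIABLE hChild    AS HANDLE    NO-UNDO.
DEFINE VARIABLE hText     AS HANDLE    NO-UNDO.
DEFINE VARIABLE lOk       AS LOGICAL   NO-UNDO.
DEFINE VARIABLE i         AS INTEGER   NO-UNDO.

/* CHARACTER values are held in the internal code page, so warn early
   if the session cannot represent Chinese text. */
IF SESSION:CPINTERNAL <> "UTF-8" THEN
    MESSAGE "Session code page is" SESSION:CPINTERNAL
            "- restart the session with -cpinternal UTF-8."
        VIEW-AS ALERT-BOX WARNING.

CREATE X-DOCUMENT hXML.
CREATE X-NODEREF  hRoot.
CREATE X-NODEREF  hChild.
CREATE X-NODEREF  hText.

/* The parser takes the encoding from the XML declaration in the file. */
lOk = hXML:LOAD("file", cFileName, FALSE) NO-ERROR.

IF NOT lOk THEN
    MESSAGE "Could not load" cFileName VIEW-AS ALERT-BOX ERROR.
ELSE DO:
    hXML:GET-DOCUMENT-ELEMENT(hRoot).
    /* Walk the direct children of the root and show each element's text. */
    DO i = 1 TO hRoot:NUM-CHILDREN:
        hRoot:GET-CHILD(hChild, i).
        IF hChild:SUBTYPE = "ELEMENT" AND hChild:NUM-CHILDREN > 0 THEN DO:
            hChild:GET-CHILD(hText, 1).
            MESSAGE hChild:NAME "=" hText:NODE-VALUE VIEW-AS ALERT-BOX.
        END.
    END.
END.

/* Release the XML handles. */
DELETE OBJECT hText.
DELETE OBJECT hChild.
DELETE OBJECT hRoot.
DELETE OBJECT hXML.
```

If the warning fires, the values shown will be damaged no matter how the file is loaded, because the loss happens when the text is converted into the session's internal code page.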
You'll also need to move your databases to UTF-8 if you plan on storing the data from the XML there.
(You could also use another code page that supports Chinese, such as GB2312, but then you'll run into the same wall over and over if/when you need to support languages that use Cyrillic script, Arabic, other Asian languages, emoji, ... Going for a Unicode encoding means you only have to do this once to cover all of those.)
Changing cpstream won't help. Setting cpstream to UTF-8 only tells stream I/O such as IMPORT to assume that incoming bytes are UTF-8. An XML document carries its character encoding in its XML declaration, so cpstream isn't in play here. Again, you simply cannot convert UTF-8 bytes coming from that XML document into a form that is usable with ISO 8859-1. ISO 8859-1 has absolutely no way to represent those characters.
In addition, to display or print, you will need one or more typefaces that contain glyphs for the Chinese characters.