How to Read UNICODE Characters in Progress 4GL - OpenEdge Development - Forum

Posted by Aman Sharma on 04-Jul-2016 10:37

I am working on an application that reads a string containing Mandarin/Chinese characters, and I am not able to find the Unicode or ASCII values of those characters.

I have tried using the ASC function, for example:

ASC(国王豪华房)

but it returns -1.

1) Is there any method available in Progress to find the Unicode values of Mandarin characters? I am thinking of fetching the values of the characters and comparing them with the Chinese character set.

2) What is the Unicode range of Mandarin/Chinese characters?

3) Is there any method available in Progress to find the Unicode value of a single Mandarin character passed to it?


All Replies

Posted by Riverside Software on 04-Jul-2016 15:32

The ASC function doesn't work on full strings, only on one character at a time. You have to use something like:

DEFINE VARIABLE k AS INTEGER NO-UNDO.

/* Walk the string one character (not byte) at a time. */
DO k = 1 TO LENGTH('国王豪华房', 'CHARACTER'):
    MESSAGE ASC(SUBSTRING('国王豪华房', k, 1, 'CHARACTER')).
END.

ASC gives you the integer value of the character as it is encoded in the internal code page (-cpinternal), so you get a different value in UTF-8, UCS-2 and Big-5, for example. Using UTF-8, the loop gives:

15047613

15175307

15249834

15043982

15108287

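These are the UTF-8 bytes of each character read as a single number. For the second character, 15175307 is 0xE78E8B, i.e. the bytes E7 8E 8B, which is the UTF-8 encoding of code point U+738B. That code point has a different representation in each encoding: www.fileformat.info/.../charset_support.htm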
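If you need the code point itself rather than the UTF-8 value, the arithmetic can be done by hand. The following is only a sketch, assuming -cpinternal is UTF-8 and that ASC returns the bytes packed into one integer as in the values above. The function name Utf8ToCodePoint is made up for illustration, it does no input validation, and it does not handle 4-byte sequences (characters outside the Basic Multilingual Plane), whose packed value would not fit in a 32-bit INTEGER.

```abl
/* Hypothetical helper: convert the integer that ASC returns under
   -cpinternal UTF-8 (the UTF-8 bytes packed into one number) into
   the Unicode code point. Handles 1-, 2- and 3-byte sequences only. */
FUNCTION Utf8ToCodePoint RETURNS INTEGER (INPUT piPacked AS INTEGER):
    DEFINE VARIABLE iB1 AS INTEGER NO-UNDO.
    DEFINE VARIABLE iB2 AS INTEGER NO-UNDO.
    DEFINE VARIABLE iB3 AS INTEGER NO-UNDO.

    /* One byte: plain ASCII, the value is already the code point. */
    IF piPacked < 128 THEN RETURN piPacked.

    /* Split the packed value into bytes (iB1 is the leading byte). */
    ASSIGN
        iB3 = piPacked MODULO 256
        iB2 = INTEGER(TRUNCATE(piPacked / 256, 0)) MODULO 256
        iB1 = INTEGER(TRUNCATE(piPacked / 65536, 0)).

    /* Two bytes: 110xxxxx 10xxxxxx */
    IF iB1 = 0 THEN
        RETURN (iB2 - 192) * 64 + (iB3 - 128).

    /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    RETURN (iB1 - 224) * 4096 + (iB2 - 128) * 64 + (iB3 - 128).
END FUNCTION.

/* 15175307 = bytes E7 8E 8B, which decodes to 29579 = U+738B */
MESSAGE Utf8ToCodePoint(15175307) VIEW-AS ALERT-BOX.
```

You would call it with the result of ASC(SUBSTRING(...)) for each character in the loop.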

2) and 3)

There are lots of Unicode code points for CJK, spread over several blocks; see en.wikipedia.org/.../Plane_(Unicode) and search for CJK.

But if you're looking for a way to verify that a text is written in Chinese, I'm not sure that checking every single character is a good solution.
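If you still want a per-character check, a range test on the code point is the simplest form it can take. This is a sketch only: the function name IsCjkIdeograph is made up, it expects a Unicode code point (not the UTF-8 value that ASC returns), and it covers just two blocks, CJK Unified Ideographs (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF). Other CJK blocks, punctuation and the characters shared with Japanese and Korean text are not distinguished.

```abl
/* Hypothetical helper: is this Unicode code point in one of the two
   BMP blocks of CJK ideographs? Other CJK blocks are not covered. */
FUNCTION IsCjkIdeograph RETURNS LOGICAL (INPUT piCodePoint AS INTEGER):
    /* CJK Unified Ideographs: U+4E00..U+9FFF (19968..40959) */
    IF piCodePoint >= 19968 AND piCodePoint <= 40959 THEN RETURN TRUE.

    /* CJK Unified Ideographs Extension A: U+3400..U+4DBF (13312..19903) */
    IF piCodePoint >= 13312 AND piCodePoint <= 19903 THEN RETURN TRUE.

    RETURN FALSE.
END FUNCTION.

/* 29579 is U+738B, which lies inside the first range. */
MESSAGE IsCjkIdeograph(29579) VIEW-AS ALERT-BOX.
```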
