Capturing and storing Chinese into progress DB - OpenEdge General - Forum


Posted by Thomas Mercer-Hursh on 25-Jan-2019 15:28

What *is* the current DB code page? Depending on what this is, what you ask could be impossible.

Posted by Torben on 28-Jan-2019 09:34

We converted everything from the iso8859-x/125x code pages to UTF-8 many years ago, so everything is UTF-8 (database, AppServer, client, source code, input files, output files, ...).

This works with Chinese (and with all the other code pages we have tested, except Turkish).

Our source code, database and input/output files all contain Arabic, Chinese, Japanese, Cyrillic, Western European and other characters.
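A minimal sketch of why a single UTF-8 code page can hold all of these scripts at once. It is written in Python purely to show the byte-level behaviour (it is not OpenEdge code), and the sample strings are made-up assumptions, not data from the thread. UTF-8 spends one byte on ASCII, two on Latin accents, Cyrillic and Arabic, and three on CJK characters.

```python
# Illustrative sample strings (assumptions, not data from the thread).
samples = {
    "Western European": "Café",
    "Cyrillic": "Привет",
    "Arabic": "مرحبا",
    "Chinese": "中文",
    "Japanese": "日本語",
}


def utf8_width(text: str) -> list:
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


for script, text in samples.items():
    encoded = text.encode("utf-8")
    # Every sample survives a UTF-8 round trip unchanged.
    assert encoded.decode("utf-8") == text
    print(f"{script:17} {text!r:10} bytes per char: {utf8_width(text)}")
```

The varying widths are why field lengths and byte counts differ from character counts once non-ASCII data is stored.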

To get the characters to display correctly, you need to install and use a font that supports all the characters you want to display. Otherwise you will see nothing, or a box, in place of the real character.

Posted by Aidan Jeffery on 29-Jan-2019 16:02

I think we need more information about the database and client configurations in order to help.

What is the code page of the database, and which code pages (-cpinternal) are the clients using?

I assume the clients are running on Windows. Is that correct?

Please note that the kbase article you referred to was written for version 9, when our Unicode support was limited. In OpenEdge 11.7 it should not be necessary to convert the characters within the session; the conversion happens automatically, for example for a CP936 client connected to a UTF-8 database.
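A rough sketch of what that automatic conversion amounts to, written in Python only to show the bytes involved (not how the OpenEdge client implements it). Python's `cp936` codec is assumed to stand in for the client's -cpinternal and `utf-8` for the database code page; `client_to_db` and `db_to_client` are hypothetical helper names.

```python
CLIENT_CP = "cp936"  # stand-in for the client's -cpinternal (assumption)
DB_CP = "utf-8"      # stand-in for the database code page (assumption)


def client_to_db(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode CP936 client bytes as UTF-8 for the DB."""
    return raw.decode(CLIENT_CP).encode(DB_CP)


def db_to_client(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 DB bytes as CP936 for the client."""
    return raw.decode(DB_CP).encode(CLIENT_CP)


typed = "出口".encode(CLIENT_CP)  # 2 bytes per character in CP936
stored = client_to_db(typed)      # 3 bytes per character in UTF-8
assert len(typed) == 4 and len(stored) == 6
assert db_to_client(stored) == typed  # the round trip is lossless
print(typed, "->", stored)
```

Because both code pages contain every Chinese character involved, nothing is lost in either direction, which is what makes the conversion safe to do automatically.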

Posted by Elsworth Burmeister on 05-Feb-2019 14:18

Hi Guys

Sorry, the codepage is iso8859-1.

Posted by frank.meulblok on 05-Feb-2019 14:39

[quote user="Elsworth Burmeister"]

the codepage is iso8859-1

[/quote]

In that case, anything you do will only "work" by accident, because the ISO8859-1 code page does not contain any Chinese characters.

Any end result that looks correct will be due to two improper or missing code page conversions cancelling each other out: one on write to the database, and one on read from the database. This can (and probably will) also affect any indexes on the character fields in question, interfering with sort order and leading to weird and unwanted side effects. This is especially so because, AFAIK, most code pages that do support Chinese are double-byte, while ISO8859-1 is single-byte (the UTF* Unicode encodings being the exception).
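A minimal Python sketch of the "two conversions cancelling out" effect (Python is used only to show the bytes; it is not OpenEdge code). It assumes the client sends UTF-8 bytes and that nothing converts them, so the ISO8859-1 database stores each byte as one Latin-1 character.

```python
original = "中文"

# Write: UTF-8 bytes are misread as ISO8859-1 text (no real conversion).
stored = original.encode("utf-8").decode("iso8859-1")

# Read: the same mistake in reverse restores the original text.
read_back = stored.encode("iso8859-1").decode("utf-8")
assert read_back == original

# Inside the database the field holds 6 Latin-1 characters, not 2 Chinese
# ones, so anything that works per character (length, substring, index
# keys) sees the 6 wrong characters.
print(len(original), len(stored), repr(stored))

# A genuine conversion into ISO8859-1 is impossible: no code point exists.
try:
    original.encode("iso8859-1")
except UnicodeEncodeError as err:
    print("cannot convert:", err.reason)
```

The text looks right on screen only because the read-side mistake undoes the write-side one. Any tool that reads the field correctly as ISO8859-1 sees the garbled Latin-1 characters.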

You said you're restricted from changing the DB code page, but you really should get that restriction lifted or reconsidered.

Posted by gus bjorklund on 05-Feb-2019 14:42

> On Feb 5, 2019, at 9:20 AM, Elsworth Burmeister wrote:

> Sorry the codepage is iso8859-1

ISO-8859-1 is an 8-bit (single-byte) code page designed for Western Europe and the US. It does not work for Chinese characters without some sort of custom hack and a lot of programming to encode and decode the hack.

You should use a multi-byte code page that can handle Chinese characters. Two that come to mind are UTF-8 and GB2312. Others are listed here:

documentation.progress.com/.../openedge-support-for-multi-byte-code-pages.html

The documentation on using multi-byte character sets is here:

documentation.progress.com/.../using-multi-byte-code-pages.html
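A small Python sketch for checking whether a candidate code page can represent your data at all. Python is used only for illustration, and the Python codec names are assumed to correspond to the OpenEdge code pages of the same name. `can_encode` is a hypothetical helper. A single-byte code page has at most 256 code points, so it cannot hold thousands of Chinese characters.

```python
CANDIDATES = ["iso8859-1", "gb2312", "utf-8"]


def can_encode(text: str, codepage: str) -> bool:
    """Hypothetical helper: True if every character of text exists in codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


sample = "中文"
for cp in CANDIDATES:
    print(f"{cp:10} {can_encode(sample, cp)}")
```

Running the same check over a representative extract of real data is a cheap way to validate a target code page before converting a database.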

Posted by Torben on 06-Feb-2019 13:03

In older versions, I think the Progress executables used a single-byte code page internally.

This changed with 10.?, where the internal code page is multi-byte (NOT CPINTERNAL! I think it uses UTF-8).

So when using an 8-bit character code page, conversion happens from screen to internal and back from internal to screen (e.g. for the € symbol).

If the characters are not valid in the conversion tables, then the iso8859-1 conversion between screen and internal (utf-8) will fail.

Running the following with -cpinternal utf-8:

MESSAGE
    /* Round trip: utf-8 -> single-byte code page -> utf-8.
       "€" exists in iso8859-15 but not in iso8859-1. */
    CODEPAGE-CONVERT(CODEPAGE-CONVERT("€", "iso8859-1", "utf-8"), "utf-8", "iso8859-1") SKIP
    CODEPAGE-CONVERT(CODEPAGE-CONVERT("€", "iso8859-15", "utf-8"), "utf-8", "iso8859-15") SKIP
    ASC("€") SKIP
    /* "出" exists in neither single-byte code page. */
    CODEPAGE-CONVERT(CODEPAGE-CONVERT("出", "iso8859-1", "utf-8"), "utf-8", "iso8859-1") SKIP
    CODEPAGE-CONVERT(CODEPAGE-CONVERT("出", "iso8859-15", "utf-8"), "utf-8", "iso8859-15") SKIP
    ASC("出").

gives:

€
14844588
15042490
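The same results can be reproduced outside OpenEdge with a short Python sketch (illustrative only, not ABL). It assumes that ASC() under -cpinternal utf-8 returns the character's UTF-8 bytes read as one big-endian integer. The numbers in the post match that reading: "€" is E2 82 AC and "出" is E5 87 BA. `round_trip` and `asc_utf8` are hypothetical helpers, and `None` merely marks a failed conversion. The sketch does not try to mimic what ABL returns in that case.

```python
from typing import Optional


def round_trip(ch: str, single_byte_cp: str) -> Optional[str]:
    """utf-8 -> single-byte code page -> utf-8; None if ch is not in the code page."""
    try:
        return ch.encode(single_byte_cp).decode(single_byte_cp)
    except UnicodeEncodeError:
        return None


def asc_utf8(ch: str) -> int:
    """Interpret the UTF-8 encoding of ch as a single big-endian integer."""
    return int.from_bytes(ch.encode("utf-8"), "big")


for ch in ("€", "出"):
    print(ch, round_trip(ch, "iso8859-1"), round_trip(ch, "iso8859-15"), asc_utf8(ch))
```

Only "€" through iso8859-15 survives the round trip, which matches the single "€" in the MESSAGE output.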

Posted by Matt Gilarde on 06-Feb-2019 15:27

[quote user="Torben"]

In older versions, I think the Progress executables used a single-byte code page internally.

This changed with 10.?, where the internal code page is multi-byte (NOT CPINTERNAL! I think it uses UTF-8).

So when using an 8-bit character code page, conversion happens from screen to internal and back from internal to screen (e.g. for the € symbol).

If the characters are not valid in the conversion tables, then the iso8859-1 conversion between screen and internal (utf-8) will fail.

[/quote]

It happened in 10.0A in the Windows GUI client (prowin/prowin32), when it became a Unicode-compliant application. Data displayed on screen is converted to and from UTF-16, because that is what Windows uses natively for displaying Unicode characters.
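A minimal Python sketch of the display-side conversion described above. It is illustrative only: Python codecs stand in for the client's internal routines, and little-endian UTF-16 (the byte order Windows uses) is assumed. `to_display` and `from_display` are hypothetical helper names.

```python
def to_display(utf8_bytes: bytes) -> bytes:
    """Hypothetical helper: UTF-8 -> UTF-16LE before handing text to Windows."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def from_display(utf16_bytes: bytes) -> bytes:
    """Hypothetical helper: UTF-16LE -> UTF-8, e.g. for text typed into a widget."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


text = "出 €"
screen = to_display(text.encode("utf-8"))
assert from_display(screen).decode("utf-8") == text
print(screen.hex(" "))
```

Both "出" and "€" fit in a single 16-bit UTF-16 unit, so each takes two bytes on the display side versus three in UTF-8.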
