UTF-8 Encoding Doubt

Posted by anzeljamal on 21-Jul-2016 07:01

I have some doubts regarding UTF-8 encoding and am looking for clarification on a couple of scenarios I am trying. Please find the scenarios below:

1. My database is UTF-8 encoded and the collation used is BASIC. The GUI client I am using has the cpinternal and cpstream parameters set to iso8859-1, and the cpcoll parameter is set to a collation name specific to the language of the country (not BASIC). If I need to make the session UTF-8 encoded as well, can I change the cpinternal and cpstream parameters to UTF-8?

2. My database is iso8859-1 encoded and the collation name used is language specific (not BASIC). The GUI client uses the same cpinternal and collation name as the database. If I need to make it UTF-8 encoded, can I convert the database to UTF-8 and change the cpinternal, cpstream and cpcoll parameters to the corresponding UTF-8 values?

3. For UTF-8 encoding I am only getting a mapping for the collation name BASIC. Can I create a new collation table entry for the language I need? Or is the collation table entry for that specific language created when OpenEdge is installed?

Any help on this is appreciated.

All Replies

Posted by Aidan Jeffery on 21-Jul-2016 11:30

The collations are tied to code pages. For UTF-8, a set of ICU collations is available in addition to BASIC. The definitions for these are provided under DLC/prolang/utf, with names like ICU-UCA.df.

1. You can change the clients to use -cpinternal UTF-8 -cpstream UTF-8. For -cpcoll, use either ICU-UCA, or the ICU collation that meets the particular sorting requirements of your locale.

2. You can convert a database from code page iso8859-1 to UTF-8.

a. Use proutil convchar convert for the code page conversion.

b. Load the UTF-8 word-break rules using proutil with word-rules proword.254.

c. Use Data Administration to load the ICU collation definition (for example, ICU-UCA.df).

d. Rebuild all the indexes after this with proutil idxbuild.

3. Use one of the ICU collations as mentioned above.

Hope this helps.
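
The command-line part of steps a, b and d above can be sketched as a dry run. This is only an illustration: the database name "mydb" is hypothetical, the exact argument forms (such as passing 254 as the word-rules number) are assumptions to check against the proutil documentation for your OpenEdge release, and step c is not included because it is done in Data Administration. The script only prints the commands; it does not run them.

```python
def conversion_plan(db_name, word_rules="254"):
    """Return the proutil steps as argument lists (dry run, nothing is executed).

    db_name and the argument spellings are assumptions for illustration.
    """
    return [
        # Step a: convert the database code page to UTF-8.
        ["proutil", db_name, "-C", "convchar", "convert", "utf-8"],
        # Step b: load the UTF-8 word-break rules (proword.254).
        ["proutil", db_name, "-C", "word-rules", word_rules],
        # Step c (loading the ICU collation .df) happens in Data Administration.
        # Step d: rebuild all indexes under the new collation.
        ["proutil", db_name, "-C", "idxbuild", "all"],
    ]


if __name__ == "__main__":
    for args in conversion_plan("mydb"):
        print(" ".join(args))
```

Printing the plan first makes it easy to review the order of the steps on a copy of the database before anything is changed.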

Posted by Garry Hall on 22-Jul-2016 11:44

Further to Aidan's response: It is not possible to define your own collation for a UTF-8 database. You have BASIC and the ICU collations. Since OE 11.1, there is a larger choice of locale-specific collations, which will hopefully meet your needs. Definitely test before moving to UTF-8, especially queries with non-ASCII data.
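
To see why queries with non-ASCII data deserve testing, here is a small Python sketch of how much the sort order of the same strings can change with the comparison rule. It does not model OpenEdge's BASIC or ICU collations; the word list and the accent-stripping key are stand-ins chosen only for illustration.

```python
import unicodedata

words = ["zebra", "éclair", "apple", "Zürich", "eclipse"]


def folded_key(s):
    """Crude stand-in for a language-aware sort key: drop accents and case.

    This is NOT ICU collation; it only shows that the rule changes the order.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Plain code point order: uppercase first, accented letters after "z".
by_code_point = sorted(words)
# Accent- and case-insensitive order, closer to what users usually expect.
by_folded_key = sorted(words, key=folded_key)

print(by_code_point)  # ['Zürich', 'apple', 'eclipse', 'zebra', 'éclair']
print(by_folded_key)  # ['apple', 'éclair', 'eclipse', 'zebra', 'Zürich']
```

A report or browse that relied on the old order can look different after the switch, so it is worth comparing query results with real non-ASCII data under the chosen collation.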

A word of caution with changing -cpstream: this defines the default codepage used by the client when importing and exporting data, unless it is overridden in the code. If you are importing files with non-ASCII characters, you will need to change the code so that it imports from the encoding the files actually use. If this is your development environment you are changing, -cpstream is also used for your source code, so any non-ASCII characters in your source will be misinterpreted. A best practice is to avoid non-ASCII characters in source code.
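
The effect of reading a file with the wrong stream code page can be shown outside OpenEdge with a few lines of Python. This is a generic sketch of the encoding mismatch, not of the ABL client itself; the sample text is made up.

```python
def read_with_wrong_codepage(text):
    """Show what happens when bytes are decoded with the other code page."""
    utf8_bytes = text.encode("utf-8")
    latin1_bytes = text.encode("iso8859-1")

    # UTF-8 bytes read as iso8859-1: no error, but the text is silently garbled.
    garbled = utf8_bytes.decode("iso8859-1")

    # iso8859-1 bytes read as UTF-8: the accented bytes are not valid UTF-8.
    try:
        latin1_bytes.decode("utf-8")
        failed = False
    except UnicodeDecodeError:
        failed = True

    return garbled, failed


garbled, failed = read_with_wrong_codepage("Müller café")
print(garbled)  # MÃ¼ller cafÃ©
print(failed)   # True
```

The first case is the dangerous one, because nothing fails and the damaged characters only show up later in the data. Plain ASCII text is identical in both encodings, which is why keeping source code ASCII-only avoids the problem.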
