Proper procedure to add an unicode field to a non-unicode da

Posted by Tai Li on 06-Oct-2016 08:47

Hi All,

I have an existing database (codepage iso8859-1) and I need to add an unicode field.

In MS SQL server, we can create a new field using datatype 'nchar' or 'nvarchar' to store unicode value.

I would like to know the proper way to do it on OE10.2B.

PS: In PKB, most that I have read are about converting the database to unicode utf-8 or creating unicode utf-8 database from sketch. Also, if I'm able to add just one single unicode field to my existing database, I presume I will be needing these -cpinternal utf-8 -cpstream utf-8 switches to preventing data corruption?

Edit: Specifically the unicode data that will be inserted will be Russian and Arabic.

Thank you.

Regards,
Tai Li

All Replies

Posted by kirchner on 06-Oct-2016 10:19

Hi Tai,

AFAIK you will need to convert or create a new database using utf-8. There's no such thing as "adding a single Unicode column" to a non-utf8 DB.

You'll probably need to change your -cpinternal as well, but I think you can leave -cpstream be and CONVERT TARGET as needed to minimize code impact on old code.

I'd also consider something like this:

- you add a simple binary/char column to your table without any conversion to your DB. Then you handle Russian/Arabic data in the application (probably with -cpinternal utf-8) and before writing to the database you convert it to a binary or base64 representation. Surely you will need to convert back when reading data. It's extra coding for sure, but if this new field is used only sparingly within the application it may be worth the effort.

Posted by Frank Meulblok on 06-Oct-2016 11:22

You can also use a CLOB field, those can have a different codepage from the rest of the database.

But those can't be indexed or used in a WHERE clause, so that still may not give you what you want.

From personal experience though, it's rare that you'll only need one unicode-enabled column - once people get used to being able to store any text they want in a field, they start wanting that capability for all fields. So converting the database to UTF-8 is probably the better long-term plan anyway.

Posted by Tai Li on 11-Oct-2016 02:20

Thanks all for your input. I'll need to analyse the effort and impact.

PS: I have changed this thread to discussion.

This thread is closed