Convert UTF-8 file to ISO8859-1

Posted by cpsltd on 01-Jul-2009 09:25

Hello,

I would like to convert a large UTF-8 encoded file to an ISO8859-1 encoded file. I know I will lose some characters, but I don't care.

I used INPUT FROM VALUE(filename) CONVERT SOURCE "utf-8" TARGET "iso8859-2", but the program stops reading the file when it finds the first UTF-8 character.

I would like to use 'CODEPAGE-CONVERT' to convert each line of the file.

I think I have to set CPINTERNAL, CPSTREAM, etc., but it never works.

Thanks for your help

Chris.

All Replies

Posted by GregHiggins on 01-Jul-2009 10:22

Version?

V10: COPY-LOB FROM FILE "file1.txt" TO FILE "file2.txt" CONVERT SOURCE CODEPAGE "utf-8" TARGET CODEPAGE "iso8859-1".

Note: You said iso8859-1, but your code has iso8859-2.

Posted by cpsltd on 02-Jul-2009 06:43

Thanks for the answer.

In fact I have to do this in V9 and V10 under Windows.

I tried what you suggested, but it doesn't work: the file is exactly the same before and after the COPY-LOB.

My file is about 900 MB. When I open it in UltraEdit (as UTF-8) and save it as ANSI/ASCII, I get a new file of about 450 MB.

All the 2-byte characters are lost and replaced with '?'. That's what I want.

Maybe I can do that in Progress, or maybe I have to modify convmap.cp. I don't know.

Chris.

Note: it is ISO8859-1

Posted by rstanciu on 03-Jul-2009 03:56

On any Linux distribution, the command is:

iconv --from-code=UTF-8 --to-code=ISO-8859-1 -c -s ./oldfile.p > ./newfile.p

And, of course, you lose all invalid characters in the output.
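
For anyone who wants to see what that does before running it on a large file, here is a minimal sketch on a tiny sample. The temporary directory and the sample file names are made up for the illustration; -f and -t are the short forms of --from-code and --to-code. Note that some iconv implementations return a non-zero exit status when -c had to drop something, so the sketch ignores the status and checks the output instead.

```shell
#!/bin/sh
# Build a small UTF-8 sample (hypothetical file names).
# \303\251 is "é", which exists in ISO-8859-1;
# \342\202\254 is the euro sign, which does not.
tmpdir=$(mktemp -d)
printf 'caf\303\251 \342\202\254 ok\n' > "$tmpdir/sample_utf8.txt"

# -c asks iconv not to abort on characters that have no ISO-8859-1
# equivalent. The exit status may still be non-zero, so it is ignored.
iconv -c -f UTF-8 -t ISO-8859-1 "$tmpdir/sample_utf8.txt" \
    > "$tmpdir/sample_latin1.txt" || true

# Compare sizes: the converted file should be smaller, because "é"
# shrinks from two bytes to one and the euro sign cannot be kept.
orig_size=$(wc -c < "$tmpdir/sample_utf8.txt")
new_size=$(wc -c < "$tmpdir/sample_latin1.txt")
echo "UTF-8: $orig_size bytes, ISO-8859-1: $new_size bytes"

# Dump the converted bytes in octal; "é" should now be the single byte 351.
od -An -to1 "$tmpdir/sample_latin1.txt"
```

If the sample behaves as expected, the same options can be applied to the real file, writing to a new file name as in the command above.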

Posted by Stefan Drissen on 04-Jul-2009 07:36

I tried the COPY-LOB example and it is indeed converting nothing at all. Changing the target and source code pages to BERT and ERNIE also does not produce an error. It would seem that the CONVERT is simply, although documented, ignored - which smells like a bug.

Posted by rstanciu on 05-Jul-2009 05:40

OpenEdge 10.2A01 Linux:

It works very well! Just start your Progress session with:
-cpcase basic
-cpcoll basic
-cpstream utf-8
-cpinternal utf-8

==================================================

/* iconv_UTF-8_ISO8859-1.p */
DEFINE STREAM lsIN.
DEFINE STREAM lsOUT.
DEFINE VARIABLE lc_filename   AS CHARACTER NO-UNDO.
DEFINE VARIABLE jLigne        AS CHARACTER NO-UNDO.

lc_filename = "site/index.p".

/* Read the source file, converting UTF-8 to ISO8859-1 on input. */
INPUT STREAM lsIN FROM VALUE(lc_filename)
      CONVERT TARGET "iso8859-1" SOURCE "utf-8".
/* Write the result next to the original, with a ".new" suffix. */
OUTPUT STREAM lsOUT TO VALUE(lc_filename + ".new").

/* Copy line by line; the REPEAT block ends when IMPORT reaches end of file. */
REPEAT:
  IMPORT STREAM lsIN UNFORMATTED jLigne.
  PUT STREAM lsOUT UNFORMATTED jLigne SKIP.
END.

INPUT  STREAM lsIN  CLOSE.
OUTPUT STREAM lsOUT CLOSE.

==================================================

* Invalid characters will be replaced with question marks (?).

This thread is closed