Invalid Unicode characters in XML 1.0 document - OpenEdge Development - Forum

Posted by martinz on 26-Sep-2019 07:26

Hi,

In our ERP system we export some data to xml-files. We do this by buffer-copying rows from tables to temp-tables, and then exporting these using WRITE-XML. This results in a file starting with <?xml version="1.0" encoding="UTF-8"?>. So it's an xml 1.0 file.

Some of our customers have data in the tables with unicode characters, for instance U+1A. In our application, that's perfectly valid data. However, it is an invalid character in xml 1.0. It's valid in xml 1.1, but only when it's written as 

So we have two issues here:

1. As far as I can tell, WRITE-XML always writes xml 1.0 files. Is it possible to write xml 1.1 files? If so, how? I can't find anything in the docs. We could just replace the header of the resulting document, but that's a hack.

2. We need to rewrite almost all invalid characters from U+01 until U+1F to the &#x__; form. Is there a smart way to do this? As we are always interested in complete records, buffer-copy is the most efficient way for us to copy the records to the temp-table. When we need to sanitize the data, we could do a replace for all these characters on all character fields, but that's a performance issue waiting to happen.

Any thoughts? Experiences?

TIA

Martin Zantingh

OpenEdge Development - Forum

All Replies

Posted by Stefan Drissen on 26-Sep-2019 08:16

1. no - the number of xml 1.1 consumers is also very disappointing

2a. when exporting end user xml we strip out all invalid asc values - it is not smart

2b. for all other use we try to use write-json which handles it all a lot better, but does miss a 'write-schema' option, but you can combine write-xmlschema with write-json

This thread is closed