DATASET:READ-XML() and WRITE-XML() ignore CDATA?

Posted by slacroixak on 12-Jan-2015 04:50

Say a CLOB field holds a value like "<![CDATA[ bladibla WITH SOME XML...  ]]>"

Normally, WRITE-XML should not escape any '<' or '&' inside such a value, but in our case it does.

Conversely, when READ-XML(someDoc) reads a document containing a CDATA element, we also end up with escaped XML ("&lt;" for "<", etc.) in our CLOB or CHAR fields.

Is that a known issue?  Is there a way to work around it?

See the CDATA section in http://www.w3schools.com/xml/xml_cdata.asp

All Replies

Posted by slacroixak on 12-Jan-2015 05:03

Sorry, it is actually even worse: ds:READ-XML() fails with error 7351 (see below) if an XML element is defined with a CDATA section.

So it tries to analyze the XML content of our CDATA as if it described some child temp-table, although we do not want that.  We just want that CDATA XML element to end up unparsed in a CHAR or CLOB field.

I can hardly imagine how to work around it.

BUFFER-FIELD <field-name> was not found in buffer <buffer-name>. (7351)

You gave a character expression as the argument to the BUFFER-FIELD method of a buffer object, but the character string did not identify any field in that buffer.  Check your PROGRESS dictionary for the field names for the table, and do not abbreviate the field name.

Posted by gabriel.lucaciu on 12-Jan-2015 06:28

Hi all,

First of all, I would suggest going through the OE documentation for XML:

documentation.progress.com/.../dvxml.pdf

Secondly, I was wondering why you are using READ-XML, because it is meant to be used in the context of datasets and temp-tables.

I would suggest going with the SAX-READER option in this case. Would this be a feasible solution for you?

All the best,

Gabriel

Posted by mihai.pintea on 12-Jan-2015 06:43

Hi all,

A short example of using SAX-WRITER can be found on OEHive:

www.oehive.org/.../1449

Posted by slacroixak on 12-Jan-2015 06:50

Hi Gabriel, we searched dvxml.pdf thoroughly before starting this thread.  There is nothing about CDATA for the DataSet/XML methods, unlike the DOM and SAX cases, which can handle it (DOM node SUBTYPE=CDATA-SECTION or SAX WRITE-CDATA).  It's like we are missing a kind of XML-NODE-SUBTYPE option for the definition of a temp-table field.
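For readers unfamiliar with what a DOM-level CDATA node gives you, here is a minimal sketch in Python (not ABL), using only the standard library's xml.dom.minidom. It only illustrates the general DOM behavior referred to above and does not reproduce anything OpenEdge does; the element names ttRow and rawXml are hypothetical.

```python
from xml.dom.minidom import Document


def build_doc(payload: str) -> str:
    """Serialize one hypothetical row whose field holds raw XML in a CDATA node."""
    doc = Document()
    root = doc.createElement("ttRow")      # hypothetical temp-table row element
    doc.appendChild(root)
    field = doc.createElement("rawXml")    # hypothetical CHAR/CLOB field element
    root.appendChild(field)
    # A CDATA section node is written out verbatim, without escaping '<' or '&'.
    field.appendChild(doc.createCDATASection(payload))
    return root.toxml()


xml_text = build_doc("<inner a='1'>x & y</inner>")
print(xml_text)
```

The payload comes out between the CDATA markers untouched, which is the behavior we would like a temp-table field to be able to opt into.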

For convenience, we did not want to use a SAX-READER.  I do not want to give all the details of our use case, but I would be surprised if we were the first ones who need to handle a bit of unparsed XML code in a TT field.  I may raise a TS case.

Posted by slacroixak on 12-Jan-2015 07:10

Sounds like a bug, actually.  At first sight, we should not even need a new XML-NODE-SUBTYPE option for the DataSet XML methods.  If a field holds some CDATA (i.e. "<![CDATA[ whatever ]]>"), then ds:WRITE-XML() should just write the value as-is (with the CDATA section) without escaping anything.  Conversely, ds:READ-XML() should not parse anything inside a CDATA section and should just assign the XML node value to the CHAR/CLOB field.  This is simply the XML standard; there is no need to justify any use case.
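To make the expectation concrete, here is a small sketch in Python (not ABL) using the standard xml.etree.ElementTree module. It is only an illustration of how a standards-conforming parser treats the two forms, not a statement about OpenEdge internals; the element name rawXml is hypothetical. The same string can be carried either as escaped text or inside a CDATA section, and the parser hands back the identical value in both cases, without trying to parse the nested markup.

```python
import xml.etree.ElementTree as ET

payload = "<note>bladibla</note>"  # the nested XML we want to carry unparsed

# Form 1: ordinary text content; the serializer escapes the markup characters.
elem = ET.Element("rawXml")        # hypothetical field element
elem.text = payload
escaped_doc = ET.tostring(elem, encoding="unicode")

# Form 2: the same payload wrapped in a CDATA section, written by hand.
cdata_doc = "<rawXml><![CDATA[" + payload + "]]></rawXml>"

# Both documents yield the same, unparsed character value.
from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text
print(escaped_doc)
print(from_escaped == from_cdata == payload)
```

Note that the parsed rawXml element has no child elements in either case: the nested markup stays plain character data.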

Posted by gabriel.lucaciu on 12-Jan-2015 07:50

I see your point. I just did some extra research in the knowledge base and found the following:

The XML parser cannot read files where a CDATA node has embedded carriage returns and line feeds because these characters are not valid within an XML document.  Consequently, the XML parser fails to validate a document formatted in this way when the LOAD method is invoked.

The proper way to send such data is to encode the data using base64 encoding before inserting it into the CDATA node.  Doing base64 encoding ensures that there are no characters which cannot be used in an XML document.  Once the base64 data has been extracted the BASE64-DECODE function can be used to return the data back into its original format.
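As a rough illustration of the round trip described in the KB entry, here is a sketch in Python (not ABL) using the standard base64 module; in ABL the decoding side would be the BASE64-DECODE function mentioned above. The helper names encode_payload and decode_payload are hypothetical.

```python
import base64


def encode_payload(text: str) -> str:
    """Encode text so that it contains only base64 characters (A-Z, a-z, 0-9, +, /, =)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_payload(encoded: str) -> str:
    """Reverse encode_payload and return the original text."""
    return base64.b64decode(encoded).decode("utf-8")


original = "<inner>\r\n  <b>1 &amp; 2</b>\r\n</inner>"
encoded = encode_payload(original)
print(encoded)
print(decode_payload(encoded) == original)
```

The trade-off is that the encoded value is no longer human-readable in the XML file.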

See for more:

knowledgebase.progress.com/.../P114125

Would this be an option for you?

Regards,

Gabriel

Posted by slacroixak on 12-Jan-2015 08:49

Hi Gabriel, many thanks for your prompt reply about this one.  Interesting to see it might be due to some carriage returns that we wanted to support in that CDATA (actually a tiny embedded XML document that has nothing to do with the DataSet we are importing).  I fear base64 will not be of great help, because we want that CDATA section to be readable in the XML file it comes from.

Actually, I am pretty perplexed by this statement from the KBase:

"The XML parser cannot read files where a CDATA node has embedded carriage returns and line feeds because these characters are not valid within an XML document."

The whole point of a CDATA section is to instruct the XML parser to ignore its content (everything between '<![CDATA[' and ']]>').  In other words, the parser should just take the CHAR value as-is and not try to validate or parse such a section.  I fear it still sounds like a bug.  It is annoying for people who legitimately need to handle a little piece of independent XML code in some temp-table field (i.e. a little nested piece of XML code in a larger XML document).
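For comparison, here is a minimal sketch in Python (not ABL) showing how one general-purpose parser, the expat-based xml.dom.minidom from the standard library, handles a CDATA section that contains line breaks and markup. This says nothing about how the OpenEdge parser is implemented; the element names ttRow and rawXml are hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# A CDATA section holding a small nested XML fragment with CR/LF line breaks.
document = (
    "<ttRow><rawXml><![CDATA[<inner>\r\n"
    "  <b>1 & 2</b>\r\n"
    "</inner>]]></rawXml></ttRow>"
)

dom = parseString(document)
field = dom.getElementsByTagName("rawXml")[0]

# The content arrives as CDATA character data, not as child elements.
first_type = field.firstChild.nodeType
value = "".join(child.data for child in field.childNodes)
lines = value.splitlines()

print(first_type == Node.CDATA_SECTION_NODE)
print(lines)
```

The document loads without error, and the nested fragment, including the unescaped '&', comes back as plain text split over three lines.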

This thread is closed