Say a CLOB field holds a value like "<![CDATA[ bladibla WITH SOME XML... ]]>".
Normally, a WRITE-XML should not escape any '<' or '&' inside that CDATA section, but in our case it does.
On the other hand, with a READ-XML(someDoc) on a CDATA element, we also end up with escaped XML ("&lt;" for "<", etc.) in our CLOB or CHAR fields.
Is that a known issue? Is there a way to work around it?
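For reference, here is a small sketch in Python (not ABL) of how a generic XML parser treats the two forms. The element name `field` and the sample content are made up for illustration, and this says nothing about how READ-XML or WRITE-XML behave internally. It only shows that, for a standard parser, an escaped value and a CDATA value carry the same character data, so the difference is mainly one of readability in the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical field content written in two ways: escaped, and inside CDATA.
escaped = "<field>&lt;note&gt;A &amp; B&lt;/note&gt;</field>"
cdata = "<field><![CDATA[<note>A & B</note>]]></field>"

# A standard parser unescapes the first form and reads the second verbatim.
escaped_value = ET.fromstring(escaped).text
cdata_value = ET.fromstring(cdata).text

print(escaped_value == cdata_value)
print(cdata_value)
```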
See the CDATA section at http://www.w3schools.com/xml/xml_cdata.asp
Sorry, it is actually even worse: a ds:READ-XML() fails with error 7351 (see below) if an XML element is defined with a CDATA section.
So it tries to analyze the XML content of our CDATA section as if it described some child temp-table, although we do not want that. We just want the CDATA XML element to end up unparsed in a CHAR or CLOB field.
I can hardly imagine how to work around it.
BUFFER-FIELD <field-name> was not found in buffer <buffer-name>. (7351)
You gave a character expression as the argument to the BUFFER-FIELD method of a buffer object, but the character string did not identify any field in that buffer. Check your PROGRESS dictionary for the field names for the table, and do not abbreviate the field name.
Hi all,
First of all, I would suggest going through the OE documentation for XML:
documentation.progress.com/.../dvxml.pdf
Secondly, I was wondering why you are using READ-XML, since it is meant to be used with datasets and temp-tables.
I would suggest going with the SAX-READER option in this case. Would this be a feasible solution for you?
All the best,
Gabriel
Hi Gabriel, we searched dvxml.pdf thoroughly before starting this thread. There is nothing about CDATA for the DataSet/XML methods, unlike the DOM and SAX cases, which can handle it (DOM node SUBTYPE=CDATA-SECTION or SAX WRITE-CDATA). It is as if we were missing a kind of XML-NODE-SUBTYPE option for the definition of a temp-table field.
For convenience, we did not want to use a SAX-READER. I do not want to give all the details of our use case, but I would be surprised if we were the first ones who need to handle a bit of unparsed XML code in a TT field. I may raise a TS case.
Sounds like a bug, actually. At first sight, we should not even need a new XML-NODE-SUBTYPE option for the DataSet XML methods. If a field holds some CDATA (= "<![CDATA[ whatever ]]>"), then ds:WRITE-XML() should just write the value as-is (with the CDATA section) without escaping anything. On the other hand, ds:READ-XML() should not parse anything in a CDATA section and should just assign the XML node value to the CHAR/CLOB field. This is just standard XML; there is no need to justify any use case.
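To illustrate the behaviour I would expect, here is a minimal sketch using Python's standard DOM parser rather than ABL. The `order` and `payload` element names and the sample fragment are hypothetical. A generic parser hands back the CDATA content untouched, line breaks included, and serializes it again without escaping anything inside the section.

```python
from xml.dom import minidom

# Hypothetical document: a <payload> element whose content is a CDATA
# section holding a small, independent XML fragment with line breaks.
SOURCE = (
    "<order><payload><![CDATA[<note>\n"
    "  <to>A & B</to>\n"
    "</note>]]></payload></order>"
)

doc = minidom.parseString(SOURCE)
payload = doc.getElementsByTagName("payload")[0]
node = payload.firstChild

# The parser reports a CDATA section node and returns its text unparsed.
is_cdata = node.nodeType == node.CDATA_SECTION_NODE
raw_value = node.data

# Serializing the element again keeps the CDATA section as it was.
round_trip = payload.toxml()

print(is_cdata)
print(raw_value)
print(round_trip)
```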
I see your point. I just did some extra research in the knowledge base and found the following:
The XML parser cannot read files where a CDATA node has embedded carriage returns and line feeds because these characters are not valid within an XML document. Consequently, the XML parser fails to validate a document formatted in this way when the LOAD method is invoked.
The proper way to send such data is to encode the data using base64 encoding before inserting it into the CDATA node. Doing base64 encoding ensures that there are no characters which cannot be used in an XML document. Once the base64 data has been extracted the BASE64-DECODE function can be used to return the data back into its original format.
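As a rough sketch of that base64 round trip, here it is in Python rather than ABL (where the decoding side would be the BASE64-DECODE function mentioned above). The element names and the sample fragment are assumptions for illustration only.

```python
import base64
from xml.dom import minidom

# Hypothetical payload with markup and line breaks.
fragment = "<note>\n  <to>A & B</to>\n</note>"

# Encode before inserting into the CDATA node: the result contains only
# base64 characters, so no markup or line breaks remain.
encoded = base64.b64encode(fragment.encode("utf-8")).decode("ascii")
document = "<order><payload><![CDATA[" + encoded + "]]></payload></order>"

# Extract the node value and decode it back to the original text.
parsed = minidom.parseString(document)
text = parsed.getElementsByTagName("payload")[0].firstChild.data
decoded = base64.b64decode(text).decode("utf-8")

print(decoded == fragment)
```

The trade-off is that the payload is no longer human-readable in the XML file.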
See for more:
knowledgebase.progress.com/.../P114125
Would this be an option for you?
Regards,
Gabriel
Hi Gabriel, many thanks for your prompt reply on this one. It is interesting to see that it might be due to the carriage returns that we wanted to support in that CDATA section (actually a tiny embedded XML document that is of no concern to the DataSet we are importing). I fear base64 will not be of great help, because we want that CDATA section to remain readable in the XML file it comes from.
Actually, I am pretty perplexed by this statement from the KBase:
"The XML parser cannot read files where a CDATA node has embedded carriage returns and line feeds because these characters are not valid within an XML document."
The whole point of a CDATA section is to instruct the XML parser to ignore its content (everything between '<![CDATA[' and ']]>'). In other words, the parser should just take the CHAR value as-is and not try to validate or parse such a section. I fear it still sounds like a bug. It is annoying for people who legitimately need to handle a little piece of independent XML code in some temp-table field (i.e. a little nested piece of XML code in a larger XML document).