Solved

special character encoding in XML

Posted on 2009-07-06
Last Modified: 2013-11-18
I am using python's xml.dom.minidom to create a service that simulates a production server, and I'm using this service on my local machine to test software, rather than hitting the production server for my testing.

While simulating the responses from the production server, there's one part I haven't been able to accurately reproduce...

The server will respond with an xml element like this:

<Item Name="TSCDATA" Value=":020000040001F9&#xA;:100140001600F8F2EFEEF2F6000D0FF303FF0005D4&#xA;:10015000070B0A2101F6000102050304F9F70A035F&#xA;:10016000F1E7E2DDDDDCDFE30BEE0302070A090B5A&#xA;:100170000211F6FCF2EBE7E5E8EBF2FCFEFB050111&#xA;:10018000F9394C3850203730303020474120203169&#xA;:10019000332041343932322032333036303939313C&#xA;:1001A000373430373137313331373030393230341A&#xA;:1001B000323401802406029C3E02010303001DB973&#xA;:00000001FF" />

Notice the "&#xA;", which is used as an end-of-line character.

I can't get my simulator program to create that same sequence of characters. The problem is not the content, it's the line delimiter. I can't produce the "&#xA;".

I get something like this:

<Item Name="TSCDATA" Value=":02000004DE011B\&amp;#xA;:08000000081022003800000086\&amp;#xA;:08000800081022D020000000C6\&amp;#xA;:08001000081042E0200000008E\&amp;#xA;:08001800061042D02000000098\&amp;#xA;:08002000062042E02000000070\&amp;#xA;:08002800062041D02000000079\&amp;#xA;:08003000042041C02000000083\&amp;#xA;:08003800041041B0200000009B\&amp;#xA;:00000001FF"/>

I've tried a number of variations.

When I supply this as an EOL: "&#xA;", the minidom encodes it as this: "&amp;#xA;"
When I supply this: "\&#xA;", the minidom encodes it as "\&amp;#xA;"
When I supply this: "&&#xA;", the minidom encodes it as "&amp;&amp;#xA;"
When I supply this: "\n", the minidom does not encode it, and leaves it as a linefeed.
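
For reference, the first case can be reproduced in a few lines. This is only a minimal sketch, assuming Python 3's xml.dom.minidom; the shortened Value string is made up for illustration.

```python
from xml.dom import minidom

doc = minidom.Document()
item = doc.createElement("Item")
item.setAttribute("Name", "TSCDATA")
# The character reference is supplied as ordinary text, so minidom treats
# the "&" as data and escapes it when the element is serialized.
item.setAttribute("Value", ":020000040001F9&#xA;:00000001FF")

serialized = item.toxml()
print(serialized)
```

The printed attribute contains "&amp;#xA;" rather than "&#xA;", which is the behaviour described above.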

How can I tell the minidom engine to either NOT encode the "&#xA;"
or force it to encode "\n" as "&#xA;" ?

Brian Withun

Question by:Brian Withun

Accepted Solution

by:
BigRat earned 500 total points
ID: 24792532
When an XML parser parses a string containing &#xA;, it MUST convert it into a line feed character.
When an XML parser parses a string containing a character whose hex value is 0A in element content, it MUST insert a line feed character for it.

So getting &#xA; through an XML document UNCHANGED is impossible, if it is to represent a line feed character.

I suspect the same will happen if you put the data in a CDATA section, since CDATA sections may only contain VALID XML characters.

When an XML document outputs an XML string, say via doc.xml(), the resulting XML need not contain entities (i.e. the &#x..; sequence) for line feeds. It is however NOT forbidden to convert EVERY character into an entity, although the resulting string would be rather bulky.

That said, why is it necessary to have an entity for line feed in the output? If it is necessary you'll have to write a bit of script to post-process it and replace the line feeds with the entity sequence.
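
A minimal sketch of that post-processing idea, assuming Python 3's xml.dom.minidom. EOL_TOKEN and build_item are hypothetical names introduced for illustration, not part of minidom. A placeholder is used instead of "\n" because how minidom writes a newline inside an attribute may differ between Python versions, and the sketch avoids relying on either behaviour.

```python
from xml.dom import minidom

# Hypothetical placeholder (an assumption of this sketch): it must never
# occur in the real data and must contain none of the characters that
# minidom escapes (&, <, >, ").
EOL_TOKEN = "__EOL__"


def build_item(name, records):
    """Serialize an <Item> whose Value joins the records with &#xA;."""
    doc = minidom.Document()
    item = doc.createElement("Item")
    item.setAttribute("Name", name)
    # Join with the placeholder so minidom has nothing to escape.
    item.setAttribute("Value", EOL_TOKEN.join(records))
    doc.appendChild(item)
    # Post-process the serialized text: swap the placeholder for the
    # literal character reference.
    return item.toxml().replace(EOL_TOKEN, "&#xA;")


records = [":020000040001F9", ":00000001FF"]
xml_out = build_item("TSCDATA", records)
print(xml_out)
```

Parsing xml_out again should give back a Value whose records are separated by line feeds, so the round trip can be used as a sanity check on the simulator's output.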

Author Comment

by:Brian Withun
ID: 25051423
The reason I need the embedded linefeeds is because I'm writing a simulator for an actual XML server.  If I do not do it that way, my simulation will not accurately reflect the behavior of the server it is intended to simulate.

If the server is creating non-standard XML, that is outside my realm of influence and I, too, must create non-standard XML.

It sounds like you are suggesting that this is not possible.  I find it difficult to believe that I cannot embed this string of characters (for example) in an XML document.  I believed XML to be capable of sending anything.

":020000040001F9&#xA;"

How do I encode the string above without it being mangled into something that it is not?

Is there no way to "escape" these characters?

BW


Expert Comment

by:BigRat
ID: 25058446
I have spent more time on this problem. You can't use CDATA sections, since a CDATA section cannot appear inside an attribute value. I can't seem to find the DOM configuration (minidom probably doesn't have one) where one can switch off character normalization or set the entities property to true. If you can find that interface, try it. I doubt, however, that it will help.

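Strictly speaking, any XML processor, and that includes things which just sniff it, MUST decode &#xA; into a line feed character. In element content that makes it equivalent to a literal newline. Inside an attribute value, however, a literal newline is normalized to a space when parsed, so the reference is the only way for a line feed to survive there. In fact the ENTIRE contents of an XML element could be encoded in entities, e.g. &#x41; for an "A". That is just as acceptable as plain text (however silly it might look).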
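
The decoding behaviour can be checked with minidom's own parser. This is a small sketch, assuming Python 3; value_of is a hypothetical helper and the one-element documents are made up for illustration. It also compares a literal newline inside an attribute value, which, as far as I can tell from the attribute-value normalization rules in XML 1.0, is read back as a space rather than a line feed.

```python
from xml.dom import minidom


def value_of(xml_text):
    """Parse a one-element document and return its Value attribute."""
    return minidom.parseString(xml_text).documentElement.getAttribute("Value")


# A character reference for a line feed inside an attribute value.
with_reference = value_of('<Item Value="a&#xA;b"/>')
# Any character may be written as a reference, e.g. "A" as &#x41;.
with_letter_ref = value_of('<Item Value="&#x41;"/>')
# A literal newline character inside the attribute value.
with_literal = value_of('<Item Value="a\nb"/>')

print(repr(with_reference))
print(repr(with_letter_ref))
print(repr(with_literal))
```

If the literal newline does come back as a space, that would explain why the production server writes &#xA; in the Value attribute instead of a bare line feed.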