Solved

special character encoding in XML

Posted on 2009-07-06
Last Modified: 2013-11-18
I am using Python's xml.dom.minidom to write a service that simulates a production server, so that I can test software on my local machine without hitting the production server.

While simulating the responses from the production server, there's one part I haven't been able to accurately reproduce...

The server responds with an XML element like this:

<Item Name="TSCDATA" Value=":020000040001F9&#xA;:100140001600F8F2EFEEF2F6000D0FF303FF0005D4&#xA;:10015000070B0A2101F6000102050304F9F70A035F&#xA;:10016000F1E7E2DDDDDCDFE30BEE0302070A090B5A&#xA;:100170000211F6FCF2EBE7E5E8EBF2FCFEFB050111&#xA;:10018000F9394C3850203730303020474120203169&#xA;:10019000332041343932322032333036303939313C&#xA;:1001A000373430373137313331373030393230341A&#xA;:1001B000323401802406029C3E02010303001DB973&#xA;:00000001FF" />

Notice the "&#xA;", which is used as an end-of-line character.

I can't get my simulator program to create that same sequence of characters.  The problem is not the content, it's the line delimiter.  I can't produce the "&#xA;".

I get something like this:

<Item Name="TSCDATA" Value=":02000004DE011B\&amp;#xA;:08000000081022003800000086\&amp;#xA;:08000800081022D020000000C6\&amp;#xA;:08001000081042E0200000008E\&amp;#xA;:08001800061042D02000000098\&amp;#xA;:08002000062042E02000000070\&amp;#xA;:08002800062041D02000000079\&amp;#xA;:08003000042041C02000000083\&amp;#xA;:08003800041041B0200000009B\&amp;#xA;:00000001FF"/>

I've tried a number of variations.

When I supply this as an EOL: "&#xA;", the minidom encodes it as this: "&amp;#xA;"
When I supply this: "\&#xA;", the minidom encodes it as "\&amp;#xA;"
When I supply this: "&&#xA;", the minidom encodes it as "&amp;&amp;#xA;"
When I supply this: "\n", the minidom does not encode it, and leaves it as a linefeed.

How can I tell the minidom engine to either NOT encode the "&#xA;"
or force it to encode "\n" as "&#xA;" ?
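
A minimal reproduction of the first variation above, as a sketch: the sample value is shortened from the server response, and the variable names are only illustrative. Passing the literal text "&#xA;" to setAttribute stores those five characters, so the serializer escapes the ampersand on output.

```python
from xml.dom import minidom

doc = minidom.Document()
item = doc.createElement("Item")
item.setAttribute("Name", "TSCDATA")
# The "&" in the literal text "&#xA;" is markup-significant, so the
# serializer escapes it. The attribute holds the five characters
# "&#xA;", not a line feed.
item.setAttribute("Value", ":020000040001F9&#xA;:00000001FF")
doc.appendChild(item)

escaped_xml = item.toxml()
print(escaped_xml)
```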

Brian
Withun

Question by:Brian Withun

Accepted Solution

by:
BigRat earned 500 total points
ID: 24792532
When an XML parser parses a string containing &#xA; it MUST convert it into a line feed character.
When an XML parser parses a string containing a character whose hex value is 0A, it MUST insert a line feed character for it.

So to get &#xA; UNCHANGED in an XML document is impossible, if it is to represent a line feed character.

I suspect the same will happen if you put the data in a CDATA section, since CDATA sections may only contain VALID XML characters.

When an XML document outputs an XML string, say via doc.xml(), then the resulting XML should not have entities (ie: the &#x..; sequence) for line feeds. It is however NOT forbidden to convert EVERY character into an entity, although the resultant string would be rather bulky.

That said, why is it necessary to have an entity for line feed in the output? If it is necessary you'll have to write a bit of script to post-process it and replace the line feeds with the entity sequence.
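
The post-processing step could be sketched roughly as follows. This is only one way to do it, under stated assumptions: the helper name build_item_xml and the "__EOL__" token are made up for illustration, and the token must never occur in the real data. Joining the records with a placeholder and swapping it for "&#xA;" after serialization avoids depending on how a given minidom version writes a raw line feed inside an attribute, and it leaves any line feeds between elements alone.

```python
from xml.dom import minidom

# Hypothetical token standing in for the line delimiter. It must not
# occur in the real data and must contain nothing that minidom escapes.
EOL_PLACEHOLDER = "__EOL__"


def build_item_xml(records):
    """Serialize an Item element whose Value joins records with &#xA;."""
    doc = minidom.Document()
    item = doc.createElement("Item")
    item.setAttribute("Name", "TSCDATA")
    item.setAttribute("Value", EOL_PLACEHOLDER.join(records))
    doc.appendChild(item)
    # Serialize first, then swap the placeholder for the character
    # reference so the serializer never sees (and escapes) the "&".
    return item.toxml().replace(EOL_PLACEHOLDER, "&#xA;")


records = [":020000040001F9", ":00000001FF"]
item_xml = build_item_xml(records)
print(item_xml)

# Reading the result back yields real line feeds in the attribute value.
round_trip = minidom.parseString(item_xml).documentElement.getAttribute("Value")
```

The result is a plain string rather than a DOM, so this has to be the last step before the response is sent.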

Author Comment

by:Brian Withun
ID: 25051423
The reason I need the embedded linefeeds is because I'm writing a simulator for an actual XML server.  If I do not do it that way, my simulation will not accurately reflect the behavior of the server it is intended to simulate.

If the server is creating non-standard XML, that is outside my realm of influence and I, too, must create non-standard XML.

It sounds like you are suggesting that this is not possible.  I find it difficult to believe that I cannot embed this string of characters (for example) in an XML document.  I believed XML to be capable of sending anything.

":020000040001F9&#xA;"

How do I encode the string above without it being mangled into something that it is not?

Is there no way to "escape" these characters?

BW


Expert Comment

by:BigRat
ID: 25058446
I have spent more time on this problem. You can't use a CDATA section here, since CDATA sections are not allowed inside attribute values. I can't seem to find the DOM configuration (minidom probably doesn't have one) where one can switch off character normalization or set the entities property to true. If you can find that interface, try it. I doubt, however, that it will help.

Strictly speaking any XML processor, and that includes things which just sniff it, MUST handle &#xA; and the newline character in the same way. In fact the ENTIRE contents of an XML element could be encoded in entities, eg: &#x41; for an "A". That is just as acceptable as plain text (however silly it might look).
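
A small check of that equivalence, as a sketch (to_char_refs is a hypothetical helper, not part of minidom). In element content the two forms parse to the same text. Attribute values are a special case worth knowing about: XML 1.0 attribute-value normalization turns a literal line feed into a space, while a &#xA; reference comes through as a line feed, which may be why the production server writes the reference there.

```python
from xml.dom import minidom


def to_char_refs(text):
    """Encode every character of text as a hexadecimal character reference."""
    return "".join("&#x%X;" % ord(ch) for ch in text)


def element_text(xml):
    """Return the text content of the document element's first child."""
    return minidom.parseString(xml).documentElement.firstChild.data


def attr_value(xml):
    """Return the parsed value of attribute "v" on the document element."""
    return minidom.parseString(xml).documentElement.getAttribute("v")


# Element content: plain text and character references parse identically.
plain = element_text("<a>ABC</a>")
encoded = element_text("<a>%s</a>" % to_char_refs("ABC"))

# Attribute values: a literal line feed is normalized to a space,
# while the character reference is kept as a line feed.
literal_lf = attr_value('<a v="x\ny"/>')
ref_lf = attr_value('<a v="x&#xA;y"/>')

print(repr(plain), repr(encoded), repr(literal_lf), repr(ref_lf))
```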