Solved

special character encoding in XML

Posted on 2009-07-06
Last Modified: 2013-11-18
I am using python's xml.dom.minidom to create a service that simulates a production server, and I'm using this service on my local machine to test software, rather than hitting the production server for my testing.

While simulating the responses from the production server, there's one part I haven't been able to accurately reproduce...

The server will respond with an xml element like this:

<Item Name="TSCDATA" Value=":020000040001F9&#xA;:100140001600F8F2EFEEF2F6000D0FF303FF0005D4&#xA;:10015000070B0A2101F6000102050304F9F70A035F&#xA;:10016000F1E7E2DDDDDCDFE30BEE0302070A090B5A&#xA;:100170000211F6FCF2EBE7E5E8EBF2FCFEFB050111&#xA;:10018000F9394C3850203730303020474120203169&#xA;:10019000332041343932322032333036303939313C&#xA;:1001A000373430373137313331373030393230341A&#xA;:1001B000323401802406029C3E02010303001DB973&#xA;:00000001FF" />

Notice the "&#xA;", which is used as an end-of-line character.

I can't get my simulator program to create that same sequence of characters. The problem is not the content, it's the line delimiter. I can't get the "&#xA;" into the output unchanged.

I get something like this:

<Item Name="TSCDATA" Value=":02000004DE011B\&amp;#xA;:08000000081022003800000086\&amp;#xA;:08000800081022D020000000C6\&amp;#xA;:08001000081042E0200000008E\&amp;#xA;:08001800061042D02000000098\&amp;#xA;:08002000062042E02000000070\&amp;#xA;:08002800062041D02000000079\&amp;#xA;:08003000042041C02000000083\&amp;#xA;:08003800041041B0200000009B\&amp;#xA;:00000001FF"/>

I've tried a number of variations.

When I supply this as an EOL: "&#xA;", the minidom encodes it as this: "&amp;#xA;"
When I supply this: "\&#xA;", the minidom encodes it as "\&amp;#xA;"
When I supply this: "&&#xA;", the minidom encodes it as "&amp;&amp;#xA;"
When I supply this: "\n", the minidom does not encode it, and leaves it as a linefeed.
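The first variation above can be reproduced with a few lines. This is only a minimal sketch: the element and attribute names are taken from the server response shown earlier, and the two-record value is a shortened stand-in for the real data. It shows that minidom treats the "&" as literal text and escapes it on output.

```python
from xml.dom import minidom

# Build a single <Item> element whose attribute value already contains
# the literal text "&#xA;" (shortened sample data, for illustration only).
doc = minidom.Document()
item = doc.createElement("Item")
item.setAttribute("Value", ":020000040001F9&#xA;:00000001FF")

# On serialization minidom escapes the ampersand, so the reference
# comes out as "&amp;#xA;" rather than "&#xA;".
serialized = item.toxml()
print(serialized)
```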

How can I tell the minidom engine to either NOT encode the "&#xA;"
or force it to encode "\n" as "&#xA;" ?

Brian Withun

Question by:Brian Withun
Accepted Solution

by:
BigRat earned 2000 total points
ID: 24792532
When an XML parser parses a string containing &#xA; it MUST convert it into a line feed character.
When an XML parser parses a string containing a character whose hex value is 0A, it MUST insert a line feed character for it.

So to get &#xA; UNCHANGED in an XML document is impossible, if it is to represent a line feed character.

I suspect the same will happen if you put the data in a CDATA section, since CDATA sections may only contain VALID XML characters.

When an XML document outputs an XML string, say via doc.xml(), then the resulting XML should not have entities (ie: the &#x..; sequence) for line feeds. It is however NOT forbidden to convert EVERY character into an entity, although the resultant string would be rather bulky.

That said, why is it necessary to have an entity for line feed in the output? If it is necessary you'll have to write a bit of script to post-process it and replace the line feeds with the entity sequence.
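A rough sketch of that post-processing idea in Python follows. It is not a minidom feature: the helper name build_item_xml and the placeholder "__EOL__" are made up for this example, and the placeholder is assumed never to occur in the real data. A placeholder is used instead of a raw "\n" because the way minidom writes a raw line feed inside an attribute value can differ between Python versions.

```python
from xml.dom import minidom

# Hypothetical placeholder; assumed never to appear in the real records.
EOL_SENTINEL = "__EOL__"
EOL_REFERENCE = "&#xA;"


def build_item_xml(name, records):
    """Serialize one <Item> element, joining records with '&#xA;'."""
    doc = minidom.Document()
    item = doc.createElement("Item")
    item.setAttribute("Name", name)
    # Join with the placeholder so minidom has nothing to escape here.
    item.setAttribute("Value", EOL_SENTINEL.join(records))
    doc.appendChild(item)
    # Post-process the serialized text: swap the placeholder for the
    # character reference the production server emits.
    return item.toxml().replace(EOL_SENTINEL, EOL_REFERENCE)


sample_records = [":020000040001F9", ":00000001FF"]  # shortened sample data
print(build_item_xml("TSCDATA", sample_records))
```

A client that parses the result with a conforming XML parser should see a line feed between the records. One point worth knowing: inside attribute values a literal line feed is normalized to a space by the parser, while the &#xA; reference survives as a line feed, which may be why the server writes it that way.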

Author Comment

by:Brian Withun
ID: 25051423
The reason I need the embedded linefeeds is because I'm writing a simulator for an actual XML server.  If I do not do it that way, my simulation will not accurately reflect the behavior of the server it is intended to simulate.

If the server is creating non-standard XML, that is outside my realm of influence and I, too, must create non-standard XML.

It sounds like you are suggesting that this is not possible.  I find it difficult to believe that I cannot embed this string of characters (for example) in an XML document.  I believed XML to be capable of sending anything.

":020000040001F9&#xA;"

How do I encode the string above without it being mangled into something that it is not?

Is there no way to "escape" these characters?

BW


Expert Comment

by:BigRat
ID: 25058446
I have spent more time on this problem. You can't use CDATA sections since minidom does not support them. I can't seem to find the DOM configuration (it probably doesn't have one) where one can switch off character normalization or set the entities property to true. If you can find that interface, try it. I doubt, however, that that will help.

Strictly speaking any XML processor, and that includes things which just sniff it, MUST handle &#xA; and the newline character in the same way. In fact the ENTIRE contents of an XML element could be encoded in entities, eg: &#x41; for an "A". That is just as acceptable as plain text (however silly it might look).
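The point about character references can be checked with minidom itself. This is just a small illustration with a made-up attribute value: the parser hands back the referenced characters, not the reference text.

```python
from xml.dom import minidom

# "&#x41;" is a reference to "A" and "&#xA;" a reference to a line feed.
doc = minidom.parseString('<Item Value="&#x41;&#xA;B"/>')
value = doc.documentElement.getAttribute("Value")

# After parsing, the references are gone and only the characters remain.
print(repr(value))  # 'A\nB'
```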