[DOM] How to escape special character

loucker
Hi guys,

I get an exception when I try to parse an XML document with Microsoft.XMLDOM. The XML contains the special character &eacute; (é).

I would like to know how to escape this character. I can't use CDATA.

Thanks in advance.
Multitechnician
Top Expert 2014
Commented:
&eacute;

is

&amp;eacute;

so you don't really escape it, but encode it "one more time"
Gertone (Geert Bormans), Information Architect
Top Expert 2006
Commented:
&amp;eacute; would make the XML parse as well-formed, but it drops the meaning of the e-acute.
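
To illustrate that point, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree as a stand-in for Microsoft.XMLDOM (an assumption made only for illustration; the sample text is made up). The double-escaped document parses, but the text node holds the literal characters &eacute; instead of é.

```python
import xml.etree.ElementTree as ET

# Double-escaped input: the ampersand itself is written as &amp;
doc = "<root>caf&amp;eacute;</root>"

root = ET.fromstring(doc)

# The parser only resolves &amp; to "&", so the text is the literal
# string "caf&eacute;" and a second decoding pass would still be needed.
text = root.text
print(text)
```

That second decoding pass is the postprocessing this approach forces on you.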

Actually it depends on your encoding, but when it is UTF-8 you can simply replace it with 'é' or with '&#233;'.
If I had to escape it in a safe way I would use &#233;, because that is a correct replacement for &eacute; regardless of the encoding.
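
A rough sketch of that replacement, in Python rather than against Microsoft.XMLDOM (an assumption for illustration). The helper name to_numeric_references is hypothetical, and the lookup table comes from Python's html.entities module, which covers the HTML 4 entity names. It works on the raw text before parsing, so it would also rewrite matches inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def to_numeric_references(xml_text):
    """Replace named HTML entities such as &eacute; with &#233;."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, xml_text)

fixed = to_numeric_references("<root>caf&eacute; &amp; bar</root>")
print(fixed)

# The rewritten text now parses without any entity declarations.
root = ET.fromstring(fixed)
```

Because a numeric character reference names the Unicode code point directly, the result does not depend on the encoding of the file.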
Gertone (Geert Bormans)Information Architect
Top Expert 2006

Commented:
But I should give you some more background.

&eacute; is an XML general entity.
If you use an entity in XML, it needs to be declared at the top of your document
(see the example below).
Since Unicode was not generally supported in SGML, non-ASCII characters were added using entity declarations in the DTD, similar to the example below.
HTML inherited this approach. The only difference is that you don't need the entity declarations in a browser, because they are built in.
That is why you will find many HTML files that have no entity declarations and are still valid.
In XML you can only use &amp;, &lt;, &gt;, &quot; and &apos; without a declaration (they are built in).
The others need a declaration.
But since XML supports full Unicode, you can always write a character numerically, as in my example above,
so there is no need to add the complexity of general entities to your XML, at least not for characters.
In short, the XML you are trying to parse is not valid, or at least incomplete.
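
You can check this behaviour quickly with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree instead of Microsoft.XMLDOM (an assumption for illustration only). The five built-in entities resolve without a DTD, while an undeclared &eacute; makes the parser reject the document.

```python
import xml.etree.ElementTree as ET

# The five predefined entities resolve without any DTD.
ok = ET.fromstring("<root>&amp;&lt;&gt;&quot;&apos;</root>")
predefined_text = ok.text

# An undeclared named entity makes the parser reject the document.
try:
    ET.fromstring("<root>&eacute;</root>")
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True

print(predefined_text, undeclared_rejected)
```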

If there are only a few entities that you need to replace,
you can replace them with their numeric equivalents,
as I showed you in my previous comment.
If you have plenty of them, just add the full set of entity declarations from XHTML 1.0
http://www.w3.org/TR/xhtml1/#h-A2
to your XML

So you have two options:
1. Replace each character entity with its numeric equivalent.
2. Add entity declarations to your XML, as in the example below (root needs to be the root element of your XML).

I would avoid leakim's solution, since it alters the document and forces you into post-processing.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY eacute "&#233;">
]>
<root>&eacute;</root>
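
If you want to generate those declarations instead of typing them, something like the following could work. It is only a sketch in Python, using xml.etree.ElementTree rather than Microsoft.XMLDOM (an assumption for illustration). The helper build_doctype is hypothetical, and the code points come from Python's html.entities table.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

def build_doctype(root_name, entity_names):
    """Build an internal DTD subset declaring the given entity names."""
    declarations = "\n".join(
        '<!ENTITY %s "&#%d;">' % (name, name2codepoint[name])
        for name in entity_names
    )
    return "<!DOCTYPE %s [\n%s\n]>\n" % (root_name, declarations)

# The body still contains the named entities, unchanged.
body = "<root>caf&eacute; cr&egrave;me</root>"
document = build_doctype("root", ["eacute", "egrave"]) + body

# The parser expands the declared entities while parsing.
root = ET.fromstring(document)
print(root.text)
```

The body of the document is left untouched here. If your XML starts with an XML declaration, the DOCTYPE has to go after that line, not before it.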
