loucker asked:

[DOM] How to escape special character

Hi guys,

I get an exception when I try to parse an XML document with Microsoft.XMLDOM. The XML contains the entity &eacute; (é).

I would like to know how to escape this character. I can't use CDATA.

Thanks in advance.
Windows 7, XML, JavaScript

Last Comment
Gertone (Geert Bormans)

8/22/2022 - Mon
ASKER CERTIFIED SOLUTION
leakim971

SOLUTION
Gertone (Geert Bormans)

Gertone (Geert Bormans)

But I need to give you some more background.

&eacute; is a reference to an XML general entity.
If you use an entity in XML, it needs to be declared in the DTD at the top of your document
(see the example below).
Since Unicode was not generally supported in SGML, non-ASCII characters were added using entity declarations in the DTD, similar to the example below.
HTML inherited this approach. The only difference is that you don't need the entity declarations in a browser, because they are built in.
That is why you will find many HTML files without entity declarations that are still valid.
In XML, only &amp;, &lt;, &gt;, &quot; and &apos; can be used without a declaration (they are built in).
The others need a declaration.
But since XML supports full Unicode, you can always add a character numerically (for example &#233; for é), as in my earlier example,
and there is no need to add the complexity of general entities to your XML, at least not for characters.
In short, the XML you are trying to parse is not well-formed, or at least incomplete.
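
To see which entity references in a document would need a declaration, a rough sketch like the one below may help. It assumes the XML is available as a string; the function name listNonPredefinedEntities is made up for this illustration, and the check is a plain regular-expression scan, not a real parse (it will also pick up matches inside CDATA sections or comments).

```javascript
// Hypothetical helper: list entity names that are NOT among the five
// entities predefined by XML (amp, lt, gt, quot, apos).
// Numeric references such as &#233; are ignored because they start with '#'.
function listNonPredefinedEntities(xml) {
  var predefined = { amp: 1, lt: 1, gt: 1, quot: 1, apos: 1 };
  var pattern = /&([A-Za-z_][A-Za-z0-9._-]*);/g;
  var seen = {};
  var names = [];
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    var name = match[1];
    if (!predefined.hasOwnProperty(name) && !seen.hasOwnProperty(name)) {
      seen[name] = true; // report each name only once
      names.push(name);
    }
  }
  return names;
}
```

Every name this returns needs either a declaration in the DTD or a numeric replacement before the parser will accept the document.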

If there are only a few entities that you need to replace,
you can replace them with their numeric equivalents,
as I showed you in my previous comment.
If you have plenty of them, just add the full set of entity declarations from XHTML 1.0
http://www.w3.org/TR/xhtml1/#h-A2
to your XML.

So you have two options:
1. Replace each character entity with its numeric equivalent.
2. Add entity declarations to your XML as in the example below (root needs to be the root element of your XML).
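
Option 1 could be sketched in JavaScript roughly as follows, assuming the XML is held in a string before it is handed to the parser. The names toNumericReferences and NAMED_ENTITIES are invented for this example, and the table is only a small illustrative subset of the XHTML 1.0 entity set.

```javascript
// Illustrative subset only; extend with the names your documents use.
var NAMED_ENTITIES = {
  eacute: 233, // é
  egrave: 232, // è
  agrave: 224, // à
  ccedil: 231, // ç
  nbsp: 160    // no-break space
};

// The five entities predefined by XML must be left untouched.
var XML_PREDEFINED = { amp: true, lt: true, gt: true, quot: true, apos: true };

// Hypothetical helper: swap known named entities for numeric character
// references so the string can be parsed without a DTD.
function toNumericReferences(xml) {
  return xml.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    if (XML_PREDEFINED.hasOwnProperty(name)) {
      return match;
    }
    if (NAMED_ENTITIES.hasOwnProperty(name)) {
      return "&#" + NAMED_ENTITIES[name] + ";";
    }
    return match; // unknown name: leave it so the parser still reports it
  });
}
```

With Microsoft.XMLDOM the result would then be passed to loadXML, for example doc.loadXML(toNumericReferences(xmlString)). Note that this is a plain text substitution, so it also rewrites matches inside CDATA sections.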

I would avoid leakim971's solution, since it alters the document and forces you into postprocessing.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY eacute "&#233;">
]>
<root>&eacute;</root>
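
If the XML arrives as a string without a DOCTYPE, the declarations above could be added programmatically before parsing. The following is only a sketch under that assumption: addEntityDeclarations is a made-up name, the entity table is supplied by the caller, and a document that already has its own DOCTYPE is not handled.

```javascript
// Hypothetical helper: insert a DOCTYPE with an internal subset that
// declares the given entities. `entities` maps entity name -> code point.
// Assumes the input has no DOCTYPE of its own.
function addEntityDeclarations(xml, rootName, entities) {
  var decls = [];
  for (var name in entities) {
    if (entities.hasOwnProperty(name)) {
      decls.push('<!ENTITY ' + name + ' "&#' + entities[name] + ';">');
    }
  }
  var doctype = "<!DOCTYPE " + rootName + " [\n" + decls.join("\n") + "\n]>\n";

  // An XML declaration, if present, must stay at the very start,
  // so the DOCTYPE is inserted right after it.
  var m = /^<\?xml[^?]*\?>\s*/.exec(xml);
  if (m) {
    return m[0] + doctype + xml.slice(m[0].length);
  }
  return doctype + xml;
}
```

Calling addEntityDeclarations(xmlString, "root", { eacute: 233 }) on a document whose root element is root yields the same structure as the example above. Some parser configurations refuse documents that contain a DTD; in that case the numeric replacement of option 1 is the safer route.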

