sybe
asked on
Escaping HTML characters to XML
I have some XHTML that I want to include as a node with subnodes in an XML file.
example:
xml:
<root>
<record>
<id>20</id>
<name>somename</name>
<description/>
</record>
</root>
xhtml to be included in "description"
<p>this<b>something bold</b>is to be included</p>
and the result would be:
<root>
<record>
<id>20</id>
<name>somename</name>
<description>
<p>
this
<b>something bold</b>
is to be included
</p>
</description>
</record>
</root>
I am using ASP with Msxml2.DOMDocument.3.0
- - - -
The inserted XHTML comes from an ActiveX in a webform, which produces XHTML.
Everything goes well until there are HTML-encoded characters such as &euml; (ë) or &euro; (€).
Then I cannot parse the XHTML into a DOM document.
I have been working on converting the HTML entities to XML numeric references, replacing "&euml;" with "&#235;". Then everything works again. However, this takes a simple but long function in which I have to replace ALL possible HTML entities with their XML equivalents.
I wonder if there is an easier way to do it.
Any suggestion is welcome.
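For what it is worth, the replacement described above does not have to be a hand-written list if an entity table is available. The following is only a sketch in Python (not the ASP/VBScript used in the question), assuming the standard-library table `html.entities.name2codepoint`; the function name `named_to_numeric` is made up for illustration. The same table-driven idea should carry over to other languages.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines itself; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &euml; or &euro;
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(xhtml: str) -> str:
    """Rewrite HTML named entities (&euml;) as numeric references (&#235;)."""

    def replace(match):
        name = match.group(1)
        # Keep XML's own entities and anything the table does not know.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_ENTITY.sub(replace, xhtml)


print(named_to_numeric("<p>caf&eacute; &euml; &euro; &amp; &lt;</p>"))
```

Unknown names are passed through unchanged, so the parser will still report them instead of the function silently dropping text.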
ASKER
I tried the CDATA thing, and the parsing gives no problem. However, I am transforming the XML with XSL to a browser, and the CDATA section then is displayed as text, not as HTML.
I looked for some solution and found the disable-output-escaping which works in Internet Explorer, but not in Mozilla browsers.
So the CDATA solution did/does not bring me closer to solving the problem.
I will try to do something with the DTD thing you mention. I have never worked with that, do you have some links on that?
ASKER
robbert,
I had used TidyCom to create XHTML, but I did not find the option to convert HTML entities to numeric references.
I'll look at it again, but maybe you can tell me?
I think you can define the entities in a DTD using the syntax <!ENTITY euml "&#235;">.
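As a rough illustration of the DTD idea, the sketch below prepends an internal DTD subset that declares the needed entities and then parses the fragment. It uses Python's `xml.etree.ElementTree` as a stand-in parser; whether Msxml2.DOMDocument.3.0 behaves the same way is an assumption not checked here. The helper name `internal_subset` and the choice of entities are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def internal_subset(root_name, names):
    """Build a DOCTYPE with one <!ENTITY name "&#N;"> declaration per name."""
    decls = "".join(
        '<!ENTITY %s "&#%d;">' % (name, name2codepoint[name]) for name in names
    )
    return "<!DOCTYPE %s [%s]>" % (root_name, decls)


fragment = (
    "<description><p>this<b>something bold</b>"
    "is to be included &euml; &euro;</p></description>"
)

# Declare only the entities the fragment actually uses.
doc = internal_subset("description", ["euml", "euro"]) + fragment

root = ET.fromstring(doc)           # parses without an "undefined entity" error
text = "".join(root.itertext())     # entity references are expanded to characters
print(text)
```

The XHTML stays real markup here (`<p>` and `<b>` are elements in the tree), which is the difference from the CDATA approach.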
I wonder, however, whether you need to be able to parse the XHTML that you are inserting. If not, you could escape the XHTML section using a CDATA section as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<record>
<id>20</id>
<name>somename</name>
<description><![CDATA[
<p>
this
<b>something bold</b>
is to be included &euml;
</p>
]]></description>
</record>
</root>
>S'Plug<
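A small sketch of what the CDATA approach means for the parsed document, again using Python's `xml.etree.ElementTree` as a stand-in for MSXML (an assumption): the wrapped XHTML arrives as one string of text with no child elements, and the entity reference is not expanded. That matches the behaviour reported earlier in the thread, where the section was shown as text rather than HTML after the XSL transform.

```python
import xml.etree.ElementTree as ET

doc = """<root><record><id>20</id><name>somename</name>
<description><![CDATA[<p>this<b>something bold</b>is to be included &euml;</p>]]></description>
</record></root>"""

description = ET.fromstring(doc).find("record/description")

# CDATA content is character data: no <p> or <b> elements exist in the tree.
print(len(description))     # number of child elements of <description>
print(description.text)     # the markup, kept as a plain string
```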