Solved

Escaping HTML characters to XML

Posted on 2003-10-29
Last Modified: 2013-11-19
I have some XHTML code, which I want to include as a node-with-subnodes into an XML file.
example:

xml:

<root>
    <record>
        <id>20</id>
        <name>somename</name>
        <description/>
    </record>
</root>


xhtml to be included in "description"

<p>this<b>something bold</b>is to be included</p>



and the result would be:

<root>
    <record>
        <id>20</id>
        <name>somename</name>
        <description>
           <p>
               this
                  <b>something bold</b>
               is to be included
           </p>
        </description>
    </record>
</root>

I am using ASP with Msxml2.DOMDocument.3.0

- - - -

The inserted XHTML comes from an ActiveX in a webform, which produces XHTML.
Everything goes well until there are some HTML-encoded characters like &euml; (ë) or &euro;
Then I cannot load the XHTML into a DOM document.

I have been working on converting the HTML named entities to XML numeric character references, replacing "&euml;" with "&#235;". Then everything works again. This, however, takes a simple but long function, in which I have to replace ALL possible HTML named entities with their numeric equivalents.
I wonder if there is an easier way to do it.

Any suggestion is welcome.
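
The replacement approach described above can be kept short when the entity table comes from a library instead of being typed by hand. The following is a minimal sketch in Python, not ASP (an assumption made purely for illustration, since the thread uses MSXML). It uses the standard library table html.entities.name2codepoint; the function name html_entities_to_numeric is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML predefines these five entities, so they must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &euml; or &euro;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Replace HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML-native and unknown names exactly as they were.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)


fragment = "<p>caf&euml; costs 2 &euro; &amp; up</p>"
print(html_entities_to_numeric(fragment))
```

Unknown names are passed through unchanged, so a parser would still reject them afterwards; that keeps the sketch from silently dropping content.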
Question by:sybe
5 Comments
 
LVL 9

Expert Comment

by:sparkplug
ID: 9641486
Hi,

I think you can define the entities in a DTD using the syntax <!ENTITY euml "&#235;">.
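
As a rough illustration of that ENTITY syntax, the sketch below declares the entity in an internal DTD subset and parses the result. It uses Python's xml.etree.ElementTree, not MSXML (an assumption for illustration only), and the wrapper element name is taken from the example in the question.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares &euml; so the parser can expand it.
document = (
    '<!DOCTYPE description [\n'
    '  <!ENTITY euml "&#235;">\n'
    ']>\n'
    '<description><p>caf&euml;</p></description>'
)

root = ET.fromstring(document)
p_text = root.find("p").text

# The reference has been expanded to the real character U+00EB.
print(p_text == "caf\u00eb")
```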

I wonder, however, whether you need to be able to parse the XHTML that you are inserting. If not, then you could escape the XHTML section using a CDATA section as follows:

<?xml version="1.0" encoding="UTF-8"?>

<root>
    <record>
        <id>20</id>
        <name>somename</name>
        <description><![CDATA[
           <p>
               this
                  <b>something  bold</b>
               is to be included &euml;
           </p>
        ]]></description>
    </record>
</root>


>S'Plug<
 
LVL 28

Author Comment

by:sybe
ID: 9641713
I tried the CDATA thing, and the parsing gives no problem. However, I am transforming the XML with XSL to a browser, and the CDATA section then is displayed as text, not as HTML.
I looked for some solution and found the disable-output-escaping which works in Internet Explorer, but not in Mozilla browsers.
So the CDATA solution did/does not bring me closer to solving the problem.
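
The difference described above can be seen in a small sketch, written here in Python with xml.etree.ElementTree as an assumed stand-in for MSXML. With CDATA the description holds one string and no child nodes, which is why a stylesheet can only output it as text.

```python
import xml.etree.ElementTree as ET

# Same content twice: once wrapped in CDATA, once as real markup.
with_cdata = "<description><![CDATA[<p>this <b>bold</b> &euml;</p>]]></description>"
as_markup = "<description><p>this <b>bold</b> &#235;</p></description>"

cdata_node = ET.fromstring(with_cdata)
markup_node = ET.fromstring(as_markup)

# CDATA: zero child elements, the tags survive only as literal text.
print(len(cdata_node), repr(cdata_node.text))

# Markup: a real <p> child whose <b> descendant can be addressed.
print(len(markup_node), markup_node.find("p/b").text)
```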

I will try to do something with the DTD thing you mention. I have never worked with that, do you have some links on that?
 
LVL 26

Accepted Solution

by:
rdcpro earned 600 total points
ID: 9642479
There was a thread on that recently.  I posted some links to a standard entity catalog DTD there, but here are the XHTML 1.0 entity sets:
Latin 1 entities:
http://www.utoronto.ca/webdocs/HTMLdocs/HTML_Spec/xhtml1.0/xhtml-lat1.ent
Special entities:
http://www.utoronto.ca/webdocs/HTMLdocs/HTML_Spec/xhtml1.0/xhtml-special.ent
Symbols:
http://www.utoronto.ca/webdocs/HTMLdocs/HTML_Spec/xhtml1.0/xhtml-symbol.ent

I thought there was a definitive ISO or W3C DTD that you could include in your XML that defined all the entities (an entity catalog), but I can't seem to find it at the moment.
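
In the absence of a ready-made catalog file, an internal DTD subset covering the HTML entities can be generated. The sketch below is one way to do that in Python (an assumption for illustration; the thread uses ASP with MSXML). It builds the declarations from the standard library table html.entities.name2codepoint; the helper names build_entity_subset and parse_fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines these, so they are not redeclared.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def build_entity_subset():
    """Return ENTITY declarations for the HTML named entities."""
    return "\n".join(
        '<!ENTITY %s "&#%d;">' % (name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in PREDEFINED
    )


def parse_fragment(fragment, wrapper="description"):
    """Wrap an XHTML fragment, prepend the entity subset, and parse it."""
    document = "<!DOCTYPE %s [\n%s\n]>\n<%s>%s</%s>" % (
        wrapper, build_entity_subset(), wrapper, fragment, wrapper
    )
    return ET.fromstring(document)


node = parse_fragment(
    "<p>this<b>something bold</b>is to be included &euml; &euro;</p>"
)
bold = node.find("p/b")
print(bold.text, repr(bold.tail))
```

Compared with rewriting the text before parsing, this leaves the fragment itself untouched and lets the parser do the expansion.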

Regards,
Mike Sharp
 
LVL 15

Assisted Solution

by:robbert
robbert earned 600 total points
ID: 9658979
You can use TidyCOM ( http://perso.wanadoo.fr/ablavier/TidyCOM/ ) to clean up the source before loading it to a DOMDocument.

There are options for outputting XML (instead of XHTML) and converting HTML entities to their numeric equivalents.

I'm not aware of any products competing with TidyCOM (or with HTML Tidy, which it wraps), and I have used it often, even in mid-scale web applications. Because HTML Tidy (the actual, wrapped application) is single-threaded, it should only be called in one instance at a time, so expect to restart IIS every few months or so. But, as mentioned, there doesn't seem to be an alternative.
 
LVL 28

Author Comment

by:sybe
ID: 9662346
robbert,

I had used TidyCOM to create XHTML, but I did not find the options to convert HTML entities to numeric references.
I'll look at it again, but maybe you can tell me?

Question has a verified solution.