How to escape special characters in an XML file?

Posted on 2007-10-03
Last Modified: 2009-04-12
I am converting email files to an XML file using Perl.
I escaped the special characters (<, >, &, ", '), but I see there are other characters, such as those from foreign languages.

How can I take care of this?
I tried putting them inside a CDATA section, but it still crapped out.

What should I do??
Question by: dkim18
    LVL 10

    Assisted Solution

    Hello dkim18,

    XML is UTF-8, so foreign language characters should not be any problem.

    There are five characters that are markup delimiters in XML, and therefore can never appear in their literal form in XML character data (such as the text value of an element). If these characters are needed as literals, the following named entities MUST be used:

        * &amp; for & (ampersand)
        * &lt; for < (left angle bracket, less-than sign)
        * &gt; for > (right angle bracket, greater-than sign)
        * &quot; for " (quotation mark)
        * &apos; for ' (apostrophe)
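A minimal sketch of that replacement, written in Python for illustration (the same idea carries over to a Perl script). The helper name `escape_xml_text` is made up for this example; `xml.sax.saxutils.escape` is from the Python standard library and handles `&`, `<` and `>` by itself, so the two quote characters are passed in as extra replacements.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the two quote characters are
# supplied as extra replacements so that all five are covered.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_text(text):
    """Replace the five XML markup characters with their named entities."""
    return escape(text, QUOTE_ENTITIES)


print(escape_xml_text('Tom & "Jerry" <tj@example.com>'))
# prints: Tom &amp; &quot;Jerry&quot; &lt;tj@example.com&gt;
```

The ampersand is replaced first, so the `&` in the entities that get inserted afterwards is not escaped a second time.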


    LVL 60

    Accepted Solution

    > XML is UTF-8

    This is not true.
    XML can be UTF-8, but it doesn't necessarily have to be.
    CDATA sections mean that you don't have to escape the famous five,
    but they don't make illegal characters legal.
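A quick way to see the CDATA point, sketched in Python with the standard-library parser. The helper `is_well_formed` and the sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Markup characters may appear unescaped inside a CDATA section.
print(is_well_formed("<msg><![CDATA[a < b & c]]></msg>"))

# A control character such as U+0001 is not allowed in XML 1.0 at all,
# and wrapping it in CDATA does not change that.
print(is_well_formed("<msg><![CDATA[bad \x01 char]]></msg>"))
```

The first document parses and the second is rejected, which is the kind of failure you would see when raw email bodies contain stray control characters.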

    You can state the encoding that was used in the XML through the XML declaration.
    For example, <?xml version="1.0" encoding="ISO-8859-1" ?>
    indicates that the encoding used is ISO-8859-1 (ISO Latin 1).
    If the declaration is left out, the default is UTF-8, which is also the most commonly used encoding.
    You really have to find out what encoding your Perl script is generating
    and state that in the declaration.
    Why don't you test by adding this at the front of your XML:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    It might already work.
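To illustrate why the declaration matters, here is a small Python sketch (the element name and sample text are made up). The same bytes are parsed once with a matching ISO-8859-1 declaration and once without any declaration.

```python
import xml.etree.ElementTree as ET

# "é" stored as the single byte 0xE9, which is how ISO-8859-1 encodes it.
body = b"<name>Ren\xe9</name>"

# With a matching declaration the parser decodes the byte correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
print(ET.fromstring(declared).text)

# Without a declaration the parser assumes UTF-8, where a lone 0xE9
# byte is not a valid sequence, so parsing fails.
try:
    ET.fromstring(body)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False
print(parsed_without_declaration)
```

If the declared encoding does not match what the script really writes, the parser either fails like this or silently produces the wrong characters, so it is worth checking the actual output bytes.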

    If you can't find the correct encoding
    (this could be the case if you are merging data from texts, databases, etc., and you are dealing with mixed encodings),
    you could add filters that map certain characters to their Unicode number, like this: &#233; for "é".
    This number is correct regardless of the encoding.
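Such a filter could look roughly like this in Python. This is a sketch that assumes the text has already been decoded into a Unicode string; `to_ascii_xml` is a hypothetical helper name. It escapes the markup characters first and then lets the standard `xmlcharrefreplace` error handler turn every non-ASCII character into a numeric reference.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text):
    """Escape &, < and >, then turn every non-ASCII character into a
    numeric character reference such as &#233;."""
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_xml("café & crème"))
# prints: caf&#233; &amp; cr&#232;me
```

The result is pure ASCII, so it reads the same whichever encoding the XML declaration names.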

    By the way,
    XML is Unicode; the encoding is just the binary representation. UTF-8 is simply one such representation, but there are many.
    I think that is what archang3l meant to say.
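The difference between the Unicode number and its binary representation can be seen with a few lines of Python (the three encodings are just examples picked for illustration):

```python
ch = "é"

# The Unicode code point is the same whatever encoding is used.
print(ord(ch))  # prints: 233

# The bytes written to the file depend on the encoding.
encoded = {enc: ch.encode(enc).hex() for enc in ("utf-8", "iso-8859-1", "utf-16-le")}
for enc, hex_bytes in encoded.items():
    print(enc, hex_bytes)
```

This prints `c3a9` for UTF-8, `e9` for ISO-8859-1 and `e900` for UTF-16-LE: one character, three different byte sequences.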



