
How do I escape special characters in an XML file?

Hi,
I am using Perl to convert email files to XML.
I escaped the special characters < > & " ' but there are other characters too, such as foreign-language characters.

How can I take care of this?
I tried putting them inside a CDATA section, but it still crapped out.

What should I do?
Thanks.

Asked by: dkim18
 
archang3l commented:
Hello dkim18,

XML is UTF-8, so foreign language characters should not be any problem.

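XML predefines five named entities for the characters that act as markup delimiters. In element text, < and & must always be escaped, and > must be escaped when it appears in the sequence ]]>; " and ' only need escaping inside an attribute value delimited by that same quote character. Escaping all five everywhere is always safe, and a numeric character reference such as &#60; works as well as the named entity: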

    * &amp; for & (ampersand)
    * &lt; for < (left angle bracket, less-than sign)
    * &gt; for > (right angle bracket, greater-than sign)
    * &quot; for " (quotation mark)
    * &apos; for ' (apostrophe)
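
A minimal sketch of escaping the five characters, written in Python rather than Perl since the thread shows no code; the name escape_xml is hypothetical. The one thing that really matters is the order: & has to be replaced first, otherwise the & inside an already-inserted &lt; would be escaped a second time.

```python
# Order matters: "&" must come first, or the "&" in "&lt;", "&gt;", ...
# produced by later replacements would itself be escaped again.
_XML_ESCAPES = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&apos;"),
]


def escape_xml(text: str) -> str:
    """Replace the five XML markup characters with their named entities."""
    for char, entity in _XML_ESCAPES:
        text = text.replace(char, entity)
    return text


print(escape_xml('Tom & Jerry say "<hi>"'))
# Tom &amp; Jerry say &quot;&lt;hi&gt;&quot;
```

Python's standard library also has xml.sax.saxutils.escape, which handles &, < and > and accepts a dictionary of extra replacements for the quote characters.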

Regards,

archang3l
 
Geert Bormans commented:
> XML is UTF-8

This is not true.
XML can be UTF-8, but it doesn't have to be.
CDATA sections mean you don't have to escape the famous five,
but they don't make illegal characters legal (for example, control characters such as U+0001, or bytes that are not valid in the declared encoding).
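
Because CDATA does not rescue illegal characters, one option (an assumption, not something suggested in the thread) is to strip them before writing the file. This Python sketch follows the XML 1.0 Char production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF); strip_illegal_xml_chars is a hypothetical name. It assumes the text has already been decoded to Unicode correctly.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is illegal everywhere,
# including inside CDATA sections.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Remove characters that XML 1.0 does not allow at all."""
    return _ILLEGAL_XML_CHARS.sub("", text)


dirty = "<a><![CDATA[bad\x01char]]></a>"
try:
    ET.fromstring(dirty)  # the parser rejects U+0001 even inside CDATA
except ET.ParseError as err:
    print("rejected:", err)

print(ET.fromstring(strip_illegal_xml_chars(dirty)).text)  # badchar
```

Dropping characters silently loses data; logging what was removed may be preferable for email content.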

You can state the encoding used in the XML file in the XML declaration.
For example, <?xml version="1.0" encoding="ISO-8859-1" ?>
indicates that the encoding used is ISO-8859-1 (ISO Latin-1).
If the declaration is left out, the parser assumes UTF-8 (or UTF-16 if the file starts with a byte order mark), and UTF-8 is also the most commonly used encoding.
You really have to find out which encoding your Perl script is generating
and state that in the declaration.
Why don't you test by adding this at the start of your XML:
<?xml version="1.0" encoding="ISO-8859-1" ?>
It might already work.
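
To illustrate why the declaration matters, here is a small Python sketch (Python stands in for the poster's Perl; parse_bytes is a hypothetical helper). The same Latin-1 bytes are rejected when the parser falls back to UTF-8, and accepted once the real encoding is declared.

```python
import xml.etree.ElementTree as ET


def parse_bytes(data: bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# In ISO-8859-1, "é" becomes the single byte 0xE9.
latin1_body = "<a>caf\u00e9</a>".encode("iso-8859-1")

# No declaration: the parser assumes UTF-8, and 0xE9 followed by "<"
# is not a valid UTF-8 sequence, so parsing fails.
print(parse_bytes(latin1_body))  # None

# Declaring the encoding the bytes were actually written in fixes it.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
print(parse_bytes(declared))  # café
```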

If you can't determine the correct encoding
(this can happen if you are merging data from text files, databases, etc., and end up with mixed encodings),
you could add filters that replace certain characters with a numeric character reference to their Unicode code point, such as &#233; for "é".
That reference is correct regardless of the file's encoding.
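
One way to implement such a filter, sketched in Python under the assumption that each piece of text has already been decoded to Unicode from its own source encoding: escape the markup characters, then let the built-in xmlcharrefreplace error handler turn every non-ASCII character into a decimal reference. The output is plain ASCII, so it is valid under any ASCII-compatible declared encoding. to_ascii_xml_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> str:
    """Escape &, <, > and turn every non-ASCII character into &#NNN;."""
    # Escape first, so the "&" of the generated references is not escaped again.
    escaped = escape(text)
    # "xmlcharrefreplace" writes unencodable characters as &#<decimal>;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


body = to_ascii_xml_text("caf\u00e9 & cr\u00e8me")
print(body)  # caf&#233; &amp; cr&#232;me
# A parser turns the references back into the original characters.
print(ET.fromstring("<a>" + body + "</a>").text)  # café & crème
```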

By the way:
XML is Unicode; the encoding is just the binary representation. UTF-8 is one such representation, but there are many others.
I think that is what archang3l meant to say.
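
A quick Python illustration of that point: the same Unicode character, U+00E9, stored as different bytes in three encodings, each of which decodes back to the same character.

```python
# One Unicode character, U+00E9 ("é"), as bytes in three encodings.
ch = "\u00e9"
representations = {enc: ch.encode(enc) for enc in ("utf-8", "iso-8859-1", "utf-16-be")}

for enc, raw in representations.items():
    # Decoding with the same encoding always gives back the same character.
    assert raw.decode(enc) == ch
    print(enc, raw.hex())
# utf-8 c3a9
# iso-8859-1 e9
# utf-16-be 00e9
```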

cheers

Geert
