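Hi! Take a look at
http://www.w3schools.com/x
As you will see, text inside a CDATA section is not parsed as markup, so characters such as "<" and "&" can appear there unescaped. (A CDATA section still cannot contain characters outside the XML Char range, such as x'00'.) Maybe this can be helpful to you.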
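A quick check with Python's standard xml.etree.ElementTree (an illustrative parser choice, not one named in this thread) shows what a CDATA section does and does not do. "<" and "&" inside it come through as plain text, but a low-value x'00' inside it is still rejected. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the expat-based parser accepts data as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Inside CDATA, "<" and "&" are plain text, not markup.
root = ET.fromstring(b"<a><![CDATA[1 < 2 & 3]]></a>")
print(root.text)  # 1 < 2 & 3

# A NUL byte (x'00') is still an illegal character, CDATA or not.
print(parses(b"<a><![CDATA[abc\x00def]]></a>"))  # False
```

So CDATA can protect markup characters, but it cannot smuggle in characters that the Char production forbids.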
Hi all,
I am developing an application that uses XML to talk to a legacy backend (mainframe). Some old mainframe applications may use invalid characters (e.g. low value x'00'), and when these characters appear in a tag value, the XML parser throws an exception.
My question is: which characters in XML are regarded as "invalid"? I've found some information from the W3C (http://www.w3.org/TR/REC-
"
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
"
But I don't quite understand the above description. What are the corresponding hex values? Can anyone give me a brief explanation?
Thanks a lot!
by: Gertone, posted on 2005-11-05 at 07:05:11 (ID: 15230977)
Your question addresses multiple topics.
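I'll try to give a brief explanation of these topics.
In an XML stream, only the characters listed in the above definition of Char are allowed.
#x9 is the character with hexadecimal code point 9 in Unicode (the tab character). A range such as [#x20-#xD7FF] means every character from code point 20 hex (decimal 32, the space) up to D7FF hex (decimal 55295).
You can find the code charts here:
http://www.unicode.org/Public/4.1.0/charts/CodeCharts.pdf (30 MB)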
Not allowed in an XML stream are the first 32 characters of this set (#x0 to #x1F), including NUL (0) but excluding tab (9), line feed (10) and carriage return (13).
This means 29 characters in that range are not allowed.
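The Char production translates almost literally into code. A minimal Python sketch (is_xml_char is a hypothetical name, not a standard library function) that confirms the 29 disallowed characters below #x20:

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches production [2] Char of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Code points below #x20 that are NOT allowed.
illegal_low = [cp for cp in range(0x20) if not is_xml_char(cp)]
print(len(illegal_low))                     # 29
print([hex(cp) for cp in illegal_low[:3]])  # ['0x0', '0x1', '0x2']
```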
Then there are a number of characters not allowed in the higher blocks, starting at character 55296 (#xD800).
You can count them yourself from the above definition of Char.
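Doing that count by brute force in Python gives the answer below. This is a sketch in which is_xml_char is a hypothetical helper that simply transcribes the Char production. The gaps turn out to be the surrogate block #xD800-#xDFFF (2048 code points) plus #xFFFE and #xFFFF, 2050 in all.

```python
def is_xml_char(cp: int) -> bool:
    # Direct transcription of production [2] Char (XML 1.0)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Brute-force count over the higher blocks, from #xD800 (55296) up to the
# last Unicode code point #x10FFFF.
excluded = [cp for cp in range(0xD800, 0x110000) if not is_xml_char(cp)]
print(len(excluded))              # 2050
print(excluded[0], excluded[-1])  # 55296 65535

# Over the whole Unicode range, including the 29 low control characters.
total = sum(1 for cp in range(0x110000) if not is_xml_char(cp))
print(total)                      # 2079
```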
The code charts are just a mapping between a character and a number (its code point).
How that number is stored on the computer depends on the encoding.
At this point, the number of bits/bytes per character and the byte order come into play.
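To see the difference between a code point and its encoded bytes, the Python sketch below stores the same character under four encodings. The choice of encodings is just illustrative.

```python
ch = "é"  # U+00E9, decimal 233

# The same code point stored under different encodings.
encoded = {
    enc: ch.encode(enc)
    for enc in ("iso-8859-1", "utf-8", "utf-16-be", "utf-16-le")
}

print(f"code point U+{ord(ch):04X}")
for enc, data in encoded.items():
    # bytes.hex() with a separator argument needs Python 3.8+
    print(f"{enc:>10}: {data.hex(' ')}")
```

The code point is always U+00E9, but the bytes are e9, c3 a9, 00 e9 and e9 00. The two UTF-16 variants differ only in byte order.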
When you ask "what is the corresponding hex value?", I assume you want the encoding question answered. Well, this depends on the encoding.
At the start of your XML document you can have an XML declaration like this:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
The ISO-8859-1 encoding is the ISO-standardised Latin-1 character set (a superset of ASCII). It uses one byte per character, so it covers 256 characters (0-255), and those byte values match the first 256 Unicode code points (U+0000 to U+00FF), so that is easy.
This means that with the ISO Latin-1 encoding you cannot express an "h with a ^" (ĥ, character 293 or x125) except with a character reference, like "&#293;" or "&#x125;" (6 or 7 bytes respectively).
If this XML declaration is not present (and the document does not start with a UTF-16 byte order mark), UTF-8 is assumed as the encoding.
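Python's codec machinery can produce such a reference automatically. The sketch below uses the standard "xmlcharrefreplace" error handler, which writes the decimal form. The hexadecimal form is built by hand only for comparison.

```python
ch = "\u0125"  # "h with a ^", code point 293 (hex 125)

# ISO-8859-1 has no byte for this character, so a strict encode fails.
try:
    ch.encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False
print(fits_in_latin1)  # False

# The "xmlcharrefreplace" error handler substitutes a decimal character reference.
ref = ch.encode("iso-8859-1", errors="xmlcharrefreplace")
print(ref, len(ref))  # b'&#293;' 6

# The hexadecimal form is equally valid XML, one byte longer.
hex_ref = f"&#x{ord(ch):x};".encode("ascii")
print(hex_ref, len(hex_ref))  # b'&#x125;' 7
```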
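The effect of the declaration can be tried with Python's expat-based xml.etree.ElementTree. This parser is an illustrative choice, and the exact error message varies between versions. The same "é" parses correctly either as a Latin-1 byte with a declaration or as UTF-8 bytes without one. The Latin-1 byte without a declaration is read as UTF-8 and rejected.

```python
import xml.etree.ElementTree as ET

# "é" as the single Latin-1 byte E9, with a matching declaration.
declared_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
# "é" as the UTF-8 bytes C3 A9, no declaration at all.
undeclared_utf8 = b"<a>\xc3\xa9</a>"
# The Latin-1 byte E9 again, but without a declaration.
undeclared_latin1 = b"<a>\xe9</a>"

print(ET.fromstring(declared_latin1).text)  # é
print(ET.fromstring(undeclared_utf8).text)  # é

try:
    ET.fromstring(undeclared_latin1)
except ET.ParseError as exc:
    # Read as UTF-8, a lone E9 byte is not a valid byte sequence.
    print("rejected:", exc)
```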
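UTF-8 uses one byte for the first 128 characters (0-127, an exact match to ASCII as well), but two bytes for the next part (128-2047). A two-byte sequence is a lead byte of the form 110xxxxx, which signals its length, followed by a continuation byte of the form 10xxxxxx. Higher characters take three or four bytes. For example:
In ISO Latin-1, "é" would be one byte: 233 (hex E9).
In UTF-8, "é" would be two bytes: hex C3 A9.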
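The lead-byte and continuation-byte layout can be written out by hand. The sketch below is illustrative only. utf8_encode is a hypothetical helper, it does not reject surrogate code points, and in real code you would simply call str.encode("utf-8").

```python
def utf8_encode(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point (surrogates not checked)."""
    if cp < 0x80:        # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, up to U+10FFFF
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "Aé€":
    print(ch, ord(ch), utf8_encode(ord(ch)).hex(" "))
# A 65 41
# é 233 c3 a9
# € 8364 e2 82 ac
```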
If you are pulling information streams from a mainframe, you have to know the meaning of each byte value on the mainframe, map it to the encoding you pick for the XML,
and remove or replace the characters that are not allowed. Escaping does not help for these: in XML 1.0 even a character reference such as "&#0;" is not allowed.
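A minimal Python sketch of that pipeline, under loudly stated assumptions. It assumes the mainframe data is EBCDIC code page 037 (Python's "cp037" codec), so check which code page your system actually uses. mainframe_to_xml_text is a hypothetical helper name. Illegal characters are dropped or replaced with a caller-chosen string, because they cannot be escaped.

```python
import re

# ASSUMPTION: the mainframe sends EBCDIC code page 037 (US/Canada).
MAINFRAME_CODEC = "cp037"

# Everything outside production [2] Char of XML 1.0.
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def mainframe_to_xml_text(raw: bytes, replacement: str = "") -> str:
    """Decode mainframe bytes, then drop (or replace) characters XML cannot carry."""
    text = raw.decode(MAINFRAME_CODEC)
    return _ILLEGAL_XML.sub(replacement, text)


# "ABC", a low value x'00', then "123", all in EBCDIC
raw = bytes([0xC1, 0xC2, 0xC3, 0x00, 0xF1, 0xF2, 0xF3])
print(repr(mainframe_to_xml_text(raw)))       # 'ABC123'
print(repr(mainframe_to_xml_text(raw, "?")))  # 'ABC?123'
```

The resulting text still has to have "<" and "&" escaped before it goes into an element.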
On a higher level, an XML stream consists of markup and character data.
Since markup uses the special characters "<", ">" and "&", these need to be escaped in character data ("&lt;", "&gt;", "&amp;"). Strictly, ">" only has to be escaped where it would form "]]>".
A raw "<" or "&" in character data also makes the XML not well-formed.
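In Python this escaping is already available. xml.sax.saxutils.escape handles "&", "<" and ">", and ElementTree escapes element text on output. A short sketch (the sample value is made up):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

value = "if a < b && c > d"

# Manual escaping of character data
print(escape(value))  # if a &lt; b &amp;&amp; c &gt; d

# ElementTree escapes text automatically when serialising
el = ET.Element("value")
el.text = value
print(ET.tostring(el, encoding="unicode"))
# <value>if a &lt; b &amp;&amp; c &gt; d</value>
```

Letting the XML library build the document is usually safer than concatenating strings, since it cannot forget an escape.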
I hope this is a start.
Happy to provide you with more info if required
Geert
Here you need to escape "<" as "&lt;" and "&" as "&amp;".