11.05.2005 at 03:21AM PST, ID: 21620392
Invalid characters in XML

Tags: xml, invalid, characters
Hi all,

I am developing an application that uses XML to talk to a legacy backend (mainframe). Some old mainframe applications may use invalid characters (e.g. low value x'00'), and when these characters appear in a tag value, the XML parser throws an exception.

My question is, how many characters in XML are regarded as "invalid"? I've found some information at W3C (http://www.w3.org/TR/REC-xml/#NT-Char). It says the following:

"
Character Range
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
"

But I don't quite understand the above description. What are the corresponding hex values? Can anyone give me a brief explanation?


Thanks a lot!


Question Stats
Zone: Web Development
Question Asked By: DASAIS01UK
Solution Provided By: Gertone
Participating Experts: 2
Solution Grade: A
Views: 322
11.05.2005 at 07:05AM PST, ID: 15230977

Rank: Genius

Your question addresses multiple topics.
I'll try to give a brief explanation of each.

An XML stream may contain only the characters listed in the above definition.
#x9 is the character with number 9 (hexadecimal) as defined in Unicode, which is the tab character.
You can find the code charts here:
http://www.unicode.org/Public/4.1.0/charts/CodeCharts.pdf (30 MegaByte)

Not allowed in an XML stream are the first 32 characters in this set
(including NUL (0), but excluding "tab" (9), "linefeed" (10) and "carriage return" (13)).
This means 29 of these characters are not allowed.
Then there are a number of characters not allowed in the higher blocks, starting at character 55296 (xD800): the surrogate blocks (xD800 to xDFFF), xFFFE and xFFFF.
You can count them yourself from the above definition of Char.
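
As a rough illustration of the Char definition quoted in the question, the check could be sketched in Python like this. The names `is_xml_char` and `strip_invalid_xml_chars` are made up for this example, and the sketch assumes XML 1.0 and text that has already been decoded to Unicode.

```python
import re

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch is allowed in XML 1.0."""
    return _INVALID_XML_CHARS.match(ch) is None


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that the Char production does not allow."""
    return _INVALID_XML_CHARS.sub("", text)


# Number of disallowed characters among the first 32 code points.
disallowed_controls = sum(1 for cp in range(32) if not is_xml_char(chr(cp)))

# NUL is removed, tab is kept.
cleaned = strip_invalid_xml_chars("AB\x00C\tD")
```

Whether to remove such characters or replace them with a placeholder is a design choice that depends on what the mainframe data means.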

The code charts are just a mapping between a character and a numerical representation.
How the characters are stored on the computer depends on the encoding.
At this point, the number of bits/bytes and the byte order come into play.

When you ask "what is the corresponding Hex value"? I assume you want the encoding question answered. Well this depends on the encoding.

At the start of your XML document you can have this string
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
The ISO-8859-1 encoding is the ISO-standardised Latin-1 character set (a superset of ASCII). It encodes each character in one byte, so it has 256 characters, and the byte values of these 256 characters match the first 256 code points of the Unicode standard, so that is easy.

This means that with the ISO Latin-1 encoding you cannot express an "h with a ^" (character 293 or x125) directly, only with a character reference, like this: "&#x125;" or "&#293;" (using 7 or 6 bytes respectively).
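
A small Python sketch of this point, using only the standard codecs (the variable names are just for illustration): a plain ISO-8859-1 encode fails for this character, while the built-in "xmlcharrefreplace" error handler substitutes a decimal character reference.

```python
text = "\u0125"  # "h with a ^" (code point 293, hex 125)

# ISO-8859-1 has no byte for this character, so a plain encode fails.
try:
    text.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# The "xmlcharrefreplace" error handler writes a decimal character reference.
as_reference = text.encode("iso-8859-1", errors="xmlcharrefreplace")
```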

If this XML declaration is not present (and there is no byte order mark), UTF-8 is assumed as the encoding.
UTF-8 uses one byte for the first 128 characters (an exact match to ASCII as well) but uses two bytes for the next part, up to character 2047 (one lead byte plus one continuation byte), and three or four bytes beyond that:
in ISO Latin-1, "é" would be one byte, 233 (xE9)
in UTF-8, "é" would be two bytes, 195 and 169 (xC3 xA9)
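
The difference between the two encodings can be checked with a few lines of Python (a minimal sketch; the variable names are only for illustration):

```python
e_acute = "\u00e9"  # "é", code point 233

latin1_bytes = e_acute.encode("iso-8859-1")  # one byte
utf8_bytes = e_acute.encode("utf-8")         # two bytes

# Plain ASCII characters are a single, identical byte in both encodings.
ascii_same = "A".encode("iso-8859-1") == "A".encode("utf-8")
```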

If you are pulling information streams from a mainframe, you have to know the meaning of each byte value on the mainframe, map it to the encoding you pick for the XML,
and remove or escape the characters that are not allowed.
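
A minimal sketch of these steps in Python, under loud assumptions: the mainframe data is taken to be EBCDIC code page 037 (Python's "cp037" codec), which may not match your system, and the function name `mainframe_field_to_xml_text` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Everything outside the XML 1.0 Char production.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def mainframe_field_to_xml_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode a mainframe field and make it safe to use as XML character data."""
    text = raw.decode(codepage)               # 1. map mainframe bytes to Unicode
    text = _INVALID_XML_CHARS.sub("", text)   # 2. remove characters XML forbids
    return escape(text)                       # 3. escape "&", "<" and ">"


# "Hello" in EBCDIC (code page 037) followed by a low value x'00'.
field = b"\xc8\x85\x93\x93\x96\x00"
xml_text = mainframe_field_to_xml_text(field)
```

The resulting string can then be written out in whatever encoding the XML declaration names.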

On a higher level, an XML stream consists of markup and character data.
Since markup uses some special characters ("<", ">" and "&"), these need to be escaped.
A "<" or a "&" in a character data part makes the XML not well-formed as well...
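
For illustration, Python's standard library has a helper for this higher-level escaping, `xml.sax.saxutils.escape`. Note that it only handles the markup characters; it does nothing about characters that are not allowed in XML at all.

```python
from xml.sax.saxutils import escape

raw_value = "a < b & c > d"

# escape() replaces "&", "<" and ">" with "&amp;", "&lt;" and "&gt;".
escaped_value = escape(raw_value)
```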

I hope this is a start.
Happy to provide you with more info if required

Geert
here you need to escape "<" by "&lt;" and "&" by "&amp;"
Accepted Solution
 
11.07.2005 at 06:24AM PST, ID: 15238897
Hi! take a look at

http://www.w3schools.com/xml/xml_cdata.asp

As you will see, portions of XML inside a CDATA section are not parsed as markup by the parser. Maybe this can be helpful for you.
 
11.07.2005 at 06:59AM PST, ID: 15239194

Rank: Genius

Dasaiso1uk,

I just saw that my last line should be moved 3 lines up, editor mistake.

Pescatera,

CDATA sections will not help for characters that are not allowed in the XML data stream. Not allowed means not allowed.
CDATA sections only help for the higher-level escaping that I talked about in my last paragraph.
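
This is easy to try out. The sketch below uses Python's standard `xml.etree.ElementTree` parser as an example (other parsers should behave the same way for XML 1.0, but that is an assumption); the helper name `parses` is made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except (ET.ParseError, ValueError):
        return False


# Markup characters inside CDATA are fine...
markup_in_cdata = parses(b"<v><![CDATA[a < b & c]]></v>")

# ...but a character outside the Char definition is rejected even there.
nul_in_cdata = parses(b"<v><![CDATA[a\x00b]]></v>")
```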

cheers
 
11.07.2005 at 07:27AM PST, ID: 15239442
Hi, dasa, not only have you been nasty, but a little stupid too. Nowhere in your answer did you say anything about CDATA sections, and I only thought it could help. Sorry that my comment wasn't as brilliant as yours.
 
11.07.2005 at 04:48PM PST, ID: 15244294
Thanks for your clear explanation!
 
 