Unrecognised XML characters

When I run my XML file through Altova XMLSpy, I get this error when I view it:

Your file contains 10 characters it should not, these are

#150; (0x96), #146; (0x92)

Does anyone know what chars these are?

I am using xml version='1.0' encoding='iso-8859-1'.

Is that the correct encoding type for English?

TIA.

Picco
Asked by crmpicco

Geert Bormans (Information Architect) commented:
Hi,

Is this the right encoding?
I would say yes. It is ISO Latin 1, which I think is still used in most applications, and although the XML recommendation doesn't require a parser to support it, most of them do.

You can see the list at
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/charsets/charset1.asp
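and there you will see that 128 to 159 (0x80 to 0x9F) hold no printable characters in ISO Latin 1, only control codes. Your two bytes, 150 and 146, fall in that range.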

Declaring "UTF-8" would not fix it either: UTF-8 matches ISO Latin 1 byte for byte only in the ASCII range (0 to 127), and a lone 0x96 or 0x92 byte is not valid UTF-8 at all.
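As a quick way to check how UTF-8 treats these bytes, here is a minimal Python sketch. The helper name `is_valid_utf8` is made up for illustration; the two byte values are the ones from the error message.

```python
# The two byte values reported in the error message.
raw = bytes([0x96, 0x92])

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0x96 and 0x92 are continuation bytes, so on their own they are not valid UTF-8.
print(is_valid_utf8(raw))
# Pure ASCII is valid in ISO Latin 1 and UTF-8 alike.
print(is_valid_utf8(b"plain ASCII"))
# An accented Latin 1 letter is one byte in ISO Latin 1 but two bytes in UTF-8.
print("\u00e9".encode("iso-8859-1"), "\u00e9".encode("utf-8"))
```

So changing only the encoding declaration to UTF-8, without re-saving the file in that encoding, would most likely turn the warning into a hard parse error.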

The encoding "windows-1252" is the same as ISO Latin 1, except that it uses most of the space 128 - 159 for extra printable characters, e.g. the Euro sign is in that space. In windows-1252, 0x96 is an en dash and 0x92 is a right single quotation mark (a curly apostrophe), which is very likely what your file contains. That is exactly the reason why so many people use this Windows version of ISO Latin 1 (called Windows Latin 1).
It is supported by XMLSpy and Windows parsers, so it is OK for use in a Windows-centric environment. Be careful when exporting, though.
http://support.microsoft.com/default.aspx?scid=kb;en-us;197368#kb1
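To see what the two bytes mean under each encoding, a small Python sketch like the following can help. The function name `describe` is invented for this example; the codec names are the standard Python aliases.

```python
import unicodedata

def describe(byte_value: int, encoding: str) -> str:
    """Decode one byte and report its code point and Unicode name (hypothetical helper)."""
    ch = bytes([byte_value]).decode(encoding)
    # Control characters have no Unicode name, so supply a placeholder.
    name = unicodedata.name(ch, "<control character>")
    return f"U+{ord(ch):04X} {name}"

# 0x96 and 0x92 come from the error message; 0x80 is the Euro sign position.
for value in (0x96, 0x92, 0x80):
    print(hex(value),
          "| iso-8859-1:", describe(value, "iso-8859-1"),
          "| windows-1252:", describe(value, "windows-1252"))
```

Under ISO Latin 1 all three come out as unnamed control characters, while under windows-1252 they are an en dash, a right single quotation mark and the Euro sign.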

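There is also encoding='iso-8859-15', which swaps a few ISO Latin 1 characters for others, including the Euro sign, and apparently it is supported in XMLSpy. I don't have too many details, but as far as I know it does not fill the 128 to 159 range, so it would not help with these two characters.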
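For anyone who wants to check how iso-8859-15 differs from ISO Latin 1, this Python sketch compares the two codecs byte by byte. The names `decode_byte` and `differences` are made up for the example.

```python
def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value with the given codec (hypothetical helper)."""
    return bytes([value]).decode(encoding)

# Collect every byte value where the two encodings disagree.
differences = {
    value: (decode_byte(value, "iso-8859-1"), decode_byte(value, "iso-8859-15"))
    for value in range(256)
    if decode_byte(value, "iso-8859-1") != decode_byte(value, "iso-8859-15")
}

for value, (latin1, latin9) in sorted(differences.items()):
    print(f"0x{value:02X}: {latin1!r} -> {latin9!r}")

# Check whether the two bytes from the error message are among the differences.
print(0x96 in differences, 0x92 in differences)
```

Only a handful of positions in the upper half differ (the Euro sign sits at 0xA4), and neither 0x96 nor 0x92 is one of them.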

I don't know how you want to render the characters in the end. But if you are only concerned about correct storage, then I would go for a preprocessing step. Use a regular expression tool to walk through the XML and replace the characters you mentioned with &#8211; (en dash) and &#8217; (right single quotation mark), provided those are the correct code points in the Unicode standard. Or look up the exact meaning in the Unicode tables.
For reference you can go to www.unicode.org.
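The preprocessing step could look roughly like this in Python. It is only a sketch: it assumes the stray bytes were meant as windows-1252 characters, and the names `C1_RANGE`, `to_char_ref` and `fix_xml_text` are invented for the example.

```python
import re

# After decoding as ISO Latin 1, bytes 0x80..0x9F show up as U+0080..U+009F.
C1_RANGE = re.compile("[\u0080-\u009f]")

def to_char_ref(match: re.Match) -> str:
    """Reinterpret one stray byte as windows-1252 and emit a numeric character reference."""
    byte = match.group(0).encode("iso-8859-1")
    try:
        ch = byte.decode("windows-1252")
    except UnicodeDecodeError:
        # A few byte values are unassigned in windows-1252; leave those untouched.
        return match.group(0)
    return f"&#{ord(ch)};"

def fix_xml_text(raw: bytes) -> bytes:
    """Replace windows-1252 extras in ISO Latin 1 data with character references."""
    text = raw.decode("iso-8859-1")
    fixed = C1_RANGE.sub(to_char_ref, text)
    # The references are plain ASCII, so the result still fits in ISO Latin 1.
    return fixed.encode("iso-8859-1")

print(fix_xml_text(b"<note>It\x92s pages 10\x9620</note>"))
```

Note that character references are only interpreted in element content and attribute values; inside comments or CDATA sections they stay literal text, so a real script may need to treat those separately.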

If the only tool in your toolbox is XMLSpy, I assume you can twiddle with some scripting onLoad.

I hope this helps. If you have more questions, I am happy to help

Gertone

 
LandyJ commented:
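#150 (0x96) is an en dash in windows-1252
#146 (0x92) is a right single quotation mark (a curly apostrophe) in windows-1252

They are easy to confuse with decimal 96, the grave accent (`), and decimal 92, the backslash (\), but those are different characters. Neither has any special meaning to XML; they only need replacing with numeric character references because ISO-8859-1 has no printable character at those positions.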
Question has a verified solution.
