Meiscooldude

XML Exception: Invalid Character(s)

I am working on a small project that is receiving XML data in string form from a long running application. I am trying to load this string data into an XDocument (System.Xml.Linq.XDocument), and then from there do some XML Magic and create an xlsx file for a report on the data.

On occasion, I receive data that contains invalid XML characters, and parsing the string into an XDocument fails with this error:

[System.Xml.XmlException]
Message: '?', hexadecimal value 0x1C, is an invalid character.

Since I have no control over the remote application, I have to expect ANY kind of character.

I am well aware that XML has a way to write characters as numeric references, such as &#x1C; or something like that.

If at all possible, I would SERIOUSLY like to keep ALL the data. If not, then so be it.


---

I have thought about editing the response string programmatically and then re-parsing it when an exception is thrown, but the few methods I have tried were not successful.

Thank you for your thoughts.
TextReader tr;
XDocument doc;
string response; // XML string received from server.

...

tr = new StringReader(response);

try
{
    doc = XDocument.Load(tr);
}
catch (XmlException e)
{
    // handle here?
}
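For the "handle here?" branch, the retry idea described above could look roughly like the following. This is only a minimal sketch, not the accepted solution: the names XmlSanitizer, StripInvalidXmlChars and LoadLenient are made up for illustration, and the fallback is lossy because it simply drops every character that the XML 1.0 Char production forbids.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original post.
public static class XmlSanitizer
{
    // True when a single UTF-16 code unit is allowed by the XML 1.0 Char production.
    // Surrogate pairs (U+10000 and above) are handled by the caller.
    public static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Returns a copy of the input with every illegal character dropped.
    public static string StripInvalidXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsSurrogatePair(input, i))
            {
                // A well-formed surrogate pair is a legal character; keep both halves.
                sb.Append(input[i]).Append(input[i + 1]);
                i++;
            }
            else if (IsLegalXmlChar(input[i]))
            {
                sb.Append(input[i]);
            }
        }
        return sb.ToString();
    }

    // Parse the response as-is first; fall back to the lossy cleanup only on failure.
    public static XDocument LoadLenient(string response)
    {
        try
        {
            return XDocument.Parse(response);
        }
        catch (XmlException)
        {
            return XDocument.Parse(StripInvalidXmlChars(response));
        }
    }
}
```

With this in place, the try/catch in the snippet could collapse to doc = XmlSanitizer.LoadLenient(response);. If the cleaned string is still not well-formed for some other reason, the second Parse call throws and the caller has to deal with it.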


ASKER CERTIFIED SOLUTION
ViceroyFizzlebottom
The regular expression above was simply to format '&' but the general idea can be used for anything.
Meiscooldude

ASKER

Thank you for the quick reply.

From what I can see, that will only replace an ampersand if it is in front of a hex character code or something like 'gt;'.

I am looking for a way to replace ALL invalid characters, such as 'G', with their corresponding &#xHEX; reference, or to simply remove them altogether (preferably keeping them).
I used a method like this, thank you very much.
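On the wish to keep the characters rather than drop them: XML 1.0 does not allow a reference such as &#x1C; either, so a strict parser rejects the escaped form too. In .NET, setting XmlReaderSettings.CheckCharacters to false relaxes that check for character references, which suggests the following sketch: rewrite each raw invalid character as a numeric reference, then load through a relaxed reader. The names XmlCharEscaper, EscapeInvalidXmlChars and LoadKeepingAllChars are hypothetical. It also assumes the bad characters only occur in text or attribute values (not in element names or CDATA sections), and some characters, such as NUL or unpaired surrogates, may still be rejected.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original thread.
public static class XmlCharEscaper
{
    // Rewrites every character that XML 1.0 forbids as a numeric character reference.
    public static string EscapeInvalidXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsSurrogatePair(input, i))
            {
                // Well-formed surrogate pairs are legal; copy both halves unchanged.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (c == '\t' || c == '\n' || c == '\r'
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
            else
            {
                // For example, U+001C becomes "&#x1C;".
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
        }
        return sb.ToString();
    }

    // Loads the escaped text with character checking relaxed for character references.
    public static XDocument LoadKeepingAllChars(string response)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var reader = XmlReader.Create(
            new StringReader(EscapeInvalidXmlChars(response)), settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

The loaded document then carries the original character in its text nodes. Writing it back out would presumably need a writer created with XmlWriterSettings.CheckCharacters = false as well, and the consumer of the resulting file has to tolerate such references, so this is worth verifying against the real xlsx pipeline before relying on it.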