Solved

XML Exception: Invalid Character(s)

Posted on 2009-05-12
Last Modified: 2013-11-11
I am working on a small project that receives XML data as a string from a long-running application. I am trying to load this string into an XDocument (System.Xml.Linq.XDocument), then do some XML magic and create an xlsx file for a report on the data.

On occasion, the data contains invalid XML characters, and when I try to parse the string into an XDocument I get this error:

[System.Xml.XmlException]
Message: '?', hexadecimal value 0x1C, is an invalid character.

Since I have no control over the remote application, ANY kind of character could show up in the data.

I am well aware that XML lets you write a character as a numeric reference, such as &#x1C; or something like that.

If at all possible I would SERIOUSLY like to keep ALL the data. If not, then so be it.


---

I have thought about editing the response string programmatically and then re-parsing it whenever an exception is thrown, but none of the methods I have tried so far has worked.

Thank you for your thoughts.

TextReader tr;
XDocument doc;
string response; // XML string received from server.
 
...
 
tr = new StringReader (response);
 
try
{
     doc = XDocument.Load(tr);
}
catch (XmlException e)
{
    //handle here?
}

Question by:Meiscooldude
4 Comments
 

Accepted Solution

by:
ViceroyFizzlebottom earned 500 total points
ID: 24367911
Here is something I did a while ago when faced with the same issue. Basically, read the data in as plain text, clean it up however you need to, then load the cleaned string into your XML doc.
using (StreamReader reader = _xmlCatalogFile.OpenText())
{
    // Read the whole file as plain text; the using block closes the reader.
    string strRawData = reader.ReadToEnd();

    // Escape any '&' that does not already start a named entity
    // or a decimal character reference.
    Regex badAmpersand = new Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)");
    const string goodAmpersand = "&amp;";
    strRawData = badAmpersand.Replace(strRawData, goodAmpersand);

    _xmlDocument.LoadXml(strRawData);
}


Expert Comment

by:ViceroyFizzlebottom
ID: 24367916
The regular expression above only escapes stray '&' characters, but the same general idea can be used for any other character.

Author Comment

by:Meiscooldude
ID: 24368086
Thank you for the hasty reply,

From what I can see, that will only replace an ampersand if it is in front of a hex char or something like 'gt;'

I am looking for a way to replace ALL invalid characters, such as 'G', with their corresponding &#hexvalue; or to simply remove them altogether (preferably keeping them).
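
A minimal sketch of one way to do this, under some assumptions. XML 1.0 forbids characters such as 0x1C even when written as a numeric reference like &#x1C;, so a strict parser rejects both forms. In .NET, setting XmlReaderSettings.CheckCharacters to false relaxes that check for character references (raw invalid characters in the text are still rejected), so the idea below is to rewrite each raw control character as a reference first and then load through a lenient reader. The names XmlSanitizer, Sanitize and LoadLenient are made up for illustration, and the sketch assumes the bad characters only occur in text content or attribute values, not in element names, comments or CDATA sections.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original thread.
public static class XmlSanitizer
{
    // True if a single UTF-16 code unit is a legal XML 1.0 character by itself.
    private static bool IsLegalXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Copies the input, keeping legal characters and valid surrogate pairs.
    // Illegal control characters (below 0x20) are either rewritten as
    // numeric references or dropped; anything else illegal is dropped.
    public static string Sanitize(string input, bool keepAsReference)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < input.Length
                     && char.IsLowSurrogate(input[i + 1]))
            {
                // A well-formed surrogate pair encodes a legal code point.
                sb.Append(c).Append(input[++i]);
            }
            else if (keepAsReference && c < 0x20)
            {
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
            // Otherwise the character is silently removed.
        }
        return sb.ToString();
    }

    // Escapes the input, then parses it with character checking
    // turned off for character references.
    public static XDocument LoadLenient(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var reader = XmlReader.Create(
                   new StringReader(Sanitize(xml, true)), settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Calling XmlSanitizer.LoadLenient(response) in place of XDocument.Load(tr) would keep the control characters in the document's text nodes. Writing such a document back out will probably need XmlWriterSettings.CheckCharacters set to false as well, and the result will not be well-formed XML 1.0 for other parsers. If that matters, Sanitize(response, false) drops the characters instead.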

Author Closing Comment

by:Meiscooldude
ID: 31580658
I used a method like this, thank you very much.