Solved

XML Exception: Invalid Character(s)

Posted on 2009-05-12
Last Modified: 2013-11-11
I am working on a small project that receives XML data in string form from a long-running application. I am trying to load this string into an XDocument (System.Xml.Linq.XDocument), and from there do some XML magic and create an xlsx file for a report on the data.

On occasion, the data I receive contains invalid XML characters, and when I try to parse the string into an XDocument, I get this error.

[System.Xml.XmlException]
Message: '?', hexadecimal value 0x1C, is an invalid character.

Since I have no control over the remote application, I have to expect ANY kind of character.

I am aware that XML lets you write characters as numeric references, such as &#x1C; or something like that.

If at all possible I would SERIOUSLY like to keep ALL the data. If not, then so be it.


---

I have thought about editing the response string programmatically when an exception is thrown and then trying to re-parse it, but I have tried a few methods and none of them were successful.

Thank you for your thoughts.
// Requires: using System.IO; using System.Xml; using System.Xml.Linq;
TextReader tr;
XDocument doc;
string response; // XML string received from the server

...

tr = new StringReader(response);

try
{
    doc = XDocument.Load(tr);
}
catch (XmlException e)
{
    // handle here?
}

Question by:Meiscooldude

Accepted Solution

by:
ViceroyFizzlebottom earned 500 total points
ID: 24367911
Here is something I did a while ago when faced with the same issue. Basically, read the data in as plain text, clean it up however you need, then load the result into your XML document.
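    // Requires: using System.IO; using System.Text.RegularExpressions; using System.Xml;
    using (StreamReader reader = _xmlCatalogFile.OpenText())
    {
        // Read the whole file as plain text; the using block closes the reader.
        string strRawData = reader.ReadToEnd();

        // Replace malformed data: escape any '&' that does not start an
        // entity (e.g. &amp;) or a decimal character reference (e.g. &#28;).
        Regex badAmpersand = new Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)");
        const string goodAmpersand = "&amp;";
        strRawData = badAmpersand.Replace(strRawData, goodAmpersand);

        // Load the cleaned text into the XML document.
        _xmlDocument.LoadXml(strRawData);
    }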


Expert Comment

by:ViceroyFizzlebottom
ID: 24367916
The regular expression above only escapes stray '&' characters, but the same general idea can be used for anything.
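As a rough sketch of how the same idea might carry over to the invalid-character case from the question: the snippet below strips the C0 control characters (other than tab, line feed and carriage return) plus U+FFFE and U+FFFF, none of which XML 1.0 allows, before parsing. The names XmlCleaner, StripInvalidChars and Parse are made up for illustration and are not from the code above. It drops the offending characters rather than preserving them, and it does not deal with unpaired surrogates.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class XmlCleaner
{
    // C0 controls except tab (0x09), LF (0x0A) and CR (0x0D),
    // plus the noncharacters U+FFFE and U+FFFF.
    private static readonly Regex InvalidXmlChars =
        new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]");

    // Returns the input with every matched character removed.
    public static string StripInvalidChars(string xml)
    {
        return InvalidXmlChars.Replace(xml, string.Empty);
    }

    // Cleans the raw string first, then parses it.
    public static XDocument Parse(string xml)
    {
        return XDocument.Parse(StripInvalidChars(xml));
    }
}
```

With this, something like XmlCleaner.Parse(response) could replace the XDocument.Load call in the question, at the cost of losing the stripped characters.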

Author Comment

by:Meiscooldude
ID: 24368086
Thank you for the quick reply.

From what I can see, that will only replace an ampersand if it is in front of a hex char or something like 'gt;'

I am looking for a way to handle ALL invalid characters, such as 'G', either by replacing them with their corresponding &#hexvalue or by simply removing them altogether (preferably keeping them).
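For anyone landing here with the same requirement, here is a minimal sketch of the "keep the data" variant. It assumes the bad characters only turn up in text content or attribute values. Each character outside the XML 1.0 Char ranges is rewritten as a hexadecimal character reference, and the result is read with XmlReaderSettings.CheckCharacters set to false, which the documentation describes as turning off character checking for character entity references. LenientXml, EscapeInvalidChars and Load are hypothetical names. Unpaired surrogates are escaped the same way, and how the reader treats those is worth verifying before relying on it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; a sketch, not the method the asker ended up using.
public static class LenientXml
{
    // BMP ranges of the XML 1.0 Char production (surrogates excluded).
    private static bool IsXml10Char(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Rewrites each disallowed character as &#xHH; and leaves the rest alone.
    public static string EscapeInvalidChars(string xml)
    {
        var sb = new StringBuilder(xml.Length);
        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (char.IsHighSurrogate(c) && i + 1 < xml.Length
                && char.IsLowSurrogate(xml[i + 1]))
            {
                // A well-formed surrogate pair is a valid supplementary character.
                sb.Append(c).Append(xml[++i]);
            }
            else if (IsXml10Char(c))
            {
                sb.Append(c);
            }
            else
            {
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
        }
        return sb.ToString();
    }

    // Escapes first, then loads with character-reference checking turned off.
    public static XDocument Load(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var reader = XmlReader.Create(
            new StringReader(EscapeInvalidChars(xml)), settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Strictly speaking, a reference like &#x1C; is still not well-formed XML 1.0, so this only helps while the document stays inside readers and writers that have the check switched off. Saving the XDocument would presumably need XmlWriterSettings.CheckCharacters = false as well, and stricter consumers may still reject the output.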
 

Author Closing Comment

by:Meiscooldude
ID: 31580658
I used a method like this, thank you very much.