C# XML error: "hexadecimal value 0x03, is an invalid character"

I'm trying to run the following code, but a special character in the XML file appears to cause the following error:

' ', hexadecimal value 0x03, is an invalid character. Line 2597, position 73.

Is there any way to skip invalid characters so that the file can still be saved? And if so, how?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[,] feedList = new string[, ]
            {            
                {"http://fredericksburg.craigslist.org/gms/index.rss", "fredericksburg"}
            };
           
            XmlDocument doc = new XmlDocument();

            for (int i = 0; i <= feedList.GetUpperBound(0); i++)
            {
                doc.Load(feedList[i, 0]);
                XmlDeclaration dec = doc.FirstChild as XmlDeclaration;
                if (dec != null)
                {
                    dec.Encoding = "UTF-8";
                }
                doc.Save("C:/test.xml");
            }
        }
    }
}
slightstk Asked:

Sawiner Commented:
You can try to use the XmlWriterSettings.CheckCharacters property (set it to false). However note:

MSDN quote: "Character checking does not include checking for illegal characters in XML names, nor does it include checking that all XML names are valid. These checks are part of conformance checking and are always performed."

In order to use it:
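XmlWriterSettings xws = new XmlWriterSettings();
xws.CheckCharacters = false; // turn off character checking when writing
xws.Indent = true;

// Dispose the writer so the output is flushed and the file is closed.
using (XmlWriter output = XmlWriter.Create("C:/test.xml", xws))
{
    doc.Save(output);
}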

Either way, you shouldn't want to save illegal XML.
If you still want to, consider loading the whole feed as text and saving it as a string instead of as an XML document.

Good luck.
anarki_jimbel (Senior Developer) Commented:
I believe the best option is to "sanitize" the XML you are trying to save. Have a look at the snippet below; it works for your case. I read the XML text from a file.

See also, for additional ideas:

http://prettycode.org/2009/05/07/hexadecimal-value-0x-is-an-invalid-character/
        private void button1_Click(object sender, EventArgs e)
        {

            try
            {
                char ch_03 = '\x03';

                StringBuilder sb = new StringBuilder();

                // Create an instance of StreamReader to read from a file.
                // The using statement also closes the StreamReader.
                using (StreamReader sr = new StreamReader("index.txt"))
                {
                    String line;
                    // Read lines until the end of the file is reached,
                    // replacing each 0x03 character with a space.
                    while ((line = sr.ReadLine()) != null)
                    {
                        sb.AppendLine(line.Replace(ch_03, ' '));
                    }
                }
             
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(sb.ToString());

                XmlDeclaration dec = doc.FirstChild as XmlDeclaration;
                if (dec != null)
                {
                    dec.Encoding = "UTF-8";
                }
                doc.Save("test.rss");
            }
            catch (Exception ex)
            {

                MessageBox.Show(ex.ToString());
            }
        }
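The snippet above replaces only 0x03. A feed may also contain other control characters that XML 1.0 forbids, so a more general filter could look like the sketch below. The names XmlSanitizer, IsLegalXmlChar and StripInvalidXmlChars are made up for this illustration, and the ranges are taken from the "Char" production of the XML 1.0 specification. Note that it removes bad characters instead of replacing them with a space, and it does not touch escaped references such as &#x3;, which a parser will also reject.

```csharp
using System.Text;

public static class XmlSanitizer
{
    // True if c is a single UTF-16 code unit allowed by the XML 1.0 "Char" production.
    public static bool IsLegalXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Returns a copy of text without the characters that XML 1.0 forbids.
    public static string StripInvalidXmlChars(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a code point in
                // #x10000-#x10FFFF, which XML 1.0 allows, so keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

In the snippet above, this would stand in for the Replace call, for example sb.AppendLine(XmlSanitizer.StripInvalidXmlChars(line)).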

Miguel Oz (Software Engineer) Commented:
Use XmlTextReader to read the XML feed from your URL.

Implement this code in your loop:

XmlTextReader rssReader = new XmlTextReader((feedList[i, 0]));
XmlDocument rssDoc = new XmlDocument();
// Load the XML content into a XmlDocument
rssDoc.Load(rssReader);
rssDoc.Save(@"c:\temp\test.xml");
anarki_jimbel (Senior Developer) Commented:
I believe it is a good idea to test your suggestions before you give advice.

The suggestion from mas_oz2003 didn't work. I knew that before I tried it. Obviously, if char 0x03 is not valid in XML, XmlReader WILL throw an exception. Simple!

Nevertheless, I spent 5 minutes testing the proposal, and I was right: it does not work. So there is no point in slightstk wasting time trying the idea again.
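For anyone who wants to repeat that check without hitting the live feed, a small sketch along these lines should do. The class and method names (ReaderCheck, ThrowsOnLoad) are invented here, and the input is an in-memory string rather than the real RSS URL, so treat it as an approximation of the test described above and not a record of it.

```csharp
using System.IO;
using System.Xml;

public static class ReaderCheck
{
    // Returns true if loading the given text through an XmlTextReader
    // raises an XmlException, false if the document loads cleanly.
    public static bool ThrowsOnLoad(string xml)
    {
        try
        {
            using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
            {
                XmlDocument doc = new XmlDocument();
                doc.Load(reader);
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

Calling ReaderCheck.ThrowsOnLoad("<a>x\u0003y</a>") is expected to report an exception, while the same text without the 0x03 character should load normally.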
slightstk (Author) Commented:
Great solution, anarki_jimbel. Once the file was downloaded using the WebClient utility, shown below, this utility to "clean" the XML code worked perfectly.
WebClient wc = new WebClient();
wc.DownloadFile(URI, "C:/test.xml");
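Putting the two steps together, the download and the clean-up can also be done in memory without the intermediate file. The following is only a sketch under a few assumptions: the feed is UTF-8, every character outside the XML 1.0 "Char" ranges can simply be dropped, and the names FeedCleaner, CleanAndParse and DownloadCleanAndSave are hypothetical helpers that do not come from any library.

```csharp
using System.Net;
using System.Text;
using System.Xml;

public static class FeedCleaner
{
    // Removes every character that XML 1.0 forbids, keeping valid surrogate pairs.
    public static string StripInvalidXmlChars(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Cleans the raw text and parses it. This still throws XmlException
    // if the text is malformed in some other way.
    public static XmlDocument CleanAndParse(string rawXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(StripInvalidXmlChars(rawXml));
        return doc;
    }

    // Downloads the feed as text, cleans it, and saves it to outputPath.
    public static void DownloadCleanAndSave(string feedUrl, string outputPath)
    {
        string raw;
        using (WebClient wc = new WebClient())
        {
            wc.Encoding = Encoding.UTF8; // assumption: the feed is served as UTF-8
            raw = wc.DownloadString(feedUrl);
        }

        XmlDocument doc = CleanAndParse(raw);
        XmlDeclaration dec = doc.FirstChild as XmlDeclaration;
        if (dec != null)
        {
            dec.Encoding = "UTF-8";
        }
        doc.Save(outputPath);
    }
}
```

Inside the original loop this would be called as FeedCleaner.DownloadCleanAndSave(feedList[i, 0], "C:/test.xml").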