C# XmlWriter produces file header with special characters

Ok, simple question.

I have a small block of code that creates some XML using XmlWriter in C# (running .NET 4.0.21006.0).

The code creates a simple XML file, but the header is preceded by what looks like three special characters. (When I type the file in DOS, it looks like a union symbol followed by a double upper-right border followed by a single upper-right border.) If I open the file in something like Notepad or Notepad++, I do not see these characters, but they are there when I type the file from the command line.

I am using an XmlWriterSettings object with default settings.  Is this the problem?  The rest of the XML file appears to be perfectly fine.

The code snippet below shows the characters "n++" instead of the three characters; for some reason, that is what they became when I pasted them into the Experts Exchange website.

How can I fix this?


// trimmed down from the actual code
// (eg- try/catch blocks removed):

string filename = "something.xml";
XmlWriter writer;
XmlWriterSettings settings;
settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineChars = "\r\n";
writer = XmlWriter.Create(filename, settings);
writer.WriteStartDocument(true);        // header

writer.WriteStartElement("garden");
writer.WriteAttributeString("total", "1");
writer.WriteElementString("vegetable", "carrot");
writer.WriteEndElement();
writer.Close();

// ----------------------------
// Produces this result:

n++<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<garden total="1">
  <vegetable>carrot</vegetable>
</garden>


coder1313514512456Asked:
 
feenixCommented:
The three characters are a byte order mark (BOM), which the default UTF-8 encoding object emits. Use the following to get rid of them:
settings.Encoding = new UTF8Encoding(false);
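
For context, here is a minimal sketch of where that line fits in the question's code, written as bare statements like the other snippets in this thread. The byte check at the end is an illustrative addition, not part of the original code; it only confirms that the file no longer starts with the UTF-8 BOM bytes EF BB BF.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

string filename = "something.xml";

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineChars = "\r\n";
// The 'false' argument means "do not emit the UTF-8 byte order mark".
settings.Encoding = new UTF8Encoding(false);

using (XmlWriter writer = XmlWriter.Create(filename, settings))
{
    writer.WriteStartDocument(true);
    writer.WriteStartElement("garden");
    writer.WriteAttributeString("total", "1");
    writer.WriteElementString("vegetable", "carrot");
    writer.WriteEndElement();
}

// Illustrative check: look for the BOM bytes EF BB BF at the start of the file.
byte[] bytes = File.ReadAllBytes(filename);
bool startsWithBom = bytes.Length >= 3
    && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
Console.WriteLine(startsWithBom ? "BOM present" : "No BOM");
```

The declaration still says encoding="utf-8"; only the three leading bytes are dropped.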

 
dukestaTAICommented:
Use an XmlTextWriter and set the encoding:

XmlTextWriter writer = new XmlTextWriter(filename, System.Text.Encoding.UTF8);
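
Note that Encoding.UTF8 is itself an encoding that emits a BOM, so the line above on its own would likely still write the three bytes. A hedged sketch of the same XmlTextWriter approach with a BOM-less encoding, using the Formatting property to get indentation; the element names are taken from the question, and the final read-back is only there to show the result.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

string filename = "something.xml";

// UTF8Encoding(false) suppresses the byte order mark; Encoding.UTF8 would write it.
using (XmlTextWriter writer = new XmlTextWriter(filename, new UTF8Encoding(false)))
{
    // XmlTextWriter has no settings object; indentation is a property instead.
    writer.Formatting = Formatting.Indented;

    writer.WriteStartDocument(true);
    writer.WriteStartElement("garden");
    writer.WriteAttributeString("total", "1");
    writer.WriteElementString("vegetable", "carrot");
    writer.WriteEndElement();
}

// Read the file back to inspect what was written.
Console.WriteLine(File.ReadAllText(filename));
```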
 
lazyberezovskyCommented:
The code you provided should not write any "n++" to the output file. Below is the same code, refactored. Also check what else you are doing with this file.
string filename = "something.xml";
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineChars = Environment.NewLine;

using (XmlWriter writer = XmlWriter.Create(filename, settings))
{
    writer.WriteStartDocument(true);
    writer.WriteStartElement("garden");
    writer.WriteAttributeString("total", "1");
    writer.WriteElementString("vegetable", "carrot");
    writer.WriteEndElement();
}

 
coder1313514512456Author Commented:
It would appear that both suggestions have the same effect: they write those first 3 characters, and then the rest of the XML.
In the case of the XmlTextWriter, I don't get the indenting, probably because I'm not using a constructor that takes some kind of settings class; however, I haven't seen one of those either.
 
Not sure what to do. Any suggestions? Again, when I use something like Notepad I don't see these preceding 3 characters (they look like "n++" on this website, but in the console they look quite different).
Thanks again for the suggestions.
 
 
coder1313514512456Author Commented:
feenix, that would be the answer!  Thanks!
 
What IS that thing, anyway?  byte order marks?  Whose idea was this?  And this is for...???
 
Thanks again feenix, you get my thanks (and points)!
 
 
coder1313514512456Author Commented:
Why Microsoft made this the default, I have no idea. Thanks feenix, perfect. And thanks to the others; I just really needed to get to the bottom of this.

 
feenixCommented:
The byte order mark tells the reading program whether the data is in big-endian or little-endian format. It's not usually needed in UTF-8, where byte order is not an issue, but in UTF-16 it can be useful. The characters are chosen so that they are unlikely ever to appear together at the start of a normal text file (in any language), so there won't be any problems in detection.
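
To make this concrete, a small sketch that prints the BOM (called the "preamble" in .NET) for a few built-in encodings. The variable names are illustrative. As a side note, on a console using code page 437 (an assumption about the asker's setup) the UTF-8 bytes EF BB BF render as ∩, ╗ and ┐, which fits the description in the question.

```csharp
using System;
using System.Text;

// GetPreamble() returns the BOM bytes an encoding writes at the start of a stream.
string utf8Bom = BitConverter.ToString(Encoding.UTF8.GetPreamble());
string utf16LeBom = BitConverter.ToString(Encoding.Unicode.GetPreamble());
string utf16BeBom = BitConverter.ToString(Encoding.BigEndianUnicode.GetPreamble());
int noBomLength = new UTF8Encoding(false).GetPreamble().Length;

Console.WriteLine("UTF-8:         " + utf8Bom);      // EF-BB-BF
Console.WriteLine("UTF-16 LE:     " + utf16LeBom);   // FF-FE
Console.WriteLine("UTF-16 BE:     " + utf16BeBom);   // FE-FF
Console.WriteLine("UTF-8, no BOM: " + noBomLength + " bytes");
```

The two UTF-16 values are the same character stored in opposite byte orders, which is how a reader can tell the byte order apart.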
Question has a verified solution.
