dominicwong
asked on
An XML and File import/export question with C#
Hi experts
I imported an XML file and saved it to another XML file using LINQ to XML in C#:
_xmlDocument = XDocument.Load("input.xml", LoadOptions.PreserveWhitespace);
_xmlDocument.Save("output.xml", SaveOptions.DisableFormatting);
I then opened each file and printed its contents byte by byte:
// Read the whole file at once; a single Stream.Read call may return fewer bytes than requested.
byte[] data = File.ReadAllBytes(pathAndFileName);
foreach (byte b in data)
    Console.WriteLine(b);
But when I compared the printed output, "output.xml" contained some additional characters: 239, 187, 191,
where 239: Latin small letter i with diaeresis
187: Right double angle quotes
191: Inverted question mark
It also dropped a 32 (i.e. a space) that was in "input.xml".
My question is: is there any way to preserve the format of the input without adding these funny characters or discarding the space character?
The two files look identical in a text editor, though.
Thanks in advance.
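For reference, the byte values 239, 187, 191 are 0xEF 0xBB 0xBF, which is the UTF-8 byte order mark (BOM). Text editors normally hide it, which would explain why the files look identical there. The following is a minimal sketch for checking whether a byte array starts with that mark; the class and method names (BomCheck, StartsWithUtf8Bom) are hypothetical and not part of the asker's code.

```csharp
using System.Linq;
using System.Text;

public static class BomCheck
{
    // Returns true when the data begins with the UTF-8 preamble (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        // Encoding.UTF8.GetPreamble() returns the three BOM bytes.
        byte[] preamble = Encoding.UTF8.GetPreamble();
        return data.Length >= preamble.Length
            && data.Take(preamble.Length).SequenceEqual(preamble);
    }
}
```

Calling BomCheck.StartsWithUtf8Bom(File.ReadAllBytes("output.xml")) on the saved file would confirm whether the extra bytes are a BOM rather than real content.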
ASKER
FYI, the reason they need to be identical is that I need to calculate the CRC of the file later in the program.
That's true; however, if you compare the bytes of the two XML files, you will find they are identical:
// Read both files fully; File.ReadAllBytes avoids the partial reads a single Stream.Read call can return.
byte[] xml1 = File.ReadAllBytes(@"input.xml");
byte[] xml2 = File.ReadAllBytes(@"output.xml");
// Count the distinct byte values that occur in xml1 but not in xml2.
var countDiffBytes = xml1.Except(xml2).Count();
countDiffBytes is equal to 0, meaning they will pass the CRC check.
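One caveat on the comparison above: Enumerable.Except computes a set difference, so a count of 0 only shows that every byte value in the first file also occurs somewhere in the second. It says nothing about order, position, or extra bytes. A positional comparison is what a CRC effectively depends on. A minimal sketch, with a hypothetical helper name (ByteCompare.FirstDifference):

```csharp
using System;

public static class ByteCompare
{
    // Returns the index of the first position where the arrays differ,
    // the shorter length if one array is a strict prefix of the other,
    // or -1 if the arrays are byte-for-byte identical.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i])
                return i;
        }
        return a.Length == b.Length ? -1 : common;
    }
}
```

For example, the arrays { 1, 2, 3 } and { 3, 2, 1, 1 } give an Except count of 0, yet they differ at index 0.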
ASKER
Sorry for the confusion. The later CRC calculation wasn't included in the code (for clarity).
I do need the files to be completely identical; otherwise, the CRC will be different.
The actual code that calculates the CRC is as follows:
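// Crc32 is not shown in the thread; it is used here like a System.Security.Cryptography.HashAlgorithm.
Crc32 crc32 = new Crc32();
String hash = String.Empty;
using (FileStream fs = File.Open(pathAndFileName, FileMode.Open))
{
    // Format each hash byte as two hex digits ("x2" already yields lowercase).
    foreach (byte b in crc32.ComputeHash(fs))
        hash += b.ToString("x2");
}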
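To illustrate why three extra leading bytes are enough to change the result, here is a minimal bitwise CRC-32 sketch. It assumes the common reflected polynomial 0xEDB88320 with initial value and final XOR of 0xFFFFFFFF. The Crc32 class used above is not shown in the thread and may use different parameters, so Crc32Sketch is purely illustrative.

```csharp
using System;

public static class Crc32Sketch
{
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320). Every input byte,
    // including a leading BOM, feeds into the running checksum.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
            }
        }
        return ~crc;
    }
}
```

Computing the checksum of the same content with and without the bytes 239, 187, 191 in front yields different values, which is why the BOM matters here even though editors do not display it.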
ASKER
I managed to get the software requirement to change from "utf-8" to "us-ascii".
Now, it is OK. The problem is resolved.
Thanks for your help.
ASKER
Thank you.
ASKER
That resolved the funny character problem, but it created a new issue.
The declaration in the original file was:
<?xml version="1.0" encoding="utf-8"?>
In the saved file, it became:
<?xml version="1.0" encoding="us-ascii"?>
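One possible way to keep the utf-8 declaration while still avoiding the three leading bytes is to save through an XmlWriter whose encoding is a UTF8Encoding constructed with the BOM switched off. The sketch below assumes that approach; the class name XmlNoBom is hypothetical. It does not guarantee a byte-identical round trip, because the writer regenerates the declaration and tags, which may be where the missing space went.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlNoBom
{
    // Writes the document as UTF-8 without a byte order mark.
    public static void Save(XDocument document, Stream output)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit the EF BB BF preamble.
            Encoding = new UTF8Encoding(false),
            // Do not add indentation; whitespace already in the document is kept.
            Indent = false
        };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
    }
}
```

With the asker's variable this could be called as: using (FileStream fs = File.Create("output.xml")) XmlNoBom.Save(_xmlDocument, fs);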