Working with Unicode Characters in XML
Posted on 2012-09-11
I am working with a program called EnCase 7 that has a scripting engine. From what I can tell, the scripting engine is a barebones implementation of C++. I am trying to write out an XML file from EnCase to load into C#. The XML file holds data from EnCase that I want to move into .NET for additional processing. EnCase does not have a component that creates XML files, so I am using its standard file writer to create them. All the text files I am working with are UTF-16.
Most of the XML files that I create load into .NET with no issues. However, I am having problems with characters that .NET does not like, for example &, <, > and the Unicode character \x01. This morning I found two more: þ and . Below is a function I created to replace these characters with their escaped equivalents.
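String uft16CleanUp(String cleanString)
{
  // Replace the control character \x01 with a space.
  cleanString.Replace("\x01", " ", 0, -1);
  cleanString.Replace(" ", " ", 0, -1);
  // Escape & first so the ampersands in &lt; and &gt; are not escaped again.
  cleanString.Replace("&", "&amp;", 0, -1);
  cleanString.Replace("<", "&lt;", 0, -1);
  cleanString.Replace(">", "&gt;", 0, -1);
  return cleanString;
}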
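With chained replacements the order matters: & has to be handled before < and >, otherwise the ampersands inside the entities just written get escaped a second time. A single pass over the string avoids depending on the order at all. The following is only a minimal sketch in standard C++, not EnScript; the name escapeXml is hypothetical, and std::u16string stands in for the EnCase String type on the assumption that the data is UTF-16.

```cpp
#include <string>

// Hypothetical helper, not part of EnCase: escapes the XML markup
// characters in one pass, so an '&' that belongs to an entity written
// by this function is never escaped twice.
std::u16string escapeXml(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char16_t c : in) {
        switch (c) {
            case u'&':  out += u"&amp;";  break;
            case u'<':  out += u"&lt;";   break;
            case u'>':  out += u"&gt;";   break;
            case u'"':  out += u"&quot;"; break;  // needed inside attribute values
            case u'\'': out += u"&apos;"; break;  // needed inside attribute values
            default:    out += c;         break;
        }
    }
    return out;
}
```

These five are the entities predefined by XML itself. In element text only & and < strictly have to be escaped, but escaping all five is harmless.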
Does anyone have a better suggestion for how to go about this? I am currently "fishing" for characters one at a time, which is taking up a lot of time. I want to know if there is a predefined list of characters that need to be converted for XML, or if there is a Unicode range that I should automatically convert.
Any help would be greatly appreciated.
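On the question of a predefined range: the XML 1.0 specification does define one. Its Char production allows only #x9, #xA, #xD, #x20 to #xD7FF, #xE000 to #xFFFD, and #x10000 to #x10FFFF, so \x01 is not a legal character even when written as a character reference. Below is a rough sketch of a filter built on that range, again in standard C++ rather than EnScript. The function names are hypothetical, and replacing illegal characters with a space is just one possible policy.

```cpp
#include <cstddef>
#include <string>

// True if the code point is allowed by the XML 1.0 "Char" production.
bool isXmlChar(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: replaces every UTF-16 code unit that is not legal
// in XML 1.0 with a space. Well-formed surrogate pairs are kept as-is.
std::u16string stripInvalidXmlChars(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t c = in[i];
        bool highSurrogate = c >= 0xD800 && c <= 0xDBFF;
        if (highSurrogate && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // A valid pair encodes U+10000..U+10FFFF, which is always allowed.
            out += c;
            out += in[++i];
        } else if (isXmlChar(c)) {
            out += c;
        } else {
            // Control characters, lone surrogates, U+FFFE and U+FFFF.
            out += u' ';
        }
    }
    return out;
}
```

Running this filter first and escaping the markup characters afterwards covers both classes of problem. Note that þ (U+00FE) falls inside the allowed range, so if a parser rejects it, the cause may lie in the file's encoding declaration or byte order mark rather than in the character itself.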