Dealing with special characters in XML files read by VBScript

Cerixus
I have a script that imports a bunch of XML files. One of the elements it reads frequently contains special characters (for example, "ç"). When the import script processes such a file, it fails with the following message:

Error #-1072896760  An invalid character was found in text content.

Is there any way to catch this on the import? I know that on the export I could manually replace "ç" with "c", but I don't want to do that for every character. I'd even be happy with catching them all on the export, though I would prefer to do it on the import. The ReadElement function is below. Any tips would be appreciated.
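Public Function ReadElement(element)
	Dim objNodeList, x
	On Error Resume Next
	' Find every element with the given tag name in the loaded document
	Set objNodeList = xmlDoc.getElementsByTagName(element)

	If objNodeList.length > 0 Then
		' If the tag occurs more than once, the text of the last match is returned
		For Each x In objNodeList
			ReadElement = x.Text
		Next
	Else
		ReadElement = ""
	End If

	' Log any error raised above, then clear it
	If Err.Number <> 0 Then
		errornumber = Err.Number
		errormessage = Err.Description
		logError WMIGetLocalName(), errornumber, errormessage, "ReadElement()"
		Err.Clear
	End If
End Function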

The author commented:
Oh, maybe something like getting the ASCII value of each character and checking whether it falls outside a certain range?
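The ASCII-range idea above could be sketched roughly as follows. EscapeNonAscii is a hypothetical helper name. Rather than dropping characters outside the range, it turns each one into a numeric character reference ("ç" becomes &#231;), so the result is pure ASCII and reads the same whether the parser treats it as UTF-8 or ISO-8859-1. It assumes the non-ASCII characters appear only in element text or attribute values (not in tag names, comments, or CDATA sections) and that there are no characters outside the Basic Multilingual Plane.

```vbscript
' Hypothetical helper (not part of any library): converts every character
' with a code above 127 into a numeric character reference, e.g. ChrW(231)
' ("c" with cedilla) becomes "&#231;". Characters 0-127 are copied unchanged.
' Assumes non-ASCII characters occur only in element text or attribute values
' and that the text has no characters outside the Basic Multilingual Plane.
Function EscapeNonAscii(strText)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(strText)
        ch = Mid(strText, i, 1)
        code = AscW(ch)
        ' AscW returns a negative Integer for characters from &H8000 upward
        If code < 0 Then code = code + 65536
        If code > 127 Then
            result = result & "&#" & code & ";"
        Else
            result = result & ch
        End If
    Next
    EscapeNonAscii = result
End Function
```

It could be applied to the file contents before they are loaded (strXMLData = EscapeNonAscii(strXMLData)) or on the export side before the file is written. Building the string one character at a time is slow for very large files, so this is meant as an illustration rather than a tuned routine.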
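I am guessing that xmlDoc is created with CreateObject("Microsoft.XMLDOM"). If so, the MSXML parser checks that the bytes in the file are valid for the file's declared encoding, which is UTF-8 when the XML declaration names none. Most likely the "ç" is saved as a single Windows-1252 byte, which is not valid UTF-8, so the parser rejects it with "An invalid character was found in text content." It is unforgiving about this, so to handle that kind of data you will need to adjust it before loading it into the document object.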
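Since the question asks how to catch the problem on import, here is a sketch of where it surfaces: Load returns False and the document's parseError object describes the failure. LoadXmlFile is a hypothetical helper name, and it reports through WScript.Echo, so it assumes the script runs under cscript or wscript; the logError routine from the question could be called there instead.

```vbscript
' Hypothetical helper: loads an XML file synchronously and reports the parse
' error instead of letting the failure surface later. Returns True on success.
Function LoadXmlFile(xmlDoc, strPath)
    xmlDoc.async = False          ' wait for Load to finish so parseError is filled in
    If xmlDoc.Load(strPath) Then
        LoadXmlFile = True
    Else
        ' parseError explains why Load failed, e.g. errorCode -1072896760,
        ' "An invalid character was found in text content."
        WScript.Echo "Error #" & xmlDoc.parseError.errorCode & ": " & _
            xmlDoc.parseError.reason & " (line " & xmlDoc.parseError.line & _
            ", position " & xmlDoc.parseError.linepos & ")"
        LoadXmlFile = False
    End If
End Function
```

The line and position values point at the offending character, which makes it easier to see which element and which character need attention.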

I suggest loading the file with the FileSystemObject.

Dim objFSO, objTextFile, strXMLData
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile ("c:\MyFile.xml", 1) ' 1 is for Reading
strXMLData = objTextFile.ReadAll
objTextFile.close
Then use regular expressions to search for and replace all the characters that are causing you problems (see http://msdn.microsoft.com/en-us/library/ms974570.aspx for more details if you are unfamiliar with them):

Set objRegExp = CreateObject("VBScript.RegExp")

' Pattern: the characters to replace; add any other problem characters inside the brackets
objRegExp.Pattern = "[ç]"
' Replace every match in the string, not just the first one
objRegExp.Global = True

' Replace each match with a plain "c"
strXMLData = objRegExp.Replace(strXMLData, "c")
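To avoid listing every problem character in the brackets, the pattern could be widened to any character above the ASCII range. This is a hedged sketch: ReplaceNonAscii is a hypothetical helper name, and replacing with one fixed string loses the original letter ("ç" and "é" would both become the same replacement).

```vbscript
' Hypothetical helper: replaces every character above the ASCII range
' (code 128 and up) with strReplacement in a single pass.
Function ReplaceNonAscii(strText, strReplacement)
    Dim objRegExp
    Set objRegExp = CreateObject("VBScript.RegExp")
    ' [^\x00-\x7F] matches any character that is NOT in the 0-127 range
    objRegExp.Pattern = "[^\x00-\x7F]"
    ' Replace all matches, not only the first one
    objRegExp.Global = True
    ReplaceNonAscii = objRegExp.Replace(strText, strReplacement)
End Function
```

A call such as strXMLData = ReplaceNonAscii(strXMLData, "?") would then take the place of the single-character pattern. Note that RegExp.Replace treats "$" in the replacement string specially, so a plain letter or "?" is the safer choice.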
Then load the new string into the DOMDocument:
xmlDoc.LoadXML strXMLData

-Bear