
Processing UTF-16 encoded xml file

Posted on 2004-03-30
Last Modified: 2008-02-01
I am using MSXML 4 to load an XML file into a DOM tree. The problem is that the XML file is not well-formed, because it:

- does not specify the encoding
- does not have a byte-order mark at the beginning of the document
- contains UTF-16 encoded data

As a result, I get an "invalid character found" error when building the DOM tree. Since I have no control over the XML generation side, I need a workaround. One option I can think of is to insert the encoding declaration into the XML file dynamically, but I am looking for a better way. Is there any option in MSXML to specify a "default encoding" for when no encoding is specified? I am looking for something like this:

document->load("myfile.xml", Encoding::UTF-16);

Any help is appreciated.
Question by: onlygo
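The failure can be explained from the first bytes of such a file: with no BOM and no declaration, an XML parser falls back to UTF-8, so the zero bytes of UTF-16 text look like invalid characters. As a rough sketch in the spirit of Appendix F of the XML 1.0 spec (not something MSXML exposes), the following guesses the encoding from a BOM if there is one, otherwise from the byte pattern of the leading '<'. The name guessXmlEncoding is hypothetical, and the check assumes the document begins with '<' rather than whitespace.

```javascript
// Hypothetical helper: guess the encoding of an XML file from its first bytes.
// Assumes the document starts with '<' (no leading whitespace), which UTF-16LE
// stores as 3C 00 and UTF-16BE as 00 3C.
function guessXmlEncoding(bytes) {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) return "utf-16le"; // little-endian BOM
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "utf-16be"; // big-endian BOM
    if (bytes[0] === 0x3C && bytes[1] === 0x00) return "utf-16le"; // '<' in UTF-16LE, no BOM
    if (bytes[0] === 0x00 && bytes[1] === 0x3C) return "utf-16be"; // '<' in UTF-16BE, no BOM
  }
  return "utf-8"; // XML's default when neither a BOM nor a declaration says otherwise
}

// "<a/>" encoded as UTF-16LE without a BOM, like the file in the question
const sample = Uint8Array.from([0x3C, 0x00, 0x61, 0x00, 0x2F, 0x00, 0x3E, 0x00]);
console.log(guessXmlEncoding(sample)); // utf-16le
```

A file that matches the "no BOM" branches is exactly the kind MSXML's load() rejects.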

Accepted Solution

by: dfiala13
ID: 10720861
You can try this:
Create a new XML document, then add a processing instruction that declares the proper encoding:

var xmldoc = new ActiveXObject("Msxml2.DOMDocument.4.0");
var pi = xmldoc.createProcessingInstruction("xml", "version='1.0' encoding='UTF-16'");
xmldoc.appendChild(pi);
xmldoc.save("newfile.xml");

Then load the suspect XML using loadXML:

xmldoc.loadXML(sXML);  // sXML: the contents of the problem file, as a string

Here's an interesting link on encoding and MSXML:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlencodings.asp

Assisted Solution

by: rdcpro
ID: 10721422
I don't think that will work. Calling loadXML() or load() discards whatever the document already contained, so I would be surprised if the previously added encoding persisted.

HOWEVER, the loadXML method presumes UTF-16, so using the string-based load instead of the IStream-based one will probably work on its own. If necessary, you can also prepend the PI:

xmldoc.loadXML("<?xml version='1.0' encoding='utf-16' ?>" + sXML);
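If the raw bytes of the file are available to script code, the decoding can happen before MSXML sees anything, so only a string reaches loadXML. A minimal sketch, assuming a JavaScript runtime that provides TextDecoder (classic JScript under WSH does not) and assuming little-endian byte order, the usual one on Windows; prepareForLoadXML is a hypothetical name.

```javascript
// Hypothetical helper: decode the raw bytes of a BOM-less UTF-16LE file into a
// string and put the XML declaration in front of it. With MSXML the result
// would then be passed to xmldoc.loadXML(); here we only build the string.
function prepareForLoadXML(bytes) {
  // TextDecoder("utf-16le") also strips a leading BOM, should one be present.
  const text = new TextDecoder("utf-16le").decode(bytes);
  // Don't add a second declaration if the file already has one.
  if (text.startsWith("<?xml")) return text;
  return "<?xml version='1.0' encoding='utf-16' ?>" + text;
}

const raw = Uint8Array.from([0x3C, 0x00, 0x61, 0x00, 0x2F, 0x00, 0x3E, 0x00]); // "<a/>"
console.log(prepareForLoadXML(raw)); // <?xml version='1.0' encoding='utf-16' ?><a/>
```

Decoding explicitly keeps the byte-order guess in one visible place instead of relying on the parser's defaults.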

With newer versions of MSXML, you can even force UTF-8 encoding by specifying utf-8 in the PI. Note, however, that the character data passed to loadXML must still be UTF-16, because all strings are BSTRs, which are essentially UTF-16. It's also worth noting that if you use the xml property, as in:

strXml = xmlDoc.xml

then the data is UTF-16 and has no byte order mark either! That doesn't matter, though, unless the byte order is unusual anyway.
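The missing BOM matters mainly if you write that string to a file yourself and hand the file to something that, like MSXML's load(), needs a BOM or a declaration to recognize UTF-16. A minimal sketch, assuming the writing happens in a JavaScript runtime and that little-endian output is wanted; toUtf16leWithBom is a hypothetical name.

```javascript
// Hypothetical helper: encode a string (for example the value of xmlDoc.xml)
// as UTF-16LE bytes preceded by the FF FE byte order mark.
function toUtf16leWithBom(str) {
  const bytes = new Uint8Array(2 + str.length * 2);
  bytes[0] = 0xFF; // BOM, little-endian
  bytes[1] = 0xFE;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit; surrogate pairs pass through unchanged
    bytes[2 + 2 * i] = unit & 0xFF; // low byte first
    bytes[3 + 2 * i] = unit >> 8;   // then high byte
  }
  return bytes;
}

const hex = Array.from(toUtf16leWithBom("<a/>"), b => b.toString(16).padStart(2, "0"));
console.log(hex.join(" ")); // ff fe 3c 00 61 00 2f 00 3e 00
```

Because a JavaScript string is already a sequence of UTF-16 code units, no transcoding is needed; the bytes only have to be laid out in the chosen order.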

Summary:

Don't use the IStream-based load() method; use the string-based (i.e., UTF-16) loadXML method instead.

Regards,
Mike Sharp
