Solved

Processing UTF-16 encoded xml file

Posted on 2004-03-30
Last Modified: 2008-02-01
I am using MSXML 4 to load an XML file into a DOM tree. The problem is that the XML file is not well formed, as it:

- does not specify the encoding
- does not have a byte-order mark at the beginning of the document
- contains UTF-16 encoded data

As a result, I get an "invalid character found" error when building the DOM tree. As I have no control over the XML generation side to correct this problem, I need a workaround. One way I can think of is to dynamically insert the encoding into the XML file, but I am looking for a better way. Is there any option in MSXML to specify a "default encoding" in case no encoding is specified? I am looking for something like this:

document->load("myfile.xml", Encoding::UTF-16);

Any help is appreciated.
Question by:onlygo
LVL 12

Accepted Solution

by:
dfiala13 earned 500 total points
ID: 10720861
You can try this:
Create a new XML document, then add a processing instruction that declares the proper encoding.

var pi = xmldoc.createProcessingInstruction("xml"," version='1.0' encoding='UTF-16'");
xmldoc.appendChild(pi);
xmldoc.save("newfile.xml");

Then load the suspect XML using loadXML:

xmldoc.loadXML(sXML);

Here's an interesting link on encoding and MSXML:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlencodings.asp

 
LVL 26

Assisted Solution

by:rdcpro
rdcpro earned 500 total points
ID: 10721422
That doesn't seem like it will work to me.  When you loadXML() or load(), it blows away anything that was there.  I would be surprised if the previous encoding persisted.

HOWEVER, the loadXML method presumes UTF-16, so using a string-based load rather than the IStream version will probably work by itself.  But if necessary, you can prepend the PI too:

xmldoc.loadXML("<?xml version='1.0' encoding='utf-16' ?>" + sXML);
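
If the incoming text sometimes already carries a declaration, prepending a second one would itself make the document invalid. A minimal sketch of a guard follows; `ensureXmlDeclaration` is a hypothetical helper name, not part of MSXML, and it only does string handling:

```javascript
// Hypothetical helper (not an MSXML API): add an XML declaration
// only when the text does not already start with one.
function ensureXmlDeclaration(sXML, encoding) {
  // A declaration is only valid at the very start of the document.
  if (/^<\?xml\s/.test(sXML)) {
    return sXML;
  }
  return "<?xml version='1.0' encoding='" + encoding + "' ?>" + sXML;
}

// Intended use with MSXML (not executed here; xmldoc is assumed to be
// an MSXML DOMDocument and sXML a string holding the suspect XML):
// xmldoc.loadXML(ensureXmlDeclaration(sXML, "utf-16"));
```

The check is deliberately strict about position, so text with leading whitespace or other characters before an existing declaration is not handled by this sketch.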

With newer versions of MSXML, you can even force UTF-8 encoding by specifying utf-8 in the PI.  Note, however, that the character data passed to loadXML must still be UTF-16, because all strings are BSTRs, which are essentially UTF-16.  It's also worth noting that if you use the xml property, as in:

strXml = xmlDoc.xml

then the data is UTF-16 and there will be no byte order mark either!  But it doesn't matter, unless the byte order is odd anyway.

Summary:

Don't use the IStream-based load() method.  Use the string-based (i.e., UTF-16) loadXML method.
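
To use loadXML you first need the file contents as a string. The following is a rough sketch of that step, assuming you have already read the file into an array of byte values by whatever means your host provides. `decodeBomlessUtf16` is a hypothetical helper name, and guessing the byte order from the leading `<` is an assumption of this sketch, not something MSXML does for you:

```javascript
// Hypothetical helper: turn BOM-less UTF-16 bytes into a JavaScript string.
// `bytes` is an array of numbers 0-255; reading them from disk is up to the host.
function decodeBomlessUtf16(bytes) {
  if (bytes.length < 2 || bytes.length % 2 !== 0) {
    throw new Error("expected a non-empty, even number of bytes");
  }
  // XML text starts with '<' (0x3C): "3C 00" means little-endian,
  // "00 3C" means big-endian.
  var littleEndian;
  if (bytes[0] === 0x3C && bytes[1] === 0x00) {
    littleEndian = true;
  } else if (bytes[0] === 0x00 && bytes[1] === 0x3C) {
    littleEndian = false;
  } else {
    throw new Error("does not look like BOM-less UTF-16 XML");
  }
  var units = [];
  for (var i = 0; i < bytes.length; i += 2) {
    var lo = littleEndian ? bytes[i] : bytes[i + 1];
    var hi = littleEndian ? bytes[i + 1] : bytes[i];
    // Each pair of bytes is one UTF-16 code unit.
    units.push(String.fromCharCode((hi << 8) | lo));
  }
  return units.join("");
}

// Intended use with MSXML (not executed here):
// xmldoc.loadXML(decodeBomlessUtf16(bytes));
```

Because the result is an ordinary string, it can go straight to loadXML without adding an encoding declaration.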

Regards,
Mike Sharp

