Japanese breaks up XML (msxml2)

Posted on 2003-11-30
Last Modified: 2013-11-19
Hi there,

I have the following problem: I wrote a simple ASP script (see function below) that allows administrators of my website to update text strings in an XML file. This works perfectly for every other language I have tried, but not for Japanese. The file becomes corrupted and unreadable; in fact, it is "cut" at the exact point where the update should have been written.

I suppose the problem lies either in the fact that I use an old MSXML parser (ver. 2) or in the fact that the Japanese pages are encoded in Shift-JIS instead of UTF-8.

Does anyone here have any experience with this problem, and a possible solution?


Function fncUpdateXML(strLanguage, strScriptName, strNode, strText)
      Set xmlDoc = Server.CreateObject("msxml2.DOMDocument")
      xmlDoc.async = False
      If NOT xmlDoc.Load("c:\testfile.xml") Then
            Response.Write "Page failed to load"
      Else
            ' Convert HTML line breaks to linefeeds before storing the text
            strText = Replace(strText, "<br>", vbLf)
            xmlDoc.SelectSingleNode("/languages/language[@xml:lang='" & strLanguage & "']/pages/page[@xml:page='" & strScriptName & "']/" & strNode).text = strText
            xmlDoc.save "c:\testfile.xml"
      End If
      Set xmlDoc = Nothing
End Function
Question by:vpikula

Expert Comment

ID: 9848729
No, you're using MSXML 3. The ProgID "Msxml2.DomDocument" doesn't mean MSXML version 2; it's more like version 2 of the API. MSXML version 2 had a different API (this was before things were settled at the W3C). MSXML 4 uses a similar ProgID: "Msxml2.DomDocument.4.0".

Any time you have strings involved, your encoding is actually UTF-16. This is because BSTRs are essentially UTF-16. So somewhere along the road, your encoding is getting goofed up. Now, if your document contains Unicode characters and you try to insert Shift-JIS characters into it, you'll have a problem, as the document can only have one encoding. If you can convert the submitted Shift-JIS to Unicode, that would be easiest.

Also, xml:page does not look like correct usage. There isn't (to my knowledge) any such thing as an xml:page attribute. "xml" is a reserved prefix, and you should avoid using it for your own attributes. Also, to reliably select nodes with a qualified name (such as foo:bar), you need to set the SelectionNamespaces property on your DomDocument object. The same goes for using XPath: you need to set the SelectionLanguage property, because selectSingleNode() defaults to the old XSL patterns language for backwards compatibility reasons.
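To illustrate how an encoding mismatch corrupts text, here is a minimal sketch in JavaScript. It assumes Node.js rather than classic ASP (`TextDecoder` is not available in JScript), and the variable names are only for illustration. It takes the Shift-JIS bytes for 日本 and decodes them as if they were UTF-8. A strict decoder rejects the input outright, which may be similar to what happens when a parser hits bytes that don't match the document's declared encoding.

```javascript
// "日本" encoded two ways: same characters, completely different bytes.
var shiftJisBytes = new Uint8Array([0x93, 0xFA, 0x96, 0x7B]);
var utf8Bytes = new Uint8Array([0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC]);

// Correct: UTF-8 bytes read as UTF-8.
var good = new TextDecoder("utf-8").decode(utf8Bytes);

// Wrong: Shift-JIS bytes read as UTF-8.
// Invalid sequences are replaced with U+FFFD (the replacement character).
var garbled = new TextDecoder("utf-8").decode(shiftJisBytes);

// A strict decoder refuses the input instead of silently substituting.
var strictFailed = false;
try {
    new TextDecoder("utf-8", { fatal: true }).decode(shiftJisBytes);
} catch (e) {
    strictFailed = true;
}

console.log(good === "\u65E5\u672C");        // true
console.log(garbled.indexOf("\uFFFD") >= 0); // true
console.log(strictFailed);                   // true
```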

Here's how to set them both (in JScript...sorry):

var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://myserver.com' xmlns:bar='http://yourserver.com'");

This allows you to select a node using a qualified name, even if the actual prefix is different from the one in your SelectionNamespaces property.  For example, this XML:

<snafu:rootelement xmlns:snafu="http://tempuri.org">nice root element</snafu:rootelement>

can be selected by:

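var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
// Load the sample XML shown above before selecting from it
xmldoc.loadXML('<snafu:rootelement xmlns:snafu="http://tempuri.org">nice root element</snafu:rootelement>');
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://tempuri.org'");
var oNode = xmldoc.selectSingleNode("foo:rootelement");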

even though the prefix in the XML is "snafu" and the prefix in the select is "foo".  It's only the namespace that counts.

Mike Sharp



Author Comment

ID: 9849027
Thanks a lot for that detailed info, rdcpro! I'll repair the xml with the syntax pointers you gave once this problem is solved.

Your suggestion is: " If you can convert the submitted Shift-JIS to unicode, that would be easiest."

How do I do this? Right now the pages I present to my users to edit the XML on are encoded in Shift-JIS.  I could easily set these to be UTF-8 (the doc is in UTF-8) so there is no problem. But at the output end, I *have* to display the same text in Shift-JIS; the devices accessing the site are mobile phones that can only accept this.

So in short, I see two solutions:
1) I use a separate XML doc, encoded in Shift-JIS
2) I let admins post in UTF-8, but convert the output to Shift-JIS

If you have a good solution for 2), I'll do that. Otherwise, I'll go for 1) and make a separate document for my Japanese texts.


Accepted Solution

rdcpro earned 2000 total points
ID: 9850400
How do you serve the content for your site to the users? By any chance, do you render the XML using XSLT?

You might have to use approach 1, but XSLT does have a nice method for producing different output encodings regardless of what the XML is encoded in.  The tag:

<xsl:output method="xml" encoding="shift-jis"/>

causes all output to be encoded in the desired encoding.  MSXML supports any encoding supported by Internet Explorer.  However, you can't dynamically specify the encoding at runtime (at least not elegantly).  You'd need at least a separate root XSLT for each encoding, and then use the appropriate one at runtime.  Each root XSLT would import or include all its templates, so you wouldn't really have any redundant code.  Something like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="shift-jis" indent="yes"/>
      <xsl:include href="myTemplates.xslt"/>
</xsl:stylesheet>
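
Picking the appropriate root stylesheet at runtime could be as simple as a lookup table. This is only a sketch: the file names and the `pickRootStylesheet` helper are hypothetical, and falling back to a UTF-8 stylesheet is my assumption, not something the site is known to need.

```javascript
// Hypothetical mapping from output encoding to the root stylesheet
// whose <xsl:output> declares that encoding.
var ROOT_STYLESHEETS = {
    "shift-jis": "root-shift-jis.xslt",
    "utf-8": "root-utf-8.xslt"
};

// Return the root stylesheet for the requested encoding,
// falling back to the UTF-8 one for anything unrecognized.
function pickRootStylesheet(encoding) {
    var key = String(encoding).toLowerCase();
    if (Object.prototype.hasOwnProperty.call(ROOT_STYLESHEETS, key)) {
        return ROOT_STYLESHEETS[key];
    }
    return ROOT_STYLESHEETS["utf-8"];
}
```

Each root file would differ only in its xsl:output line; the shared templates stay in the included file.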

On the other hand, if you're not using XSLT, and don't want to, you can probably use the stream object.  I believe you can set various encodings on it.  I don't have a code sample, though, as I usually end up using XSLT.  
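
For what it's worth, a rough sketch of the stream approach might look like the following. It assumes the stream object meant here is ADODB.Stream, whose Charset property controls the encoding used when text is written. The `saveTextWithCharset` helper and its `createObject` parameter are hypothetical; passing the object factory in keeps the function usable from ASP (`Server.CreateObject`) or Windows Script Host (`ActiveXObject`). I haven't run this against the setup described in the question, and characters with no Shift-JIS equivalent won't survive the conversion.

```javascript
// ADODB.Stream constants (values from the ADO documentation).
var adTypeText = 2;
var adSaveCreateOverWrite = 2;

// createObject: function(progId) returning a COM object, for example
//   function (id) { return Server.CreateObject(id); }   // classic ASP
//   function (id) { return new ActiveXObject(id); }     // Windows Script Host
function saveTextWithCharset(createObject, text, filePath, charset) {
    var stream = createObject("ADODB.Stream");
    stream.Type = adTypeText;   // text mode, so Charset applies
    stream.Charset = charset;   // e.g. "shift_jis"; set before writing
    stream.Open();
    stream.WriteText(text);     // the string is converted to the charset here
    stream.SaveToFile(filePath, adSaveCreateOverWrite);
    stream.Close();
}
```

From ASP JScript, a call might look like `saveTextWithCharset(function (id) { return Server.CreateObject(id); }, strText, "c:\\out.txt", "shift_jis");` (the path is a placeholder).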

It looks to me like your XML file holds the content for all pages on the site, for all supported languages.  This sounds like a pretty big file, and parsing the entire thing isn't the best use of resources, I should think, considering a single site visitor will only use one locale.  There are a variety of approaches to localization...you might think about using a different one.  For example, localizable resources go in a separate XML file for each locale, each stored in its own folder:

    |_    en_US
    |_    fr_CA

When you discover the site visitor's locale or culture code, you modify the path to the XML resource, and either cache the content in the user's session or load it each time, depending on your needs.
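
As a rough sketch of the path lookup, something like this would do. The `strings.xml` file name, the `ja_JP` entry, the default locale, and the `resourcePathFor` helper are all hypothetical. Checking the locale against a fixed list also keeps an unexpected value from ending up in a file path.

```javascript
// Hypothetical layout: <baseDir>/<locale>/strings.xml
var SUPPORTED_LOCALES = ["en_US", "fr_CA", "ja_JP"];
var DEFAULT_LOCALE = "en_US";

// Build the path to the XML resource for a locale,
// falling back to the default when the locale isn't supported.
function resourcePathFor(baseDir, locale) {
    var chosen = DEFAULT_LOCALE;
    for (var i = 0; i < SUPPORTED_LOCALES.length; i++) {
        if (SUPPORTED_LOCALES[i] === locale) {
            chosen = locale;
            break;
        }
    }
    return baseDir + "/" + chosen + "/strings.xml";
}
```

With this layout the Japanese file can be saved in Shift-JIS while the others stay in UTF-8, since each file declares its own encoding.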

.NET has a better way of dealing with localizable resources, too.

Mike Sharp

Author Comment

ID: 9851121
Thanks very much Mike -- that separate folder solution (or different filenames) will work perfectly for me. No, I do not use XSLT (since I don't really understand it, heh), so separate files work best.

Great job-500 points coming your way!
