Solved

Japanese breaks up XML (msxml2)

Posted on 2003-11-30
Last Modified: 2013-11-19
Hi there,

I have the following problem: I wrote a simple ASP script (see the function below) that allows administrators of my website to update text strings in an XML file. This works perfectly for any Unicode language, but not for Japanese. The file becomes totally corrupted and unreadable; in fact, it is cut off at the exact point where it should have been updated.

I suppose the problem lies either in the fact that I use an old MSXML parser (version 2) or in the fact that the Japanese encoding is set to Shift-JIS instead of UTF-8.

Does anyone here have any experience with this problem, and a possible solution?

Thanks,
Victor
 

'XML UPDATE NODE VALUE
Function fncUpdateXML(strLanguage, strScriptName, strNode, strText)
      Set xmlDoc = Server.CreateObject("msxml2.DOMDocument")
      xmlDoc.async = False
      If NOT xmlDoc.Load("c:\testfile.xml") Then
            Response.Write "Page failed to load"
            Response.End
      Else
            strText = Replace(strText, "<br>", vblf)
            xmlDoc.SelectSingleNode("/languages/language[@xml:lang='" & strLanguage & "']/pages/page[@xml:page='" & strScriptName & "']/" & strNode).text = strText
            xmlDoc.save "c:\testfile.xml"
      End If
      Set xmlDoc = Nothing
End Function
Question by:vpikula

Expert Comment

by:rdcpro
ID: 9848729
No, you're using MSXML 3. The ProgID "Msxml2.DomDocument" doesn't mean MSXML version 2; it's more like version 2 of the API. MSXML version 2 had a different API (this was before things were settled at the W3C). MSXML 4 uses a similar ProgID: "Msxml2.DomDocument.4.0".

Any time you have strings involved, your encoding is actually UTF-16, because BSTRs are essentially UTF-16. So somewhere along the way, your encoding is getting mangled. Now, if your document is stored in a Unicode encoding and you try to insert raw Shift-JIS bytes into it, you have a problem, as a document can only have one encoding. If you can convert the submitted Shift-JIS to Unicode, that would be easiest.
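
To make the encoding point concrete, here is a minimal sketch. It is in Python rather than ASP, purely so it can be run anywhere, and the sample string is an assumption for illustration. It shows that the same Japanese text is a different byte sequence in Shift-JIS and in UTF-8, and that reading Shift-JIS bytes as UTF-8 fails instead of quietly working:

```python
# Sample text is an assumption for illustration; it is not from the original file.
text = "日本語"

# Roughly what a Shift-JIS encoded form post would put on the wire.
sjis_bytes = text.encode("shift_jis")

# Decoding with the matching codec recovers the Unicode string.
recovered = sjis_bytes.decode("shift_jis")

# Treating the same bytes as UTF-8 fails; this is the kind of mismatch
# that can damage a UTF-8 document if the bytes are written unconverted.
try:
    sjis_bytes.decode("utf-8")
    misread_ok = True
except UnicodeDecodeError:
    misread_ok = False

# Once decoded to Unicode, the text can be stored in a UTF-8 document.
utf8_bytes = recovered.encode("utf-8")
```

The point is only that the conversion has to happen explicitly somewhere between the form post and the saved file; where that happens in an ASP page depends on the page's codepage and charset settings.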

Also, xml:page does not look like correct usage. There isn't (to my knowledge) any such thing as an xml:page attribute. "xml" is a reserved prefix, and you should avoid using it for your own attributes. Also, to reliably select nodes with a qualified name (such as foo:bar), you need to set the SelectionNamespaces property on your DomDocument object. The same goes for using XPath: selectSingleNode() defaults to the old XSL Patterns language for backwards compatibility reasons, so you need to set SelectionLanguage as well.

Here's how to set them both (in JScript...sorry):

var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://myserver.com' xmlns:bar='http://yourserver.com'");

This allows you to select a node using a qualified name, even if the actual prefix is different from the one in your SelectionNamespaces property. For example, this XML:

<snafu:rootelement xmlns:snafu="http://tempuri.org">nice root element</snafu:rootelement>

can be selected by:

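var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
xmldoc.async = false;
// Load the sample document shown above before selecting from it.
xmldoc.loadXML('<snafu:rootelement xmlns:snafu="http://tempuri.org">nice root element</snafu:rootelement>');
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://tempuri.org'");
var oNode = xmldoc.selectSingleNode("foo:rootelement");
alert(oNode.xml);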

even though the prefix in the XML is "snafu" and the prefix in the select is "foo".  It's only the namespace that counts.
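
The same idea (the namespace URI matters, the prefix does not) can be tried without MSXML. This is a rough sketch using Python's standard ElementTree module, not the MSXML API; the `<wrapper>` parent element and the second, non-matching URI are assumptions added for the demonstration:

```python
import xml.etree.ElementTree as ET

# The <wrapper> parent is an assumption, added so there is a parent to search under.
xml_text = (
    "<wrapper>"
    '<snafu:rootelement xmlns:snafu="http://tempuri.org">'
    "nice root element"
    "</snafu:rootelement>"
    "</wrapper>"
)
doc = ET.fromstring(xml_text)

# Bind our own prefix "foo" to the same namespace URI the document uses.
node = doc.find("foo:rootelement", {"foo": "http://tempuri.org"})

# The same prefix bound to a different (made-up) URI matches nothing.
other = doc.find("foo:rootelement", {"foo": "http://example.invalid"})
```

Here `node` is the element even though the document calls its prefix "snafu", while `other` is `None` because the URI does not match.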

Regards,
Mike Sharp


 

 

Author Comment

by:vpikula
ID: 9849027
Thanks a lot for that detailed info, rdcpro! I'll repair the xml with the syntax pointers you gave once this problem is solved.

Your suggestion is: " If you can convert the submitted Shift-JIS to unicode, that would be easiest."

How do I do this? Right now the pages I present to my users to edit the XML on are encoded in Shift-JIS.  I could easily set these to be UTF-8 (the doc is in UTF-8) so there is no problem. But at the output end, I *have* to display the same text in Shift-JIS; the devices accessing the site are mobile phones that can only accept this.

So in short, I see two solutions:
1) I use a separate XML doc, encoded in Shift-JIS
2) I let admins post in UTF-8, but convert the output to Shift-JIS

If you have a good solution for 2), I'll do that. Otherwise, I'll go for 1) and make a separate document for my Japanese texts.

Thanks,
Victor

Accepted Solution

by:rdcpro (earned 500 total points)
ID: 9850400
How do you serve the content for your site to the users? By any chance, do you render the XML using XSLT?

You might have to use approach 1, but XSLT does have a nice method for producing different output encodings regardless of what the XML is encoded in.  The tag:

<xsl:output method="xml" encoding="shift-jis"/>

causes all output to be encoded in the desired encoding. MSXML supports any encoding supported by Internet Explorer. However, you can't dynamically specify the encoding at runtime (at least not elegantly). You'd need at least a separate root XSLT for each encoding, and then pick the appropriate one at runtime. Each root XSLT would import or include all its templates, so you wouldn't really have any redundant code. Something like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="shift-jis" indent="yes"/>
      <xsl:include href="myTemplates.xslt"/>
</xsl:stylesheet>

On the other hand, if you're not using XSLT, and don't want to, you can probably use the ADODB.Stream object. I believe its Charset property lets you set various encodings on it. I don't have a code sample, though, as I usually end up using XSLT.
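
As a rough illustration of "same content, different output encoding" without XSLT, here is a sketch in Python (not ASP, and not the stream object itself). The XML fragment, its element names, and the sample text are all assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names and text are assumptions.
root = ET.fromstring('<page name="home"><title>こんにちは</title></page>')

# Serialize as UTF-8 for storage...
utf8_out = ET.tostring(root, encoding="utf-8")

# ...and as Shift-JIS for devices that can only accept that encoding.
# For a non-UTF-8 encoding, ElementTree also emits an XML declaration
# naming the encoding.
sjis_out = ET.tostring(root, encoding="shift_jis")
```

Both results are byte strings holding the same document; only the byte-level encoding (and the declaration on the Shift-JIS version) differs, which is the effect option 2 in the question is after.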

It looks to me like your XML file holds the content for all pages on the site, for all supported languages. This sounds like a pretty big file, and parsing the entire thing isn't the best use of resources, I should think, considering a single site visitor will only use one locale. There are a variety of approaches for localization, so you might think about using a different one. For example, localizable resources go in a separate XML file for each locale, stored in a separate folder:

resourceRoot
    |_    en_US
    |_    fr_CA
    |
etc.

When you discover the site visitor's locale or culture code, you modify the path to the XML resource, and cache the content in the user's session, or load it each time, depending on your needs.
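
The path lookup could be sketched like this (in Python for illustration, not ASP). The folder names follow the layout above; the file name `strings.xml`, the fallback locale, and the function name are assumptions:

```python
import os

# Folder names follow the layout above; the file name and the
# fallback locale are assumptions for illustration.
RESOURCE_ROOT = "resourceRoot"
DEFAULT_LOCALE = "en_US"
SUPPORTED_LOCALES = {"en_US", "fr_CA"}


def resource_path(locale, filename="strings.xml"):
    """Return the resource file path for a locale, falling back to the default."""
    chosen = locale if locale in SUPPORTED_LOCALES else DEFAULT_LOCALE
    return os.path.join(RESOURCE_ROOT, chosen, filename)
```

For example, `resource_path("fr_CA")` points into the `fr_CA` folder, while an unknown locale falls back to `en_US`. Checking the locale against a fixed list also keeps visitor-supplied values from ending up in the file path unchecked.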

.NET has a better way of dealing with localizable resources, too.

Regards,
Mike Sharp

Author Comment

by:vpikula
ID: 9851121
Thanks very much, Mike. That separate folder solution (or different filenames) will work perfectly for me. No, I do not use XSLT (since I don't really understand it, heh), so separate files work best.

Great job, 500 points coming your way!
Victor