Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Japanese breaks up XML (msxml2)

Posted on 2003-11-30
4
Medium Priority
?
422 Views
Last Modified: 2013-11-19
Hi there,

I have the following problem: I wrote a simple ASP script (see function below) that allows administrators of my website to update text strings in an XML file. This works perfectly for any unicode language, but not for Japanese. The file becomes totally corrupted and unreadable; in fact, is "cut" at the exact point where it should be normally updated.

I suppose the problem lies either in the fact that I use an old msxml parser (ver. 2) or because the Japanese encoding is set to shift-jis i.s.o. UTF-8.

Does anyone here have any experience with this problem, and a possible solution?

Thanks,
Victor
 

'XML UPDATE NODE VALUE
Function fncUpdateXML(strLanguage, strScriptName, strNode, strText)
      Set xmlDoc = Server.CreateObject("msxml2.DOMDocument")
      xmlDoc.async = False
      If NOT xmlDoc.Load("c:\testfile.xml") Then
            Response.Write "Page failed to load"
            Response.End
      Else
            strText = Replace(strText, "<br>", vblf)
            xmlDoc.SelectSingleNode("/languages/language[@xml:lang='" & strLanguage & "']/pages/page[@xml:page='" & strScriptName & "']/" & strNode).text = strText
      xmlDoc.save "c:\testfile.xml"
      End If
      Set xmlDoc = Nothing
End Function
0
Comment
Question by:vpikula
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 26

Expert Comment

by:rdcpro
ID: 9848729
No, you're using MSXML 3.  The ProgID  "Msxml2.DomDocument" doesn't mean MSMXML version 2...it's more like version 2 of the API.  MSXML version 2 had a different API (this was before things were settled at the W3C).  MSXML 4 uses a similar ProgID: "Msxml2.DomDocument.4.0"

Any time you have strings involved, your encoding is actually UTF-16.  This is because BStr's are essentially UTF-16.  So somewhere along the road, your encoding is getting goofed up.  Now, if your document contains Unicode characters, and you were to try to insert shift-jis characters in it, you'd have a problem as the document can only have one encoding.  If you can convert the submitted Shift-JIS to unicode, that would be easiest.  

also, xml:page does not look like correct useage.  There isn't (to my knowledge) any such thing as an xml:page attribute.  "xml" is a reserved key, and you should avoid using it in your own semantic context.  Also, to reliably select nodes with a qualified name (such as foo:bar), you need to specify the selectionNamespaces property in your DomDocument object.  Same goes for using XPath.  selectSingleNode() defaults to the old XSL patterns language for backwards compatibility reasons.

Here's how to set them both (in JScript...sorry):

var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://myserver.com' xmlns:bar='http://yourserver.com'");

This allows you to select a node using a qualified name, even if the actual prefix is different than the one in your SelectionNamepaces property.  For example, this XML:

<snafu:rootelement xmlns:snafu="http://tempuri.org">nice root element</snafu:rootelement>

can be selected by:

var xmldoc = new ActiveXObject("Msxml2.DOMDocument");
xmldoc.setProperty("SelectionLanguage", "XPath");
xmldoc.setProperty("SelectionNamespaces", "xmlns:foo='http://tempuri.org'");
var oNode = xmldoc.selectSingleNode("foo:rootelement")
alert(oNode.xml)

even though the prefix in the XML is "snafu" and the prefix in the select is "foo".  It's only the namespace that counts.

Regards,
Mike Sharp


 

0
 

Author Comment

by:vpikula
ID: 9849027
Thanks a lot for that detailed info, rdcpro! I'll repair the xml with the syntax pointers you gave once this problem is solved.

Your suggestion is: " If you can convert the submitted Shift-JIS to unicode, that would be easiest."

How do I do this? Right now the pages I present to my users to edit the XML on are encoded in Shift-JIS.  I could easily set these to be UTF-8 (the doc is in UTF-8) so there is no problem. But at the output end, I *have* to display the same text in Shift-JIS; the devices accessing the site are mobile phones that can only accept this.

So in short, I see two solutions:
1) I use a seperate XML doc, encoded in Shift-JIS
2) I let admins post in UTF-8, but convert the output to Shift-JIS

If you have a good solution for 2), I'll do that. Otherwise, I'll go for 1) and make a seperate document for my Japanese texts.

Thanks,
Victor
0
 
LVL 26

Accepted Solution

by:
rdcpro earned 2000 total points
ID: 9850400
How do you serve the content for your site to the users.  By any chance, do you render the XML using XSLT?

You might have to use approach 1, but XSLT does have a nice method for producing different output encodings regardless of what the XML is encoded in.  The tag:

<xsl:output method="xml" encoding="shift-jis"/>

causes all output to be encoded in the desired encoding.  MSXML supports any encoding supported by Internet Explorer.  However, you can't dynamically specify the encoding at runtime (at least not elegantly).  You'd need at least a separate root XSLT for each encoding, and then use the appropriate one at runtime.  Each XSLT would import or include all it's templates, you you wouldn't really have any redundant code.  Something like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="shift-jis" indent="yes"/>
      <xsl:include href="myTemplates.xslt"/>
</xsl:stylesheet>

On the other hand, if you're not using XSLT, and don't want to, you can probably use the stream object.  I believe you can set various encodings on it.  I don't have a code sample, though, as I usually end up using XSLT.  

It looks like to me that your XML file holds the content for all pages on the site, for all supported languages.  This sounds like a pretty big file, and parsing the entire thing isn't the best use of resources, I should think, considering a single site visitor will only use one locale.  There are a variety of approaches for localization...you might think about using a different approach.  For example, localizable resources go in a separate XML file for each locale, stored in a separate folder:

resourceRoot
    |_    en_US
    |_    fr_CA
    |
etc.

When you discover the site visitors locale or culture code, you modify the path to the XML resource, and cache the content in the user's session, or load it each time, depending on your needs.

.NET has a better way of dealing with localizable resources, too.

Regards,
Mike Sharp
0
 

Author Comment

by:vpikula
ID: 9851121
Thanks very much Mike -- that seperate folder solution (or different filenames) will work perfectly for me. No; I do not use XSLT (since I don't really understand it heh) so seperate files works best.

Great job-500 points coming your way!
Victor
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Confluence of Individual Knowledge and the Collective Intelligence At this writing (summer 2013) the term API (http://dictionary.reference.com/browse/API?s=t) has made its way into the popular lexicon of the English language.  A few years ago, …
JavaScript has plenty of pieces of code people often just copy/paste from somewhere but never quite fully understand. Self-Executing functions are just one good example that I'll try to demystify here.
Viewers will learn about arithmetic and Boolean expressions in Java and the logical operators used to create Boolean expressions. We will cover the symbols used for arithmetic expressions and define each logical operator and how to use them in Boole…
HTML5 has deprecated a few of the older ways of showing media as well as offering up a new way to create games and animations. Audio, video, and canvas are just a few of the adjustments made between XHTML and HTML5. As we learned in our last micr…

715 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question