qaziasim

I'm getting garbled Unicode characters from a DB table on a SELECT query. Please help

I'm having a problem, please help.

Problem: I pasted some Japanese characters into an HTML text field and submitted the form to an ASP file. Using ADO, I inserted the value of the text field into an MS Access database table (field: Name, type: Text). As a result, characters like these were stored in the DB: "Œv‰æ‚̃}ƒl[ƒWƒƒ[ ". Now when I query the DB and build an XML packet, then use transformNodeToObject to transform it into HTML and view it in the browser, these characters are sent to the browser as-is.

So what should I do to show the actual Japanese characters?

Please help.
Qazi Asim
MMeijer


Make sure the encoding attribute of the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is set to "UTF-8". You also need to tell the browser to use the UTF-8 character set with the meta tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />, and Response.Charset = "UTF-8" should also be set.
Since you are using XSL (transformNodeToObject), you also need to set the encoding attribute of xsl:output to "UTF-8".

I haven't tested it, but www.unicode.org says that UTF-8 supports CJK.

As the remark in the XML SDK documentation says, you can only use the output with Response.BinaryWrite or Response.Write calls.
"I haven't tested it, but www.unicode.org says that UTF-8 supports CJK"

Unicode (ISO 10646) was originally defined with a 31-bit code space; today it is limited to code points U+0000 through U+10FFFF. Apart from historic scripts and rare ideographs, the 16-bit range (the Basic Multilingual Plane) suffices to encode nearly all practical languages, including the common CJK characters. UTF-8 is a variable-length encoding of the full Unicode range and thus handles all characters.

Did the Japanese text from the input box get into the Access database correctly? I.e., if you view the database with Access, does it display properly? Because if the characters are not correct in the database, it is going to be very difficult to get them correct in the browser!
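
To make the relation between code points and encodings concrete, here is a rough sketch in Python (used only for illustration, it is not ASP). The helper name describe is hypothetical, and U+20000 is just an arbitrary example of a character outside the 16-bit range.

```python
# Hypothetical helper for illustration: show how one character is stored.
def describe(ch):
    return {
        "code_point": ord(ch),            # the Unicode number, as in &#NNNNN;
        "utf8": ch.encode("utf-8"),       # variable length, 1 to 4 bytes
        "utf16": ch.encode("utf-16-le"),  # 2 bytes, or 4 for a surrogate pair
    }

house = describe("\u5bb6")     # the "house" character, i.e. &#23478;
rare = describe("\U00020000")  # an ideograph outside the 16-bit range

print(house["code_point"], house["utf8"].hex(), house["utf16"].hex())
print(rare["code_point"], len(rare["utf8"]), len(rare["utf16"]))
```

The "house" character fits in one 16-bit unit and three UTF-8 bytes; the second character needs four bytes in either encoding.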
qaziasim

ASKER

Let me explain the problem again.
"&#23478;" is the Japanese character for "house". When I submit this and store it in the database, its real UTF-8 code "家" is stored in the database correctly. If I simply write it to the browser with the correct meta tag (i.e. UTF-8), I can see the Japanese character fine. But if I make an XML packet, for example:

<?xml version="1.0" encoding="utf-8" ?>
<data><label>家</label></data>

and save it as .xml and view it in the browser, it is fine and I can see the Japanese characters. But if I pass this XML packet as a string to the XML parser, transform it, and generate HTML, it shows me these "家" characters in the browser, and if I view the source I can see something like "家".
What could be the problem?

So:-

Correct in database
Correct in XML file
Incorrect after transformation.

Please post XSL file. In particular the <xsl:output> element.

Where do you do the transformation? On the server, or in the HTML page in the browser?
This is the XSL file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="4.0" encoding="utf-8"/>
<xsl:template match="data">
     <table border="0" width="100%" cellspacing="1">
     <form action="utfins.asp?act=1" method="post" ID="Form1" accept-charset="UTF-8">
          <tr>
               <td  align="center" class="contents">
                    Select :
                    <select name="Name" ID="Name" accept-charset="UTF-8">
                         <xsl:for-each select="get-resellers/get-resellers-recordrow">
                         <option>
                              <xsl:attribute name="VALUE"><xsl:value-of select="get-resellers-name"/></xsl:attribute>
                              <xsl:value-of select="get-resellers-name"/>
                         </option>
                         </xsl:for-each>
                    </select>
                    &#32;<input type="submit" name="go" value="Set Options" class="butn" />
               </td>
          </tr>
          <tr>
               <td  align="center" class="contents">
                    Test Display  :
                         <xsl:for-each select="get-resellers/get-resellers-recordrow">
                              <xsl:value-of select="get-resellers-name"/><br/>
                         </xsl:for-each>
                    &#32;<input type="submit" name="go" value="Set Options" class="butn" />
               </td>
          </tr>
     </form>    
     </table>
</xsl:template>
</xsl:stylesheet>

I do the transformation on the server using MSXML transformNodeToObject.

This is my transformation code:


function TransformDocumentSimple(srcXML, srcXSL)
  Dim sourceFile, styleFile, source

  styleFile = srcXSL
 
  set  source = Server.CreateObject("MSXML2.DOMDocument")
  source.async = false
  source.loadXML srcXML
 
  set  style = Server.CreateObject("MSXML2.DOMDocument")
  style.async = false  
  style.load styleFile

  'Error handling
  if (source.parseError.errorCode <> 0) then
    'result = reportParseError(source.parseError)
    set oerr = source.parseError
    sErrMsg = "XML Parsing Error. File: " & oErr.url & "  Reason : " & oErr.reason & " Line: " & oErr.line & ", Character: " & oErr.linepos & ", Text: " & oErr.srcText
    Response.Write sErrMsg
  elseif (style.parseError.errorCode <> 0) then
    'result = reportParseError(style.parseError)
    set oerr = style.parseError
    sErrMsg = "XML Parsing Error. File: " & oErr.url & "  Reason : " & oErr.reason & " Line: " & oErr.line & ", Character: " & oErr.linepos & ", Text: " & oErr.srcText
    Response.Write sErrMsg
  else
    'on error resume next
     'result = source.transformNode(style)
     source.transformNodeToObject style,Response
    if (err.number<>0) then
      result = reportRuntimeError(exception)
    end if
  end if
  'Make the result
  'result = Replace(result,"<META http-equiv='Content-Type' content='text/html'>","")
  'TransformDocumentSimple = result
End Function

<data>
     <get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
     <get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow>
     <get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
</data>

I am passing this packet as a string.
The meta tag is incorrect. If you set the "method" attribute of the xsl:output element to "html", it will add the meta tag without the "; charset=UTF-8" part. Try this:

<xsl:output
     method="xml"
     omit-xml-declaration="no"
     indent="yes"
     doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
     doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
     encoding="UTF-8"
     />

(Change the doctype if your HTML does not conform to XHTML Strict.)
Sorry, I pasted an invalid packet. Here is the correct one:
<data>
    <get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
    <get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow>
    <get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
</get-resellers>
</data>
ASKER CERTIFIED SOLUTION
BigRat
This solution is only available to members.
My script gives me an error on
 Response.CodePage = 65001
Error:
Microsoft VBScript runtime error '800a01b6'

Object doesn't support this property or method: 'Response.CodePage'

/test/codepage.ASP, line 3
It's Session.CodePage.
"it's session.codepage"

IIS 6.0 has moved this property from Session. Which version are you using, 5.0?
In the PSDK it says that Session.CodePage is still available on IIS 6.0, but its behaviour changed a bit, and Response.CodePage has been added.
This is because the codepage directive had the power to set the codepage for the entire session.

win2003 (iis6.0)
---------------------------
Session.CodePage sets the codepage for the session (if sessions are enabled).
Response.CodePage and the codepage directive (@CodePage) apply only to the page where they are set.

win2K/winXP (iis5.0,iis5.1)
---------------------------
Session.CodePage and the codepage directive set the codepage for the session.

As the error that qaziasim got shows, he is running IIS 5.1 or lower.

There's a lot of information about the codepage in the PSDK:
------------------------------------------------------
CodePage
The CodePage property specifies how strings are encoded in the intrinsic objects. A code page is a character set that can include numbers, punctuation marks, and other glyphs. Codepages are not the same for each language. Some languages such as Japanese and Hindi have multi-byte characters, while others like English and German only need one byte to represent each character. The CodePage property is read/write.

Syntax

Session.CodePage(=CodepageID)
 
Parameters

CodepageID
An integer that represents the character formatting codepage. You can find codepage integers at MSDN Library under the column for FamilyCodePage.
Remarks

Setting Session.CodePage explicitly affects all responses in a session. Session.CodePage sets Response.CodePage implicitly on each page. Sessions must be enabled to use Session.CodePage.

If Session.CodePage is not explicitly set in a page, it will be implicitly set by the AspCodePage metabase property. If the AspCodePage property is not set, or set to 0, Session.CodePage is set by the system ANSI codepage. Session.CodePage is no longer implicitly set by @CodePage as it was for IIS 5.0 and earlier versions. This change was made because one @CodePage had the power to change the codepage for an entire session. Now, @CodePage and Response.CodePage affect only single responses, and Session.CodePage affects all responses in a session.

There can be only one codepage per response body, otherwise incorrect characters are displayed. If you set the codepage explicitly in two pages, where one is called by the other with #include, Server.Execute, or Server.Transfer, usually the parent page decides the codepage. The only exception is if Response.CodePage is explicitly set in the parent page of a Server.Execute call. In that case, an @CodePage command in the child page overrides the parent codepage.

Literal strings in a script are still encoded using @CodePage (if present) or the AspCodePage metabase value (if set), or the system ANSI codepage. If you set Response.CodePage or Session.CodePage explicitly, do so before sending non-literal strings to the client. If you use literal and non-literal strings in the same page, make sure the codepage of @CodePage matches the codepage of Session.CodePage, or the literal strings are encoded differently from the non-literal strings and displayed incorrectly.

If the codepage of your Web page matches the system defaults of the Web client, you do not need to set a codepage in your Web page. However, setting the value is recommended.

If the codepage is set in a page, then Response.Charset should also be set. The codepage value tells IIS how to encode the data when building the response, and the charset value tells the browser how to decode the data when displaying the response. The CharsetName of Response.Charset must match the codepage value, or mixed characters will be displayed in the browser. Lists of CharsetNames and matching codepage values can be found at MSDN Library under the columns for Preferred Charset Label and FamilyCodePage.

The file format of a Web page must be the same as the @CodePage used in the page. Notepad.exe allows you to save files in UTF-8 format or in the system ANSI format. For example, if @CodePage is set to 65001 for UTF-8, the Web file must be saved in UTF-8 format. If @CodePage is set to 1252, the Web file must be saved in ANSI format on an English or German system. If you want to save a page in the ANSI format for a language other than your system language, you can change your default System Locale in Regional Options from the Control Panel. For example, once you change your system locale to Japanese, any files you save in ANSI format will be saved using the Japanese codepage. They will only be readable from a Japanese System Locale.

If you are writing and testing Web pages that use different codepages and character sets (for example, creating a multi-lingual Web site), remember that your test client-computer must have the language packs installed for each language you wish to display. You can install language packs from Regional Options in the Control Panel.
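
The point about the codepage and the charset having to match can be sketched in Python (used here only as an illustration, it is not ASP). The assumption is that code page 65001 corresponds to the "utf-8" codec and that a browser falling back to windows-1252 corresponds to the "cp1252" codec.

```python
text = "\u5bb6"  # the "house" character, &#23478;

# Server side: code page 65001 means "encode the response as UTF-8".
body = text.encode("utf-8")

# Browser side with a matching charset: the character survives.
matched = body.decode("utf-8")

# Browser side with a mismatched charset (windows-1252):
# each of the three bytes is shown as a separate Western character.
mismatched = body.decode("cp1252")

# The damage is reversible as long as no byte was dropped on the way.
repaired = mismatched.encode("cp1252").decode("utf-8")

print(repr(matched), repr(mismatched), repr(repaired))
```

Three junk characters in place of one Japanese character is the typical sign that UTF-8 bytes are being read with a single-byte charset.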
Thanks for clearing up the codepage and charset concepts.
I have installed all the required language packs. I can't see any change after setting the charset and code page.

As I told you, I am using transformNodeToObject. When I query the DB and build the XML packet I sent you earlier, then pass this packet as a string and load it with the loadXML method of the XML parser, the encoded characters are sent to the client, which is the real problem. But if I save this packet as an .xml file and then load it with the load method of the XML parser, everything is shown fine. What could be the problem?
Can you please look into it?

Qazi Asim
Microsoft's remark on loadXML:

"The loadXML() method will work only with UTF-16 or UCS-2 encodings."

You might want to try UTF-16; this is the only difference between load and loadXML in this matter.
This means that I will have to convert the UTF-8 encoded data in the database to UTF-16 and then pass it to loadXML().

1) Is there any function to convert from UTF-8 to UTF-16?
2) Will it do the required work for me?

Qazi Asim
I've tested the two methods, "Response.Write objXmlDom.transformNode(objXslDom)" and "objXmlDom.transformNodeToObject objXslDom, Response".

For the first one you need to set Session.CodePage to 65001; for the other, no server-side adjustments are needed.

What is important is to set the meta tag to the correct charset. You can check this in the IE menu under "View" --> "Encoding": if it shows auto-select, the meta tag is invalid; it should show UTF-8.

My test page:
--------------------------------

<%
Option Explicit

Const STR_XML_DATA = "<data><get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>&#23478;</get-resellers-name></get-resellers-recordrow><get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow><get-resellers-recordrow><get-resellers-name>&#23478;</get-resellers-name></get-resellers-recordrow></get-resellers></data>"
Const STR_XSL_DATA = "<xsl:stylesheet xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"" version=""1.0""><xsl:output method=""xml"" omit-xml-declaration=""no"" indent=""yes"" doctype-system=""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"" doctype-public=""-//W3C//DTD XHTML 1.0 Strict//EN"" encoding=""UTF-8"" /><xsl:template match=""/""><html><head><meta http-equiv=""content-type"" content=""text/html; charset=utf-8"" /><title>Test</title></head><body><xsl:for-each select=""//get-resellers-name""><xsl:value-of select=""."" /><br /></xsl:for-each></body></html></xsl:template></xsl:stylesheet>"

Const STR_XML_DOM_PROGID = "msxml2.domdocument.4.0"

Sub Main
     Dim objXmlDom, objXslDom, objXmlRet

     Set objXmlDom = Server.CreateObject(STR_XML_DOM_PROGID)
     Set objXslDom = Server.CreateObject(STR_XML_DOM_PROGID)
     Set objXmlRet = Server.CreateObject(STR_XML_DOM_PROGID)
     objXmlret.async=true
     objXmlRet.validateOnParse=true

     If Not objXmlDom.loadXML(STR_XML_DATA) Then
          Response.Write objXmlDom.parseError.reason
          Set objXmlDom = Nothing
          Exit Sub
     End If

     If Not objXslDom.loadXML(STR_XSL_DATA) Then
          Response.Write objXslDom.parseError.reason
          Set objXmlDom = Nothing
          Set objXslDom = Nothing
          Exit Sub
     End If
     
     'Response.Charset = "UTF-8"
     'Session.CodePage = 65001
     'Session.Abandon

     'this works regardless of Response.charset and Session.Codepage
     objXmlDom.transformNodeToObject objXslDom, Response

     'this only works if Session.Codepage is set to 65001
     'Response.Write objXmlDom.transformNode(objXslDom)

     'Remark:
     'For both methods the meta tag should be set, otherwise the browser auto-selects the character set (windows-1252).
     'You can look it up in IE under "View" --> "Encoding".

     Set objXmlDom = Nothing
     Set objXslDom = Nothing
     Set objXmlret = Nothing
End sub

Call Main
%>

----------------------------------
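
The test page above writes the character as the numeric reference &#23478;, which keeps the source pure ASCII so the file encoding cannot interfere. As a rough sketch of the same idea in Python (for illustration only; it uses the standard xml.etree.ElementTree parser, not MSXML, and the packet is a shortened stand-in for the real one):

```python
import xml.etree.ElementTree as ET

# Pure ASCII source: the character is written as a numeric reference.
packet = "<data><label>&#23478;</label></data>"

root = ET.fromstring(packet)
label = root.find("label").text  # the parser resolves the reference

# Serializing as UTF-8 turns the character into its three UTF-8 bytes.
serialized = ET.tostring(root, encoding="utf-8")

print(repr(label), serialized)
```

Whatever bytes come out of the serializer still have to be announced to the browser with a matching charset.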
Microsoft's remark on loadXML:

"The loadXML() method will work only with UTF-16 or UCS-2 encodings."


loadXML takes a BSTR, i.e. a string of 16-bit Unicode characters, and loads it. How a stream of 8-bit characters is converted into 16-bit characters depends on various settings.

Irrespective of whatever settings IE or MS has, the STANDARD says that the default character set for an HTML data stream is ISO-8859-1. The character set of HTML itself, i.e. what can be displayed, is Unicode (from HTML 4.0). Many browsers, however (including IE), interpret this somewhat differently and say that the default is the locale.

If however the http response contains the following :-

Content-Type: text/html; charset=xxxx

then the stream is interpreted in that character set. Most character sets do not contain all Unicode characters; Unicode encodings such as UCS-2 and UTF-8 are the exceptions. UCS-2 uses 16 bits per character and is wasteful for mostly-ASCII markup, so one uses UTF-8. Setting Response.Charset ensures that the charset attribute turns up in the Content-Type header.

Now on to the codepage. In ASP, VB, and almost everywhere else on Windows (WinNT 4.0/Win2K and all of COM), 16-bit Unicode characters are used. How does one convert 16-bit characters to 8-bit? Via the codepage in the locale. Again, most codepages do not contain all Unicode characters; UTF-8 does. So by setting the session (and in version 6, response) CodePage property to 65001, the 16-bit characters get converted into 8-bit UTF-8 sequences which are sent to the browser, the browser is told that they are UTF-8, and that makes everything work.

Remember, in ASP script all string data uses 16-bit characters. It is only when one starts writing these to streams (HTTP responses, files and so on) that a 16-bit to 8-bit conversion must take place.
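
The 16-bit to 8-bit step can be sketched in Python (for illustration only, it is not ASP). The helper to_bytes is hypothetical, and the codec names "cp1252", "cp932" and "utf-8" stand in for the Western, Japanese and 65001 code pages.

```python
text = "\u5bb6"  # the "house" character, &#23478;

def to_bytes(s, codec):
    """Return the encoded bytes, or None when the code page cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None

western = to_bytes(text, "cp1252")  # Western code page: no such character
japanese = to_bytes(text, "cp932")  # Japanese code page: two bytes
utf8 = to_bytes(text, "utf-8")      # UTF-8: three bytes

# A lossy conversion silently substitutes a question mark instead of failing.
lossy = text.encode("cp1252", errors="replace")

print(western, len(japanese), len(utf8), lossy)
```

Only a code page that actually contains the character can carry it through the conversion; UTF-8 contains all of them.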

HTH
Hi,
Thanks for such helpful details,
but after setting the codepage and character set the problem is still there. Since the loadXML method can read only UTF-16 and UCS-2 characters, whereas my input to this method is mixed UTF-8 + UTF-16 encoded characters, I am now really stuck about what to do.

Can you give me the solution in code form? I mean, can you make the required changes to my code if I send you one test ASP file, an Access database with one table, and one XSL document?

I will be really thankful if you can pull me out of this problem.

Qazi Asim
Can you send a link to the page, or the HTML output?
Thanks a lot, BigRat and MM, for your great help. The problem is solved, by the grace of God and with your help.

Thanks again,

Qazi Asim