Solved

I'm getting garbled Unicode characters from a DB table on a SELECT query. Please help

Posted on 2003-03-11
Last Modified: 2013-12-03
I'm having a problem, please help.

Problem: I pasted some Japanese characters into an HTML text field and submitted
the form to an ASP file. Using ADO, I inserted the value of the text field into an
MS Access database table (field: Name, type: Text). As a result, characters like
these were stored in the DB: "Œv‰æ‚̃}ƒl[ƒWƒƒ[ ". Now when I query the DB, build an
XML packet and transform it with transformNodeToObject to produce HTML,
the browser shows these characters exactly as they are stored.

So what should I do to show the actual Japanese characters?

Please help,
Qazi Asim
Question by:qaziasim
 
LVL 5

Expert Comment

by:MMeijer
ID: 8114499

Make sure the encoding attribute of the XML declaration (<?xml version="1.0" encoding="UTF-8" ?>) is set to "UTF-8". You also need to tell the browser to use the UTF-8 character set with the meta tag "<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />", and Response.Charset = "UTF-8" should also be set.
Since you are using XSL (transformNodeToObject), you also need to set the encoding attribute of xsl:output to "UTF-8".

I haven't tested it, but www.unicode.org says that UTF-8 supports CJK.

As the remark in the XML SDK documentation says, you can only use the output with Response.BinaryWrite or Response.Write calls.
 
LVL 27

Expert Comment

by:BigRat
ID: 8118747
"I haven't tested it, but www.unicode.org sais that UTF-8 supports CJK"

Unicode (as ISO 10646) defines a 31-bit code space for the world's characters. Ignoring Old Ethiopian and Klingon, 16 bits suffice to encode all practical languages. UTF-8 is just a byte encoding of these code points and thus handles all characters.
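
As a quick check of the claim that UTF-8 handles all characters, the following Python sketch encodes three characters of increasing code point. Python is used purely for illustration and is not part of the ASP setup discussed in this thread; U+10400 is an arbitrary example of a character outside the 16-bit range.

```python
# Illustrative sketch: UTF-8 byte lengths for characters of increasing code point.
samples = {
    "ascii": "a",                    # U+0061
    "house": "\u5bb6",               # U+5BB6, written as &#23478; in this thread
    "beyond_16_bit": "\U00010400",   # U+10400, arbitrary example above U+FFFF
}

# Number of bytes each character needs when encoded as UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    print(name, ord(ch), ch.encode("utf-8").hex(), utf8_lengths[name])
```

The Japanese character used later in this thread has code point 23478 and needs three bytes in UTF-8, while the character above U+FFFF needs four.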

Did the Japanese from the input textarea box get into the Access database correctly? IE: if you view the Access database with Access, do they display properly? Because if they are not correct in the database it is going to be very difficult to get them correct in the browser!
 

Author Comment

by:qaziasim
ID: 8118855
Let me explain the problem again.
"&#23478;" is the Japanese character for "house". When I submit it and store it in the database, its UTF-8 form "家" is stored correctly. If I simply write it to the browser with the correct meta tag (i.e. UTF-8), I can see the Japanese character fine. But if I make an XML packet, for example:

<?xml version="1.0" encoding="utf-8" ?>
<data><label>家</label></data>

and save it as .xml and view it in the browser, it is fine and I can see the Japanese characters. But if I pass this XML packet as a string to the XML parser, transform it and generate HTML, the browser shows garbled characters in place of "家", and view source shows the same garbled characters.
What could be the problem?
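
One common cause of this symptom, offered here as an assumption rather than a confirmed diagnosis of this setup, is that the UTF-8 bytes are read as one character per byte (for example under the Windows-1252 ANSI codepage) and then encoded as UTF-8 a second time. A small Python sketch of that effect:

```python
# Sketch of "double encoding", assuming the intermediate codepage is Windows-1252.
original = "\u5bb6"                      # the character for "house"
utf8_bytes = original.encode("utf-8")    # three bytes: e5 ae b6

# Reading each byte as a separate Windows-1252 character gives three characters.
misread = utf8_bytes.decode("cp1252")

# Writing those three characters out as UTF-8 again yields six bytes, which a
# browser in UTF-8 mode shows as three Latin symbols instead of one kanji.
double_encoded = misread.encode("utf-8")

print(len(original), len(misread), len(double_encoded))
```

If the garbled output has roughly three symbols per Japanese character, this kind of misread is a likely candidate.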

 
LVL 27

Expert Comment

by:BigRat
ID: 8118971
So:-

Correct in database
Correct in XML file
Incorrect after transformation.

Please post XSL file. In particular the <xsl:output> element.

Where do you do the transformation? On the server or in the HTML page in the browser?
 

Author Comment

by:qaziasim
ID: 8119106
This is the XSL file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="4.0" encoding="utf-8"/>
<xsl:template match="data">
     <table border="0" width="100%" cellspacing="1">
     <form action="utfins.asp?act=1" method="post" ID="Form1" accept-charset="UTF-8">
          <tr>
               <td  align="center" class="contents">
                    Select :
                    <select name="Name" ID="Name" accept-charset="UTF-8">
                         <xsl:for-each select="get-resellers/get-resellers-recordrow">
                         <option>
                              <xsl:attribute name="VALUE"><xsl:value-of select="get-resellers-name"/></xsl:attribute>
                              <xsl:value-of select="get-resellers-name"/>
                         </option>
                         </xsl:for-each>
                    </select>
                    &#32;<input type="submit" name="go" value="Set Options" class="butn" />
               </td>
          </tr>
          <tr>
               <td  align="center" class="contents">
                    Test Display  :
                         <xsl:for-each select="get-resellers/get-resellers-recordrow">
                              <xsl:value-of select="get-resellers-name"/><br/>
                         </xsl:for-each>
                    &#32;<input type="submit" name="go" value="Set Options" class="butn" />
               </td>
          </tr>
     </form>    
     </table>
</xsl:template>
</xsl:stylesheet>

I do the transformation on the server using MSXML transformNodeToObject.

This is my transformation code:


Function TransformDocumentSimple(srcXML, srcXSL)
  Dim source, style, oErr, sErrMsg

  ' Load the XML packet from a string
  Set source = Server.CreateObject("MSXML2.DOMDocument")
  source.async = False
  source.loadXML srcXML

  ' Load the stylesheet from a file path
  Set style = Server.CreateObject("MSXML2.DOMDocument")
  style.async = False
  style.load srcXSL

  ' Error handling
  If source.parseError.errorCode <> 0 Then
    Set oErr = source.parseError
    sErrMsg = "XML Parsing Error. File: " & oErr.url & "  Reason : " & oErr.reason & " Line: " & oErr.line & ", Character: " & oErr.linepos & ", Text: " & oErr.srcText
    Response.Write sErrMsg
  ElseIf style.parseError.errorCode <> 0 Then
    Set oErr = style.parseError
    sErrMsg = "XML Parsing Error. File: " & oErr.url & "  Reason : " & oErr.reason & " Line: " & oErr.line & ", Character: " & oErr.linepos & ", Text: " & oErr.srcText
    Response.Write sErrMsg
  Else
    ' Write the transformation result directly into the Response object
    On Error Resume Next
    source.transformNodeToObject style, Response
    If Err.Number <> 0 Then
      Response.Write "Transformation error: " & Err.Description
    End If
    On Error GoTo 0
  End If
End Function

 

Author Comment

by:qaziasim
ID: 8119130
<data>
     <get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
     <get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow>
     <get-resellers-recordrow><get-resellers-name>家</get-resellers-name></get-resellers-recordrow>
</data>

I am passing this packet as a string.
 
LVL 5

Expert Comment

by:MMeijer
ID: 8119192
The meta tag is incorrect: if you set the "method" attribute of the xsl:output element to "html", it will add the meta tag without ";charset=UTF-8". Try this:

<xsl:output
     method="xml"
     omit-xml-declaration="no"
     indent="yes"
     doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
     doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
     encoding="UTF-8"
     />

(change the doctype if your HTML does not conform to XHTML Strict)
 

Author Comment

by:qaziasim
ID: 8119204
Sorry for pasting an invalid packet. Here is the correct one:
<data>
    <get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>e.6</get-resellers-name></get-resellers-recordrow>
    <get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow>
    <get-resellers-recordrow><get-resellers-name>e.6</get-resellers-name></get-resellers-recordrow>
</get-resellers>
</data>
 
LVL 27

Accepted Solution

by:
BigRat earned 160 total points
ID: 8119363
You are writing the generated HTML directly into the Response object, whose default properties have not been changed accordingly:

 source.transformNodeToObject style,Response

As I said before, the default CodePage is the session code page, and this defaults to ANSI (which is near enough to ISO-8859-1). The default Charset is ISO-8859-1. You must change BOTH of these BEFORE you write anything into Response!

Response.CodePage = 65001
Response.CharSet = "utf-8"

HTH
 

Author Comment

by:qaziasim
ID: 8120133
My script gives me an error on
 Response.CodePage = 65001
Error:
Microsoft VBScript runtime error '800a01b6'

Object doesn't support this property or method: 'Response.CodePage'

/test/codepage.ASP, line 3
 
LVL 5

Expert Comment

by:MMeijer
ID: 8120156
It's Session.CodePage.
 
LVL 27

Expert Comment

by:BigRat
ID: 8120997
"it's session.codepage"

In Internet Information Services 6.0 the property was moved from Session. Which version are you using, 5.0?
 
LVL 5

Expert Comment

by:MMeijer
ID: 8121370
The PSDK says that Session.CodePage is still available on IIS 6 but has changed a bit, and Response.CodePage has been added.
This is because the @CodePage directive had the power to set the codepage for the entire session.

win2003 (iis6.0)
---------------------------
Session.CodePage will set the codepage for the Session (if session is enabled)
Response.Codepage and the Codepage directive (@codepage) apply for the page where set.

win2K/winXP (iis5.0,iis5.1)
---------------------------
Session.Codepage and the codepage directive will set the codepage for the Session.

As the error that qaziasim got shows, he is on IIS 5.1 or lower.

There's a lot of information about "codepage" in the PSDK:
------------------------------------------------------
CodePage
The CodePage property specifies how strings are encoded in the intrinsic objects. A code page is a character set that can include numbers, punctuation marks, and other glyphs. Codepages are not the same for each language. Some languages such as Japanese and Hindi have multi-byte characters, while others like English and German only need one byte to represent each character. The CodePage property is read/write.

Syntax

Session.CodePage(=CodepageID)
 
Parameters

CodepageID
An integer that represents the character formatting codepage. You can find codepage integers at MSDN Library under the column for FamilyCodePage.
Remarks

Setting Session.CodePage explicitly affects all responses in a session. Session.CodePage sets Response.CodePage implicitly on each page. Sessions must be enabled to use Session.CodePage.

If Session.CodePage is not explicitly set in a page, it will be implicitly set by the AspCodePage metabase property. If the AspCodePage property is not set, or set to 0, Session.CodePage is set by the system ANSI codepage. Session.CodePage is no longer implicitly set by @CodePage as it was for IIS 5.0 and earlier versions. This change was made because one @CodePage had the power to change the codepage for an entire session. Now, @CodePage and Response.CodePage affect only single responses, and Session.CodePage affects all responses in a session.

There can be only one codepage per response body, otherwise incorrect characters are displayed. If you set the codepage explicitly in two pages, where one is called by the other with #include, Server.Execute, or Server.Transfer, usually the parent page decides the codepage. The only exception is if Response.CodePage is explicitly set in the parent page of a Server.Execute call. In that case, an @CodePage command in the child page overrides the parent codepage.

Literal strings in a script are still encoded using @CodePage (if present) or the AspCodePage metabase value (if set), or the system ANSI codepage. If you set Response.CodePage or Session.CodePage explicitly, do so before sending non-literal strings to the client. If you use literal and non-literal strings in the same page, make sure the codepage of @CodePage matches the codepage of Session.CodePage, or the literal strings are encoded differently from the non-literal strings and displayed incorrectly.

If the codepage of your Web page matches the system defaults of the Web client, you do not need to set a codepage in your Web page. However, setting the value is recommended.

If the codepage is set in a page, then Response.Charset should also be set. The codepage value tells IIS how to encode the data when building the response, and the charset value tells the browser how to decode the data when displaying the response. The CharsetName of Response.Charset must match the codepage value, or mixed characters will be displayed in the browser. Lists of CharsetNames and matching codepage values can be found at MSDN Library under the columns for Preferred Charset Label and FamilyCodePage.

The file format of a Web page must be the same as the @CodePage used in the page. Notepad.exe allows you to save files in UTF-8 format or in the system ANSI format. For example, if @CodePage is set to 65001 for UTF-8, the Web file must be saved in UTF-8 format. If @CodePage is set to 1252, the Web file must be saved in ANSI format on an English or German system. If you want to save a page in the ANSI format for a language other than your system language, you can change your default System Locale in Regional Options from the Control Panel. For example, once you change your system locale to Japanese, any files you save in ANSI format will be saved using the Japanese codepage. They will only be readable from a Japanese System Locale.

If you are writing and testing Web pages that use different codepages and character sets (for example, creating a multi-lingual Web site), remember that your test client-computer must have the language packs installed for each language you wish to display. You can install language packs from Regional Options in the Control Panel.
 

Author Comment

by:qaziasim
ID: 8125601
Thanks for clearing up the codepage and charset concepts.
I have installed all the required language packs. I can't see any change after setting the charset and codepage.

As I told you, I am using transformNodeToObject. When I query the DB and build the XML packet I sent earlier, then pass this packet as a string and load it with the loadXML method of the XML parser, the encoded characters are sent to the client as is, which is the real problem. But if I save this packet as a .xml file and then load it with the load method of the XML parser, everything is shown fine. What could be the problem?
Can you please look into it?

Qazi Asim
 
LVL 5

Expert Comment

by:MMeijer
ID: 8126281
Microsoft's remark on loadXML:

"The loadXML() method will work only with UTF-16 or UCS-2 encodings."

You might want to try UTF-16 (doh..); this is the only difference between load and loadXML in this matter.
 

Author Comment

by:qaziasim
ID: 8126601
This means that I will have to convert the UTF-8 encoded data in the database to UTF-16 and then pass it to loadXML().

1) Is there any function to convert from UTF-8 to UTF-16?
2) Will it do the required work for me?
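
If the string coming back from the database really holds one character per UTF-8 byte (an assumption; the thread does not confirm it), the conversion amounts to turning those characters back into bytes and decoding them as UTF-8. A Python sketch of the idea, where `repair_misread_utf8` is a hypothetical helper name and Windows-1252 is only a guess at the server's ANSI codepage:

```python
def repair_misread_utf8(text: str, codepage: str = "cp1252") -> str:
    """Undo a byte-per-character misread of UTF-8 data.

    `codepage` is the single-byte encoding assumed to have been used for the
    misread; Windows-1252 is only a guess for a Western-locale server.
    """
    try:
        return text.encode(codepage).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not misread UTF-8 (or not via this codepage); leave it alone.
        return text


# Simulate the misread, then repair it.
misread = "\u5bb6".encode("utf-8").decode("cp1252")
repaired = repair_misread_utf8(misread)

print(repaired == "\u5bb6")
print(repair_misread_utf8("asim"))  # plain ASCII passes through unchanged
```

The same two steps (characters back to bytes, bytes decoded as UTF-8) would have to be done with whatever byte-handling facilities the server-side language offers.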

Qazi Asim
 
LVL 5

Expert Comment

by:MMeijer
ID: 8126823
I've tested the two methods "Response.Write objXmlDom.transformNode(objXslDom)" and "objXmlDom.transformNodeToObject objXslDom, Response".

For the first one you need to set Session.CodePage to 65001; for the other no server-side adjustments are needed.

What is important is to set the meta tag to the correct charset. You can check this in the IE menu under "View" --> "Encoding": if it shows Auto-Select, the meta tag is invalid; it should show UTF-8.

My test page:
--------------------------------

<%
Option Explicit

Const STR_XML_DATA = "<data><get-resellers><recordcount>55</recordcount><get-resellers-recordrow><get-resellers-name>&#23478;</get-resellers-name></get-resellers-recordrow><get-resellers-recordrow><get-resellers-name>asim</get-resellers-name></get-resellers-recordrow><get-resellers-recordrow><get-resellers-name>&#23478;</get-resellers-name></get-resellers-recordrow></get-resellers></data>"
Const STR_XSL_DATA = "<xsl:stylesheet xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"" version=""1.0""><xsl:output method=""xml"" omit-xml-declaration=""no"" indent=""yes"" doctype-system=""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"" doctype-public=""-//W3C//DTD XHTML 1.0 Strict//EN"" encoding=""UTF-8"" /><xsl:template match=""/""><html><head><meta http-equiv=""content-type"" content=""text/html; charset=utf-8"" /><title>Test</title></head><body><xsl:for-each select=""//get-resellers-name""><xsl:value-of select=""."" /><br /></xsl:for-each></body></html></xsl:template></xsl:stylesheet>"

Const STR_XML_DOM_PROGID = "msxml2.domdocument.4.0"

Sub Main
     Dim objXmlDom, objXslDom, objXmlRet

     Set objXmlDom = Server.CreateObject(STR_XML_DOM_PROGID)
     Set objXslDom = Server.CreateObject(STR_XML_DOM_PROGID)
     Set objXmlRet = Server.CreateObject(STR_XML_DOM_PROGID)
     objXmlret.async=true
     objXmlRet.validateOnParse=true

     If Not objXmlDom.loadXML(STR_XML_DATA) Then
          Response.Write objXmlDom.parseError.reason
          Set objXmlDom = Nothing
          Exit Sub
     End If

     If Not objXslDom.loadXML(STR_XSL_DATA) Then
          Response.Write objXslDom.parseError.reason
          Set objXmlDom = Nothing
          Set objXslDom = Nothing
          Exit Sub
     End If
     
     'Response.Charset = "UTF-8"
     'Session.CodePage = 65001
     'Session.Abandon

     'this works regardless of Response.charset and Session.Codepage
     objXmlDom.transformNodeToObject objXslDom, Response

     'this only works if Session.Codepage is set to 65001
     'Response.Write objXmlDom.transformNode(objXslDom)

     'Remark:
     'For both methods the meta tag should be set; otherwise the browser falls back to auto-select for the encoding (e.g. windows-1252).
     'You can look it up in IE under "View" --> "Encoding".

     Set objXmlDom = Nothing
     Set objXslDom = Nothing
     Set objXmlret = Nothing
End sub

Call Main
%>

----------------------------------
 
LVL 27

Expert Comment

by:BigRat
ID: 8127120
Microsoft's remark on loadXML:

"The loadXML() method will work only with UTF-16 or UCS-2 encodings."


loadXML takes a BSTR, i.e. a string of 16-bit Unicode characters, and loads it. How a stream of 8-bit characters is converted into 16-bit characters depends on various settings.

Irrespective of whatever settings IE or MS has, the STANDARD says that the default character set for an HTML data stream is ISO-8859-1. The character set of HTML itself, i.e. what can be displayed, is Unicode (from HTML 4.0). Many browsers, however (including IE), interpret this somewhat differently and take the locale as the default.

If however the http response contains the following :-

Content-Type: text/html; charset=xxxx

then the stream is interpreted in that character set. Not every character set contains all Unicode characters; UCS-2 and UTF-8 do. UCS-2 uses 16 bits per character and is wasteful, so one uses UTF-8. Setting Response.Charset ensures that the charset attribute turns up in the Content-Type.
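
To illustrate how a client might read the charset from that header, here is a small Python sketch using the standard library's `email.message.Message`, which parses MIME-style headers such as Content-Type. The helper name `charset_from_content_type` is hypothetical, and the ISO-8859-1 fallback simply mirrors the default described in this comment.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type header value, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lower-cased charset, or None if absent.
    return msg.get_content_charset() or default


with_charset = charset_from_content_type("text/html; charset=UTF-8")
without_charset = charset_from_content_type("text/html")

print(with_charset, without_charset)
```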

Now on to the codepage. In ASP/VB and almost everywhere else, 16-bit Unicode characters are used (WinNT 4.0/Win2K and all of COM). How does one convert 16-bit characters to 8-bit? Via the codepage in the locale. Again, no codepage contains all Unicode characters except UTF-8. So by setting the session (and in version 6, response) codepage property to 65001, the 16-bit characters get converted into 8-bit UTF-8 sequences, which are sent to the browser, and the browser is told that they are UTF-8, and that makes everything go OK.
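
The 16-bit to 8-bit conversion described here can be mimicked in Python, whose strings are likewise Unicode in memory. In this sketch, cp1252 stands in for a Western ANSI codepage and utf-8 for codepage 65001; both choices are illustrative assumptions, not settings taken from the thread.

```python
text = "\u5bb6"  # one Unicode character held in memory

# A Western single-byte codepage has no slot for this character, so a lossy
# conversion substitutes a question mark.
as_ansi = text.encode("cp1252", errors="replace")

# UTF-8 (codepage 65001 on Windows) can represent it, using three bytes.
as_utf8 = text.encode("utf-8")

# The receiver only recovers the character if it decodes with the same encoding.
round_trip = as_utf8.decode("utf-8")

print(as_ansi, as_utf8.hex(), round_trip == text)
```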

Remember that in ASP script all string data uses 16-bit characters. It is only when one starts writing these to streams (HTTP responses, files and so on) that a 16-bit to 8-bit conversion must take place.

HTH
 

Author Comment

by:qaziasim
ID: 8141608
Hi,
thanks for such helpful details.
But after setting the codepage and character set, the problem is still there. The loadXML method can read only UTF-16 and UCS-2 characters, whereas my input to this method is a mix of UTF-8 and UTF-16 encoded characters, so now I am really stuck on what to do.

Can you give me the solution in code form? I mean, can you make the required changes to my code if I send you
one test ASP file, an Access database with one table, and one XSL document?

I will be really thankful if you pull me out of this problem.

Qazi Asim
 
LVL 5

Expert Comment

by:MMeijer
ID: 8142176
Can you send a link to the page, or the HTML output?
 

Author Comment

by:qaziasim
ID: 8149492
Thanks a lot, BigRat and MM, for your great help. The problem is solved, by the grace of God and your help.

Thanks again

Qazi Asim
