Solved

Problems displaying Chinese characters with UTF-8 encoding

Posted on 2006-07-11
Last Modified: 2008-01-09
Hi All,

For the last two days I have been having a nightmare displaying Chinese characters in a web page. I have done everything that seems to be required. The characters are displayed properly, but the generated HTML page is incomplete.

Please note: the HTML generated by the servlet engine is incomplete. It seems to be a problem with the JspServlet, but I have no idea.

The things I did:

Option 1) Use the following directives and meta tags:

<%@ page pageEncoding="UTF-8"%>
<%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
...etc (get chinese content from database)
</body>
</html>

Option 2) To simplify this and avoid adding these lines to every JSP, I just use a servlet filter to set the response character encoding:

response.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
response.setHeader("Content-Type","text/html; charset=UTF-8");

Apart from that, the Tomcat web.xml is properly set for javaEncoding, like this:
<init-param>
    <param-name>javaEncoding</param-name>
    <param-value>UTF8</param-value>
</init-param>

On the backend, an Oracle database is used, with NLS_CHARACTERSET set to AL32UTF8.

I am not talking about URI encoding or anything to do with request parameters. I just wish to display the page properly. Any ideas?

regards,
aks


 
Question by:aks143
16 Comments
 
LVL 49

Expert Comment

by:Ryan Chong
ID: 17081617
This worked for me to display Chinese characters in UTF-8 in a previous application:

<%@ page language="java" pageEncoding="utf8" contentType="text/html;charset=utf-8" %>
...

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">



It will not work if your data itself is not properly encoded in UTF-8.
 

Author Comment

by:aks143
ID: 17081841
Hi ryancys,

Thanks for your reply. There is no difference between your solution and the one I posted. I don't see a data problem, because the data is displayed properly. The problem is that the complete page is not rendered. I tried your solution as well, with no luck.
 
LVL 49

Expert Comment

by:Ryan Chong
ID: 17081986
>>but the html page generated is incomplete.
Try checking:

1. Make sure your scripts didn't generate any errors; check the server's logs if necessary.
2. Make sure your HTML is valid and there are no missing tags in your rendered HTML content.
 

Author Comment

by:aks143
ID: 17082071
>>Try checking:

>>1. Make sure your scripts didn't generate any errors; check the server's logs if necessary.
--> Checked. No errors were reported in the logs or elsewhere.

>>2. Make sure your HTML is valid and there are no missing tags in your rendered HTML content.
--> The rendered HTML content is the problem: the servlet engine does not generate the complete HTML page, but only when the characters are Chinese. For English characters the page renders fine and is complete. Strange!

I believe Chinese characters use more bytes per character, and so the servlet engine has trouble displaying them.
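
The "more bytes per character" point can be checked in plain Java. The following is a minimal sketch, not code from the application in question; the class name Utf8ByteCount and the sample strings are made up for illustration. It compares the number of characters in a string with the number of bytes it occupies once encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ByteCount {
    /** Number of bytes the text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        // Two Chinese characters, written as Unicode escapes so the
        // encoding of this source file does not matter.
        String chinese = "\u4e2d\u6587";

        // ASCII text: one byte per character.
        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes");
        // These Chinese characters: three bytes per character in UTF-8.
        System.out.println(chinese.length() + " chars, " + utf8Length(chinese) + " bytes");
    }
}
```

For ASCII text the two numbers are equal, while for the Chinese sample the byte count is three times the character count. Any component that measures a response in characters rather than bytes will therefore be wrong only for pages that contain non-ASCII text.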
 

Author Comment

by:aks143
ID: 17082107
Even for JSP runtime compilation, I have added -Dfile.encoding=UTF-8 to Tomcat's catalina.bat.
 
LVL 19

Expert Comment

by:actonwang
ID: 17082306
>>The characters are displayed properly but the html page generated is incomplete.

     How do you write those Chinese characters out in the JSP? Show me the code you use to render them.
 

Author Comment

by:aks143
ID: 17082492
For testing purposes, I use the following code. Otherwise, I use Struts taglibs in the view layer.

Get the connection and run a query...

ResultSet set = statement.getResultSet ();
while (set.next ())
{
      out.println(set.getString(1));
      out.println("-->");
      out.println(set.getString(2)); // here chinese chars.
}
 

Author Comment

by:aks143
ID: 17084584
actonwang, do you see any problem with the code?

It is driving me crazy. Can someone answer a short question:
I see that there are Chinese Simplified and Chinese Traditional character sets available. Are these all covered by UTF-8? How can one know which characters are covered by which encodings?

Thanks for any help.
aks
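
On the coverage question: UTF-8 can encode every Unicode character, so both Simplified and Traditional Chinese are covered. For other encodings, one way to check is to ask the JVM. The sketch below is only an illustration; the class name EncodingCoverage, the helper covers, and the sample strings are assumptions made for the example. It uses the standard CharsetEncoder.canEncode method, and whether legacy charsets such as GBK or Big5 are installed depends on the JRE, so they are probed with Charset.isSupported first.

```java
import java.nio.charset.Charset;

final class EncodingCoverage {
    /** True if every character of text can be encoded in the named charset. */
    static boolean covers(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The same word in Simplified and Traditional form, as Unicode escapes.
        String simplified = "\u6c49\u5b57";
        String traditional = "\u6f22\u5b57";

        String[] names = {"UTF-8", "US-ASCII", "ISO-8859-1", "GBK", "Big5"};
        for (String name : names) {
            // Legacy CJK charsets are optional in some runtimes, so skip
            // any that this JVM does not provide.
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            System.out.println(name
                    + ": simplified=" + covers(name, simplified)
                    + ", traditional=" + covers(name, traditional));
        }
    }
}
```

Running this against the charsets you care about shows, per encoding, which sample text can be represented without loss.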
 
LVL 19

Expert Comment

by:actonwang
ID: 17086163
>><%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
     These are good. I suspect that you didn't write the characters out correctly. You HAVE TO write your Chinese characters out using UTF-8 encoding for them to be displayed properly as UTF-8 on the client side (or in the HTML file).
     That should be your problem.

Acton
 
LVL 19

Expert Comment

by:actonwang
ID: 17086227
>>out.println(set.getString(2));

     Are you using the thin JDBC driver?


>>On the backend, oracle db is used and has NLS_CHARACTERSET set to AL32UTF8
     It looks like a problem if you use the JDBC thin driver:

The JDBC Thin driver can access databases that use any of the following character sets:

    * US7ASCII (ASCII)
    * WE8ISO8859P1 (ISO-latin-1)
    * AL24UTFFSS (Unicode 1.2)
    * UTF8 (Unicode 2.0)

This happens automatically with no special action on your part.

Databases that use other character sets are not supported yet. The JDBC Thin driver can only use the US7ASCII character set for them.

     That means that in your case, you get US7ASCII in your thin JDBC code, which is why you have the problem.
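
For reference, here is what a lossy 7-bit conversion does to Chinese text in plain Java. This is only a sketch of the general effect, not of how the Oracle driver behaves internally; the class name AsciiFallbackDemo and the sample string are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class AsciiFallbackDemo {
    /** Encodes text as US-ASCII and decodes it again, as a lossy conversion would. */
    static String roundTripThroughAscii(String text) {
        // String.getBytes replaces every character the charset cannot
        // represent with the charset's replacement byte, which is '?' for US-ASCII.
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(asciiBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // An ASCII prefix followed by two Chinese characters (Unicode escapes).
        String original = "id-1 \u4e2d\u6587";
        System.out.println(roundTripThroughAscii(original)); // prints "id-1 ??"
    }
}
```

If such a conversion were taking place, the expected symptom would be question marks in place of the Chinese characters, rather than a page that is cut short.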
 
LVL 19

Expert Comment

by:actonwang
ID: 17086230
Consider using the OCI driver if possible.

Refer to this:

http://triton.towson.edu/~schmitt/java/jdbc/doc/jdbcoci3.htm
 

Author Comment

by:aks143
ID: 17088448
Hi Acton, thanks for your response. At first I thought you had got it. But then I had a look at some articles, and it seems that AL32UTF8 is just like UTF-8 encoding on the database end. See the explanation here:
http://www.cs.utah.edu/classes/cs5530-gary/oracle/doc/B10501_01/server.920/a96529/ch9.htm#16426

I am definitely using the Oracle thin driver. But looking into the details, it seems the OCI drivers are not better performance-wise.
 

Author Comment

by:aks143
ID: 17153768
Hi all,

I finally found out why the page was getting truncated. I am also using SiteMesh, and if the page is not explicitly excluded from decoration, SiteMesh sets the Content-Length wrong.

The approaches that I and others mentioned are all correct, and I would prefer to get the points refunded.

thanks
aks
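
A wrong Content-Length explains why only the Chinese pages were cut short. The sketch below simulates the effect in plain Java. It assumes, purely for illustration, that the length was counted in characters instead of UTF-8 bytes; the thread does not say how the wrong value was actually computed, and the class ContentLengthTruncation and the sample page are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ContentLengthTruncation {
    /** What a client ends up with if it stops reading after declaredLength bytes. */
    static byte[] readDeclared(byte[] body, int declaredLength) {
        return Arrays.copyOf(body, Math.min(declaredLength, body.length));
    }

    public static void main(String[] args) {
        // A tiny page containing four Chinese characters (Unicode escapes).
        String page = "<html><body>\u4e2d\u6587\u4e2d\u6587</body></html>";
        byte[] body = page.getBytes(StandardCharsets.UTF_8);

        int byteLength = body.length;    // the correct Content-Length
        int charLength = page.length();  // assumed mistake: counting characters

        System.out.println("correct length: " + byteLength + ", declared length: " + charLength);

        // The client honours the declared length and drops the rest.
        byte[] received = readDeclared(body, charLength);
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

With English-only content the two lengths are equal and nothing is lost, which matches the observation that English pages rendered completely. With Chinese content the declared length is too small, so the end of the page never reaches the browser.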
 
LVL 1

Accepted Solution

by:
GhostMod earned 0 total points
ID: 17185266
Closed, 350 points refunded.

GhostMod
Community Support Moderator
 
LVL 19

Expert Comment

by:actonwang
ID: 17185988
Good to know.