Solved

Problems displaying Chinese characters with UTF-8 encoding

Posted on 2006-07-11
Last Modified: 2008-01-09
Hi All,

For the last two days I have been having a nightmare displaying Chinese characters in a web page. I have done everything that seems to be required. The characters are displayed properly, but the generated HTML page is incomplete.

Please note: the HTML generated by the servlet engine is incomplete. It seems to be a problem with the JspServlet, but I have no idea.

The things I did:

Option 1) Use the following directives and meta tags

<%@ page pageEncoding="UTF-8"%>
<%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
...etc (get chinese content from database)
</body>
</html>

Option 2) To simplify things and avoid adding these lines to every JSP, I use a servlet filter to set the response character encoding:

response.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
response.setHeader("Content-Type","text/html; charset=UTF-8");

Apart from that, javaEncoding is set in Tomcat's web.xml as follows:
<init-param>
            <param-name>javaEncoding</param-name>
            <param-value>UTF8</param-value>
        </init-param>

On the backend, an Oracle database is used, with NLS_CHARACTERSET set to AL32UTF8.

I am not talking about URI encoding or anything to do with request parameters. I just want the page to display properly. Any ideas?

regards,
aks


 
Question by:aks143
 
LVL 52

Expert Comment

by:Ryan Chong
ID: 17081617
This worked for me for displaying Chinese characters in UTF-8 in a previous application:

<%@ page language="java" pageEncoding="utf8" contentType="text/html;charset=utf-8" %>
...

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">



It will not work if your data itself is not properly encoded in UTF-8.
 

Author Comment

by:aks143
ID: 17081841
Hi ryancys,

Thanks for your reply. There is no difference between your solution and the one I posted. I don't see a data problem, because the data is displayed properly. The problem is that the complete page is not rendered. I tried your solution as well, with no luck.
 
LVL 52

Expert Comment

by:Ryan Chong
ID: 17081986
>>but the html page generated is incomplete.
Try checking:

1. Make sure your scripts didn't generate any errors; check the server's logs if necessary.
2. Your HTML is valid and there are no missing tags in the rendered HTML content.
 

Author Comment

by:aks143
ID: 17082071
>>1. make sure your scripts didn't generate any errors, try check for server's logs if necessary here.
--> Checked. No errors were reported in the logs or elsewhere.

>>2. your html is valid and there is no missing tags in your rendered html content.
--> The rendered HTML content is the problem: the servlet engine does not generate the complete HTML page, but only when the characters are Chinese. With English characters the page is rendered fine and complete. Strange!

I believe Chinese characters use more bytes per character, and so the servlet engine has trouble displaying them.
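The byte-width point can be checked directly. The following is a minimal sketch, not code from the thread: the class name Utf8Widths and the helper utf8Length are made up for illustration. It shows that the Chinese characters used here take three bytes each in UTF-8, so the character count and the byte count of a string differ as soon as Chinese text is present.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Widths {
    /** Number of bytes the given text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        // "Chinese" written in Chinese, given as Unicode escapes so the
        // encoding of this source file does not matter.
        String chinese = "\u4e2d\u6587";

        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes"); // 5 chars, 5 bytes
        System.out.println(chinese.length() + " chars, " + utf8Length(chinese) + " bytes"); // 2 chars, 6 bytes
    }
}
```

Multi-byte characters are normal for UTF-8 and are not a problem for a servlet engine in themselves, but any component that counts characters where it should count bytes will come up short for Chinese text and be exact for plain English.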
 

Author Comment

by:aks143
ID: 17082107
I have also added -Dfile.encoding=UTF-8 to Tomcat's catalina.bat for JSP runtime compilation.
 
LVL 19

Expert Comment

by:actonwang
ID: 17082306
>>The characters are displayed properly but the html page generated is incomplete.

     How do you write those Chinese characters out in the JSP? Show me the code that renders them.
 

Author Comment

by:aks143
ID: 17082492
For testing purposes I use the following code. Otherwise, I use Struts taglibs in the view layer.

Get the connection and run a query, then:

ResultSet set = statement.getResultSet ();
while (set.next ())
{
      out.println(set.getString(1));
      out.println("-->");
      out.println(set.getString(2)); // here chinese chars.
}
 

Author Comment

by:aks143
ID: 17084584
actonwang, do you see any problem with the code?

It is driving me crazy. Can someone answer a short question:
I see that there are Chinese Simplified and Chinese Traditional character sets available. Are these all covered by UTF-8? How can one know which characters are covered by which encodings?

Thanks for any help.
aks
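One way to answer the coverage question empirically is to ask the JVM whether a given charset can encode a given string. The sketch below assumes nothing beyond the standard java.nio.charset API; the class name CharsetCoverage and the method covers are hypothetical. UTF-8 can encode every Unicode character, so it covers both Simplified and Traditional Chinese. The results for legacy charsets such as GB2312 or Big5 depend on the character and on whether the JVM ships that charset, so the sketch prints them rather than assuming them.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class CharsetCoverage {
    /** True if the named charset exists in this JVM and can encode every character of text. */
    static boolean covers(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false; // this JVM does not ship the charset
        }
        Charset cs = Charset.forName(charsetName);
        // canEncode() guards against decode-only charsets before asking for an encoder.
        return cs.canEncode() && cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u6c49\u8bed";  // "Chinese language" in Simplified characters
        String traditional = "\u6f22\u8a9e"; // the same word in Traditional characters

        for (String name : new String[] {"UTF-8", "GB2312", "Big5", "US-ASCII"}) {
            System.out.println(name
                    + ": simplified=" + covers(name, simplified)
                    + ", traditional=" + covers(name, traditional));
        }
    }
}
```

UTF-8 reports true for both strings and US-ASCII reports false for both; the two legacy Chinese charsets are the interesting rows to compare.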
 
LVL 19

Expert Comment

by:actonwang
ID: 17086163
>><%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
     These are good. I suspect that you didn't write the characters out correctly. You HAVE TO write your Chinese characters out using UTF-8 encoding for them to be displayed properly as UTF-8 on the client side (or in an HTML file).
     That should be your problem.

Acton
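What "writing out using UTF-8" means at the byte level can be shown outside a servlet container. This is a rough sketch using only java.io; the class name WriterEncodingDemo and the method write are made up, and a ByteArrayOutputStream stands in for the response stream. The same string is written once through a UTF-8 writer and once through a US-ASCII writer, which cannot represent Chinese characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class WriterEncodingDemo {
    /** Writes text through a Writer that uses the given charset and returns the raw bytes. */
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d\u6587"; // two Chinese characters

        byte[] good = write(text, StandardCharsets.UTF_8);
        byte[] bad = write(text, StandardCharsets.US_ASCII);

        // The UTF-8 bytes decode back to the original text.
        System.out.println(new String(good, StandardCharsets.UTF_8).equals(text)); // true
        // The ASCII writer replaced each unmappable character with '?'.
        System.out.println(new String(bad, StandardCharsets.US_ASCII)); // ??
    }
}
```

A wrong writer encoding therefore shows up as question marks or garbled characters, not as a page that stops early, which is a useful distinction when the symptom is truncation.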
 
LVL 19

Expert Comment

by:actonwang
ID: 17086227
>>out.println(set.getString(2));

     Are you using the thin JDBC driver?


>>On the backend, oracle db is used and has NLS_CHARACTERSET set to AL32UTF8
     It looks like a problem if you use the JDBC thin driver:

The JDBC Thin driver can access databases that use any of the following character sets:

    * US7ASCII (ASCII)
    * WE8ISO8859P1 (ISO-latin-1)
    * AL24UTFFSS (Unicode 1.2)
    * UTF8 (Unicode 2.0)

This happens automatically with no special action on your part.

Databases that use other character sets are not supported yet. The JDBC Thin driver can only use the US7ASCII character set for them.

     That means that, in your case, you get US7ASCII data from the thin JDBC driver, which is why you have the problem.
 
LVL 19

Expert Comment

by:actonwang
ID: 17086230
Consider using the OCI driver if possible.

Refer to this:

http://triton.towson.edu/~schmitt/java/jdbc/doc/jdbcoci3.htm
 

Author Comment

by:aks143
ID: 17088448
Hi acton, thanks for your response. At first I thought you had got it. But then I looked at some articles, and it seems that AL32UTF8 is just UTF-8 encoding on the database end. See the explanation here:
http://www.cs.utah.edu/classes/cs5530-gary/oracle/doc/B10501_01/server.920/a96529/ch9.htm#16426

I am definitely using the Oracle thin driver. But looking into the details, it seems the OCI drivers are no better performance-wise.
 

Author Comment

by:aks143
ID: 17153768
Hi all,

I finally found out why the page was getting truncated. I am also using SiteMesh, and if the page is not explicitly excluded from decoration, SiteMesh sets the Content-Length wrong.

The approaches that I and others mentioned are all correct, and I would prefer to have the points refunded.

Thanks,
aks
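To illustrate why a wrong Content-Length truncates only pages with Chinese text, here is a small simulation. It is a sketch under a stated assumption: the thread does not say how SiteMesh arrived at the wrong value, so the example assumes a length computed from the character count instead of the UTF-8 byte count. The class name ContentLengthDemo and the method receive are hypothetical, and no SiteMesh or servlet API is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical simulation, for illustration only.
class ContentLengthDemo {
    /** Simulates a client that stops reading after contentLength bytes of the body. */
    static String receive(byte[] body, int contentLength) {
        int n = Math.min(contentLength, body.length);
        return new String(Arrays.copyOf(body, n), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A tiny page with four Chinese characters in the body.
        String page = "<html><body>\u4e2d\u6587\u5185\u5bb9</body></html>";
        byte[] body = page.getBytes(StandardCharsets.UTF_8);

        int wrongLength = page.length(); // assumed bug: counts characters (30)
        int rightLength = body.length;   // counts bytes (38)

        System.out.println(receive(body, wrongLength)); // page cut off before "></html>"
        System.out.println(receive(body, rightLength)); // complete page
    }
}
```

With English-only content both lengths are equal, so the page arrives complete, which matches the symptom described earlier in the thread: correct characters, but a truncated page only when Chinese text is present.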
 
LVL 1

Accepted Solution

by:
GhostMod earned 0 total points
ID: 17185266
Closed, 350 points refunded.

GhostMod
Community Support Moderator
 
LVL 19

Expert Comment

by:actonwang
ID: 17185988
Good to know.
