• Status: Solved

Problems displaying Chinese characters with UTF-8 encoding

Hi All,

For the last two days I have been having a nightmare displaying Chinese characters in a web page. I have done everything that seems to be required. The characters are displayed properly, but the generated HTML page is incomplete.

Please note: the HTML generated by the servlet engine is incomplete. It seems to be a problem with the JspServlet, but I have no idea.

The things I did:

Option 1) Use the following directives and meta tags:

<%@ page pageEncoding="UTF-8"%>
<%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
...etc (get chinese content from database)
</body>
</html>

Option 2) To simplify and avoid adding these lines to every JSP, I just use a servlet filter to set the response character encoding:

response.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
response.setHeader("Content-Type","text/html; charset=UTF-8");

Apart from that, javaEncoding is set in Tomcat's web.xml like this:
<init-param>
    <param-name>javaEncoding</param-name>
    <param-value>UTF8</param-value>
</init-param>

On the backend, an Oracle database is used, with NLS_CHARACTERSET set to AL32UTF8.

I am not talking about URI encoding or anything to do with request parameters. I just want the page to display properly. Any ideas?

regards,
aks


 
 
Ryan Chong Commented:
This worked for me to display Chinese characters in UTF-8 in a previous application:

<%@ page language="java" pageEncoding="utf8" contentType="text/html;charset=utf-8" %>
...

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">



It will not work if your data itself is not properly encoded as UTF-8.

aks143 (Author) Commented:
Hi ryancys,

Thanks for your reply. There is no difference between your solution and the one I posted. I don't see a data problem, because the data is displayed properly. The problem is that the complete page is not rendered. I tried your solution as well, with no luck.

Ryan Chong Commented:
>>but the html page generated is incomplete.
Try checking the following:

1. Make sure your scripts didn't generate any errors; check the server's logs if necessary.
2. Make sure your HTML is valid and there are no missing tags in the rendered HTML content.
 
aks143 (Author) Commented:
Try checking the following:

1. Make sure your scripts didn't generate any errors; check the server's logs if necessary.
--> Checked. No errors were reported in the logs or elsewhere.

2. Make sure your HTML is valid and there are no missing tags in the rendered HTML content.
--> The rendered HTML content is the problem, because the servlet engine does not generate the complete HTML page, but only when the characters are Chinese. For English characters the page renders fine and is complete. Strange!

I believe Chinese characters take more bytes per character, and so the servlet engine has trouble outputting them.
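
The byte-count part of that guess can be checked in isolation. The following is a minimal sketch using only the Java standard library; the class name Utf8ByteCount and the sample strings are made up for illustration and are not taken from the application in question. It compares the number of characters in a string with the number of bytes it occupies once encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteCount {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        // Two Chinese characters, written as escapes so the source file's
        // own encoding cannot affect the result.
        String chinese = "\u4e2d\u6587";

        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes");
        System.out.println(chinese.length() + " chars, " + utf8Length(chinese) + " bytes");
    }
}
```

For ASCII text the two counts are equal, while each of these Chinese characters takes three bytes. Any code that measures a response in characters instead of bytes would therefore only go wrong on non-ASCII pages.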

aks143 (Author) Commented:
I have even added -Dfile.encoding=UTF-8 to Tomcat's catalina.bat for JSP runtime compilation.

actonwang Commented:
>>The characters are displayed properly but the html page generated is incomplete.

     How do you write those Chinese characters out in the JSP? Show me the code you use to render them.

aks143 (Author) Commented:
For testing purposes, I use the following code. Otherwise, I use Struts taglibs in the view layer.

Get the connection and run a query...

ResultSet set = statement.getResultSet ();
while (set.next ())
{
      out.println(set.getString(1));
      out.println("-->");
      out.println(set.getString(2)); // here chinese chars.
}

aks143 (Author) Commented:
actonwang, do you see any problem with the code?

It is driving me crazy. Can someone answer a short question:
I see that there are Chinese Simplified and Chinese Traditional character sets. Are these all covered by UTF-8? How can one know which characters are covered by which encodings?

Thanks for any help.
aks
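
On the coverage question: UTF-8 can encode every Unicode code point, so it covers both simplified and traditional characters, whereas legacy encodings such as GB2312 and Big5 each cover a subset. One way to check a particular encoding from Java is CharsetEncoder.canEncode. The sketch below is only an illustration; the class name CharsetCoverage and the sample words are made up, and GB2312 and Big5 are optional charsets that a given JRE may not ship, hence the Charset.isSupported guard.

```java
import java.nio.charset.Charset;

class CharsetCoverage {
    // True if every character of the text can be represented in the charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u6c49\u8bed";  // simplified forms
        String traditional = "\u6f22\u8a9e"; // traditional forms of the same word

        String[] names = {"UTF-8", "ISO-8859-1", "GB2312", "Big5"};
        for (String name : names) {
            // GB2312 and Big5 are not guaranteed to exist in every JRE.
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JRE");
                continue;
            }
            Charset cs = Charset.forName(name);
            System.out.println(name
                    + ": simplified=" + canEncode(cs, simplified)
                    + ", traditional=" + canEncode(cs, traditional));
        }
    }
}
```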

actonwang Commented:
>><%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
     These are good. I suspect that you didn't write the characters out correctly. You HAVE TO write your Chinese characters out using UTF-8 encoding for them to be displayed properly as UTF-8 on the client side (or in an HTML file).
     That should be your problem.

Acton
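
What "writing out using UTF-8" means outside a JSP can be sketched with a Writer that is bound to an explicit charset instead of the platform default. In a JSP the container normally derives the writer's encoding from the declared response charset, so this is an illustration of the principle rather than code for the page itself. The class name ExplicitUtf8Writer is made up, and the ByteArrayOutputStream merely stands in for a response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitUtf8Writer {
    // Encodes text through a Writer bound to UTF-8, not the platform default.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a response stream
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        byte[] bytes = writeUtf8(text);
        // Decoding with the same charset gives back the original text.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length + " bytes, round trip ok: " + decoded.equals(text));
    }
}
```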

actonwang Commented:
>>out.println(set.getString(2));

     Are you using the thin JDBC driver?


>>On the backend, oracle db is used and has NLS_CHARACTERSET set to AL32UTF8
     That looks like a problem if you use the JDBC Thin driver:

The JDBC Thin driver can access databases that use any of the following character sets:

    * US7ASCII (ASCII)
    * WE8ISO8859P1 (ISO-latin-1)
    * AL24UTFFSS (Unicode 1.2)
    * UTF8 (Unicode 2.0)

This happens automatically with no special action on your part.

Databases that use other character sets are not supported yet. The JDBC Thin driver can only use the US7ASCII character set for them.

     That means, in your case, you get US7ASCII data through the thin JDBC driver, which is why you have the problem.
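
For reference, this is what a lossy conversion to ASCII looks like in plain Java. It is only a sketch of how Java's own US-ASCII charset treats Chinese text, not a reproduction of what the Oracle driver does; the class name AsciiFallback is made up. The point is the symptom: unmappable characters turn into question marks, the page is not cut short.

```java
import java.nio.charset.StandardCharsets;

class AsciiFallback {
    // Round-trips text through US-ASCII. String.getBytes replaces every
    // character the charset cannot represent with '?'.
    static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("id-42"));        // ASCII text survives unchanged
        System.out.println(throughAscii("\u4e2d\u6587")); // prints "??"
    }
}
```

So if the data really were being squeezed through a 7-bit character set, the expected result would be garbled or replaced characters, which can be compared with what the page actually shows.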

actonwang Commented:
Consider using the OCI driver if possible.

Refer to this:

http://triton.towson.edu/~schmitt/java/jdbc/doc/jdbcoci3.htm

aks143 (Author) Commented:
Hi Acton, thanks for your response. At first I thought you had got it. But then I had a look at some articles, and it seems that AL32UTF8 is simply the UTF-8 encoding on the database end. See the explanation here:
http://www.cs.utah.edu/classes/cs5530-gary/oracle/doc/B10501_01/server.920/a96529/ch9.htm#16426

I am definitely using the Oracle thin driver. But looking into the details, it seems the OCI drivers are not better performance-wise.

aks143 (Author) Commented:
Hi all,

I finally found out why the page was getting truncated. I am also using SiteMesh, and if the page is not explicitly excluded from decoration, SiteMesh sets the Content-Length incorrectly.

The approaches that I and others mentioned are all correct, so I would prefer to have the points refunded.

Thanks
aks
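
To show how a wrong Content-Length can produce exactly this symptom, here is a small simulation. It assumes, purely for illustration, that the length was computed from the character count instead of the UTF-8 byte count; the thread does not say how SiteMesh arrived at the wrong value. The class name ContentLengthTruncation and the sample page are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ContentLengthTruncation {
    // What a client keeps if it stops reading after the declared Content-Length.
    static String receive(String body, int declaredLength) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        int kept = Math.min(declaredLength, bytes.length);
        return new String(Arrays.copyOf(bytes, kept), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String page = "<p>\u4e2d\u6587</p></html>";
        int byteLength = page.getBytes(StandardCharsets.UTF_8).length; // correct value
        int charLength = page.length(); // hypothetical miscalculation: chars, not bytes

        System.out.println(receive(page, byteLength)); // the full page
        System.out.println(receive(page, charLength)); // the tail of the page is missing
    }
}
```

With ASCII-only content the character count and the byte count are equal, so a miscount of this kind would go unnoticed on English pages and only truncate pages containing multi-byte characters, which would be consistent with what was observed here.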

GhostMod Commented:
Closed, 350 points refunded.

GhostMod
Community Support Moderator

actonwang Commented:
Good to know.