aks143

Problems displaying Chinese characters with UTF-8 encoding

Hi All,

For the last two days I have been having a nightmare displaying Chinese characters in a web page. I have done everything that seems to be required. The characters are displayed properly, but the generated HTML page is incomplete.

Please note: the HTML generated by the servlet engine is incomplete. It seems to be a problem with the JspServlet, but I have no idea.

The things I did:

Option 1) Use the following directives and meta tags

<%@ page pageEncoding="UTF-8"%>
<%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
...etc (get chinese content from database)
</body>
</html>

Option 2) To simplify things and avoid adding these lines to every JSP, I use a servlet filter to set the response character encoding:

response.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
response.setHeader("Content-Type","text/html; charset=UTF-8");

Apart from that, javaEncoding is set in Tomcat's web.xml as follows:
<init-param>
    <param-name>javaEncoding</param-name>
    <param-value>UTF8</param-value>
</init-param>

On the backend, an Oracle database is used, with NLS_CHARACTERSET set to AL32UTF8.

I am not talking about URI encoding or anything to do with request parameters. I just want the page to display properly. Any ideas?

regards,
aks


 
Ryan Chong

This worked for me to display Chinese characters in UTF-8 in a previous application:

<%@ page language="java" pageEncoding="utf8" contentType="text/html;charset=utf-8" %>
...

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">



It will not work if your data itself is not properly encoded as UTF-8.
aks143

ASKER

Hi ryancys,

Thanks for your reply. There is no difference between your solution and the one I posted. I don't see a data problem, because the data is displayed properly. The problem is that the complete page is not rendered. I tried your solution as well, with no luck.
>>but the html page generated is incomplete.
Try checking:

1. Make sure your scripts didn't generate any errors; check the server logs if necessary.
2. Make sure your HTML is valid and there are no missing tags in the rendered HTML content.
aks143

ASKER

try check:

1. make sure your scripts didn't generate any errors, try check for server's logs if necessary here.
--> Checked. No errors are reported in the logs or elsewhere.

2. your html is valid and there is no missing tags in your rendered html content.
--> The rendered HTML content is the problem, because the servlet engine does not generate the complete HTML page (only when the characters are Chinese). For English characters the page renders fine and is complete. Strange!

I believe Chinese characters use more bytes per character, and so the servlet engine has trouble displaying them.
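
The byte-count observation above can be checked directly. This is a minimal sketch, not code from the thread: the class name Utf8ByteCount and the helper utf8Length are made up for illustration, and the two Chinese characters are arbitrary sample data written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteCount {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        String chinese = "\u4E2D\u6587"; // two Chinese characters (sample data)

        // For ASCII text the character count and the UTF-8 byte count agree;
        // for these Chinese characters each character takes three bytes.
        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes");
        System.out.println(chinese.length() + " chars, " + utf8Length(chinese) + " bytes");
    }
}
```

More bytes per character is normal for UTF-8 and is not a problem in itself; it only matters where some component counts characters but should be counting bytes.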
aks143

ASKER

I have even added -Dfile.encoding=UTF-8 to Tomcat's catalina.bat for JSP runtime compilation.
>>The characters are displayed properly but the html page generated is incomplete.

     How do you write those Chinese characters out in the JSP? Show me the code you use to render them.
aks143

ASKER

For testing purposes I use the following code; otherwise I use Struts taglibs in the view layer.

Get the connection and run a query...

ResultSet set = statement.getResultSet ();
while (set.next ())
{
      out.println(set.getString(1));
      out.println("-->");
      out.println(set.getString(2)); // here chinese chars.
}
aks143

ASKER

actonwang, do you see any problem with the code?

It is driving me crazy. Can someone answer a short question for me?
I see that there are Chinese Simplified and Chinese Traditional character sets available. Are these all covered by UTF-8? How can one know which characters are covered by which encodings?

Thanks for any help.
aks
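
On the coverage question above: UTF-8 can encode every Unicode character, so it covers both Simplified and Traditional Chinese. For other encodings, one way to check is to ask the charset's encoder. The sketch below is an illustration only; the class name CharsetCoverage and the helper covers are hypothetical, the two sample characters are arbitrary, and GB2312 and Big5 are optional charsets that a given JVM may not ship, so the code skips any that are missing.

```java
import java.nio.charset.Charset;

class CharsetCoverage {
    // True if every character of text can be represented in the given charset.
    static boolean covers(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u6C49";  // U+6C49, a simplified-form character (sample data)
        String traditional = "\u6F22"; // U+6F22, its traditional-form counterpart

        // UTF-8 and ISO-8859-1 are always present; the other two are optional in a JVM.
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "GB2312", "Big5"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            Charset cs = Charset.forName(name);
            System.out.println(name + ": simplified=" + covers(cs, simplified)
                    + ", traditional=" + covers(cs, traditional));
        }
    }
}
```

The same check works for any string, for example a value read from the database, before it is written out in a non-Unicode encoding.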
>><%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>
     These are good. I suspect that you didn't write the characters out correctly. You HAVE TO write your Chinese characters out using UTF-8 encoding for them to be displayed properly as UTF-8 on the client side (or in the HTML file).
     That should be your problem.

Acton
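
The point about writing characters out in UTF-8 can be illustrated outside a servlet container. This is a rough sketch under stated assumptions: the class WriterEncodingDemo and the method writeThenReadAsUtf8 are invented for illustration, a ByteArrayOutputStream stands in for the response stream, and decoding the bytes as UTF-8 stands in for a browser that was told charset=UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    // Encode text through a Writer that uses writerCharset, then decode the
    // resulting bytes as UTF-8, as a client told "charset=UTF-8" would.
    static String writeThenReadAsUtf8(String text, Charset writerCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, writerCharset)) {
            out.write(text);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String chinese = "\u4E2D\u6587"; // sample data

        // Writer and client agree on UTF-8: the text survives the round trip.
        System.out.println(chinese.equals(writeThenReadAsUtf8(chinese, StandardCharsets.UTF_8)));

        // Writer uses ISO-8859-1, which cannot represent these characters: the text is lost.
        System.out.println(chinese.equals(writeThenReadAsUtf8(chinese, StandardCharsets.ISO_8859_1)));
    }
}
```

In a JSP the writer's encoding comes from the page directive or from response.setCharacterEncoding, so this failure mode would show up as wrong characters rather than as a truncated page.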
>>out.println(set.getString(2));

     Are you using the thin JDBC driver?


>>On the backend, oracle db is used and has NLS_CHARACTERSET set to AL32UTF8
     That looks like a problem if you use the JDBC thin driver:

The JDBC Thin driver can access databases that use any of the following character sets:

    * US7ASCII (ASCII)
    * WE8ISO8859P1 (ISO-latin-1)
    * AL24UTFFSS (Unicode 1.2)
    * UTF8 (Unicode 2.0)

This happens automatically with no special action on your part.

Databases that use other character sets are not supported yet. The JDBC Thin driver can only use the US7ASCII character set for them.

     That means that, in your case, you get US7ASCII in your thin JDBC code, which is why you have the problem.
Consider using the OCI driver if possible.

Refer to this:

http://triton.towson.edu/~schmitt/java/jdbc/doc/jdbcoci3.htm
aks143

ASKER

Hi acton, thanks for your response. At first I thought you had got it. But then I looked at some articles, and it seems that AL32UTF8 is simply UTF-8 encoding on the database end. See the explanation here:
http://www.cs.utah.edu/classes/cs5530-gary/oracle/doc/B10501_01/server.920/a96529/ch9.htm#16426

I am definitely using the Oracle thin driver. Looking into the details, it seems the OCI drivers are no better performance-wise.
aks143

ASKER

Hi all,

I finally found out why the page was getting truncated. I am also using SiteMesh, and if the page is not explicitly excluded from decoration, SiteMesh sets the Content-Length incorrectly.

The approaches that I and others mentioned are all correct, and I would prefer to have the points refunded.

thanks
aks
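
The thread does not say how the Content-Length ended up wrong. One plausible mechanism, offered here purely as an assumption, is a length counted in characters instead of UTF-8 bytes. The sketch below simulates a client that stops reading after the declared number of bytes; the class ContentLengthTruncation, the method bodySeenByClient, and the sample page are all hypothetical and do not reflect SiteMesh's actual code.

```java
import java.nio.charset.StandardCharsets;

class ContentLengthTruncation {
    // What a client sees if it stops reading after declaredLength bytes of a UTF-8 body.
    static String bodySeenByClient(String body, int declaredLength) {
        byte[] encoded = body.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(declaredLength, encoded.length);
        return new String(encoded, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String page = "<html><body>\u4E2D\u6587</body></html>"; // sample page

        int byteLength = page.getBytes(StandardCharsets.UTF_8).length; // correct Content-Length
        int charLength = page.length();                                // assumed wrong value

        System.out.println("correct: " + bodySeenByClient(page, byteLength));
        System.out.println("wrong:   " + bodySeenByClient(page, charLength));
    }
}
```

Under this assumption an all-English page is unaffected, because its character count and byte count are equal, which would match the symptom that only pages with Chinese characters were cut short.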
good to know.