Using UTF-8 with JSP

Posted on 2006-11-09
Last Modified: 2009-02-19
Hello experts,

I cannot get my JSP page to receive UTF-8 strings and output UTF-8 data, even after going through a tutorial. I made the very simple XHTML page below, which has a text field and a submit button. The submitted text is displayed back in the web browser. This works fine for ASCII data, but when I enter ®±² (registered trademark, plus or minus, superscript two) I receive the output Â®Â±Â² (a capital A with a circumflex accent preceding each of the three characters).

Does anyone know what I am doing incorrectly? I've tried doing this with and without encoding the special HTML characters by the way. Also, when I view the encoding on both IE and Firefox, both state that the encoding is in UTF-8.

<%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%!
      
      // Input: ®±² (registered trademark, plus or minus, superscript two)
      // Output on IE and Firefox: Â®Â±Â² (a capital A with a circumflex accent preceding each of the three characters)

    /**
     * Replaces instances of ", <, >, and & with their respective &# equivalents.
     * Hence, this method actually encodes XML as well. Note that this method does
     * not encode a single quote (').
     */
    String encodeHTML(String s) {
        if (s != null) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '<' || c == '>' || c == '&') {
                    out.append("&#" + (int) c + ";");
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
        return null;
    }
%>
<%
      request.setCharacterEncoding("UTF-8");
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Unicode</title>
</head>
<body>
      <form method="get" action="this.jsp">
      <input type="text" name="unicode-text" />
      <input type="submit" value="Submit" />
      </form>
      <strong><%=encodeHTML(request.getParameter("unicode-text"))%></strong>
</body>
</html>

Thanks,
Joe
Question by:jmiller239

Expert Comment

by:Dushan De Silva
ID: 17912187
Can you try "ISO-8859-1" instead of "UTF-8"?

BR Dushan

Author Comment

by:jmiller239
ID: 17919578
BR Dushan,

I'd really prefer to use "UTF-8" over "ISO-8859-1"... Even though I understand ISO-8859-1 will work and will accept inputs in their &# ; forms I don't want to store the &# ; things in my UTF-8 database.

Also, I think that when I call encodeHTML( ) (to stop JavaScript injections and simply allow data with symbols such as < and > to show) it will turn the &# ; into &amp;# ; (or whatever the number equivalent is for &amp;).
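
To illustrate the double-encoding concern, here is a minimal sketch that reuses the logic of the encodeHTML method from the question. The class name DoubleEncodingDemo and the sample input "&#174;" (the numeric reference for the registered trademark sign) are assumptions made only for this illustration.

```java
public class DoubleEncodingDemo {

    // Same logic as encodeHTML in the question: replaces ", <, > and &
    // with numeric character references.
    static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '<' || c == '>' || c == '&') {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The ampersand of an already-encoded reference is encoded again,
        // so the browser would show the literal text "&#174;" instead of the sign.
        System.out.println(encodeHTML("&#174;")); // prints &#38;#174;
    }
}
```

This is why storing the raw UTF-8 characters and encoding only once, at output time, seems preferable.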

Has anyone gotten UTF-8 to work with JSP?

-Joe

Author Comment

by:jmiller239
ID: 18029337
I found the solution.

<%=encodeHTML(new String(request.getParameter("unicode-text").getBytes("ISO-8859-1"),"UTF-8"))%>
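
For anyone wondering why this works, the following standalone sketch reproduces the garbling and the repair outside of a servlet container. The class and method names (MojibakeDemo, misdecode, repair) are hypothetical, and it uses java.nio.charset.StandardCharsets, which assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a server that receives UTF-8 bytes but decodes them as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Registered sign, plus-minus sign, superscript two, written as
        // escapes so the source file encoding does not matter.
        String input = "\u00AE\u00B1\u00B2";

        String garbled = misdecode(input);
        // Each character is two bytes in UTF-8 (0xC2 followed by the code point),
        // and 0xC2 is a capital A with circumflex in ISO-8859-1.
        System.out.println(garbled.length());              // prints 6
        System.out.println(repair(garbled).equals(input)); // prints true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character.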

I also created these two methods and I am providing them since they may be useful to others.

I was using Tomcat as my server. The methods assume that a variable named request holding the current HttpServletRequest is in scope; you might prefer to pass it in as a parameter instead.

      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The value
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String getParameter(String name) throws UnsupportedEncodingException  {
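            //On my Tomcat setup the GET form data is decoded as ISO-8859-1, even when the browser sends
            //a hidden "_charset_" field and even when request.setCharacterEncoding("UTF-8"); is called.
            //The likely reason is that setCharacterEncoding only applies to the request body (POST), while the
            //query string is decoded using the connector's URIEncoding setting.
            //Therefore, we need to re-decode the value from ISO-8859-1 bytes as UTF-8.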
            String value = request.getParameter(name);
            if(value==null)
                  return null;
            return new String(value.getBytes("ISO-8859-1"),"UTF-8");
      }
      
      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The values
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String[] getParameterValues(String name) throws UnsupportedEncodingException {
            //Same workaround as in getParameter above: on my Tomcat setup the GET form data is decoded
            //as ISO-8859-1, so each value is re-decoded from ISO-8859-1 bytes as UTF-8.
            String[] values = request.getParameterValues(name);
            //getParameterValues returns null when the parameter is absent
            if(values==null)
                  return null;
            for(int i = 0; i < values.length; i++)
                  values[i] = new String(values[i].getBytes("ISO-8859-1"),"UTF-8");
            return values;
      }
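
If you would rather not depend on a request variable being in scope, the same conversion can be sketched as a servlet-free helper that takes the already-retrieved values. The names ParamRecoder, recode and recodeAll are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class ParamRecoder {

    // Re-decodes a value that was wrongly decoded as ISO-8859-1 as UTF-8.
    // Returns null for a missing parameter.
    static String recode(String value) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    // Applies recode to every element and returns a new array,
    // leaving the caller's array untouched.
    static String[] recodeAll(String[] values) {
        if (values == null) {
            return null;
        }
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = recode(values[i]);
        }
        return out;
    }
}
```

A call would then look like ParamRecoder.recode(request.getParameter("unicode-text")). Note that this should only be applied when the server really did decode the data as ISO-8859-1; applying it to values that were already decoded correctly as UTF-8 will corrupt them.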

Accepted Solution

by:
CetusMOD earned 0 total points
ID: 18059546
Closed, 500 points refunded.
CetusMOD
Community Support Moderator