Solved

Using UTF-8 with JSP

Posted on 2006-11-09
Last Modified: 2009-02-19
Hello experts,

I cannot get my JSP page to receive UTF-8 strings and output UTF-8 data, even after working through a tutorial. I made the very simple XHTML page below, which has a text field and a submit button; the submitted text is displayed back in the browser. This works fine for ASCII data, but when I enter ®±² (registered trademark, plus-minus, superscript two), the output is Â®Â±Â² (a capital A with a circumflex before each of the three characters).

Does anyone know what I am doing wrong? By the way, I've tried this both with and without encoding the special HTML characters. Also, when I check the page encoding in both IE and Firefox, both report UTF-8.

<%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%!
      
      // Input: ®±² (registered trademark, plus-minus, superscript two)
      // Output on IE and Firefox: Â®Â±Â² (a capital A with a circumflex before each character)

    /**
     * Replaces instances of ", <, >, and & with their respective &# equivalents.
     * Hence, this method actually encodes XML as well. Note that this method does
     * not encode a single quote (').
     */
    String encodeHTML(String s) {
        if (s != null) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '<' || c == '>' || c == '&') {
                    out.append("&#" + (int) c + ";");
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
        return null;
    }
%>
<%
      request.setCharacterEncoding("UTF-8");
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Unicode</title>
</head>
<body>
      <form method="get" action="this.jsp">
      <input type="text" name="unicode-text" />
      <input type="submit" value="Submit" />
      </form>
      <strong><%=encodeHTML(request.getParameter("unicode-text"))%></strong>
</body>
</html>

Thanks,
Joe
Question by:jmiller239
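The Â-before-each-character symptom is the classic sign of UTF-8 bytes being decoded as ISO-8859-1: in UTF-8 each of these three characters becomes the byte 0xC2 followed by one more byte, and ISO-8859-1 maps 0xC2 to Â. A minimal sketch in plain Java (assuming Java 7+ for `StandardCharsets`; the class and method names are hypothetical) reproduces the mistake and the reverse conversion without a servlet container:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the bug: the browser sends UTF-8 bytes, the server decodes them as ISO-8859-1.
    public static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: ISO-8859-1 maps each char (0-255) back to the byte it came from,
    // and those bytes are then decoded as the UTF-8 they really were.
    public static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Registered sign, plus-minus sign, superscript two, written as escapes
        // so the source file stays ASCII-only.
        String input = "\u00AE\u00B1\u00B2";
        String garbled = decodeAsLatin1(input);

        // Each character gains a leading U+00C2 (capital A with circumflex).
        System.out.println(garbled.equals("\u00C2\u00AE\u00C2\u00B1\u00C2\u00B2")); // true
        System.out.println(repair(garbled).equals(input));                            // true
    }
}
```

Because ISO-8859-1 maps every byte from 0x00 to 0xFF to exactly one character, `repair` recovers the original bytes losslessly.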

Expert Comment

by:Dushan De Silva
ID: 17912187
Can you try "ISO-8859-1" instead of "UTF-8"?

BR Dushan

Author Comment

by:jmiller239
ID: 17919578
BR Dushan,

I'd really prefer to use "UTF-8" rather than "ISO-8859-1". I understand that ISO-8859-1 will work and will accept input in its &#...; form, but I don't want to store those &#...; references in my UTF-8 database.

Also, I think that when I call encodeHTML() (to prevent JavaScript injection and simply let data containing symbols such as < and > display), it will turn each &#...; into &#38;#...;, since it escapes & as &#38; (the numeric equivalent of &amp;).
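The double-escaping concern can be checked directly. The sketch below is a standalone copy of the question's `encodeHTML` (with `StringBuilder` swapped in for `StringBuffer`; the class name is hypothetical). It assumes that for a character ISO-8859-1 cannot represent, such as the euro sign (U+20AC), the browser submits the numeric character reference `&#8364;`; that is common browser behavior, not something the thread confirms.

```java
public class EncodeHtmlDemo {

    // Standalone copy of encodeHTML from the question: replaces ", <, >, and &
    // with numeric character references and leaves every other char unchanged.
    public static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '<' || c == '>' || c == '&') {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A numeric character reference for the euro sign, as a browser might
        // submit it from an ISO-8859-1 page.
        String submitted = "&#8364;";
        // The leading '&' is escaped to &#38;, so the browser would show "&#8364;" as text.
        System.out.println(encodeHTML(submitted)); // &#38;#8364;
        // Markup characters are neutralized as intended.
        System.out.println(encodeHTML("<b>"));     // &#60;b&#62;
    }
}
```

Since `&` always becomes `&#38;`, a submitted reference is displayed literally rather than as the character it names, which is why storing real Unicode text and escaping only at output time avoids the problem.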

Has anyone gotten UTF-8 to work with JSP?

-Joe

Author Comment

by:jmiller239
ID: 18029337
I found the solution.

<% String rawText = request.getParameter("unicode-text"); %>
<%=rawText == null ? "" : encodeHTML(new String(rawText.getBytes("ISO-8859-1"), "UTF-8"))%>

I also wrote these two helper methods, which may be useful to others.

I was using Tomcat as my server. For these methods to work, request must refer to the current HttpServletRequest; you might want to pass it in as a parameter.

      /**
       * Gets the request parameter "name", assuming the browser sent it as UTF-8
       * but the server decoded it as ISO-8859-1.
       * @param name Name of the parameter
       * @return The value, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String getParameter(String name) throws UnsupportedEncodingException {
            // With a GET form, Tomcat decodes the query string using the connector's
            // URIEncoding (ISO-8859-1 by default before Tomcat 8). Neither the hidden
            // "_charset_" field nor request.setCharacterEncoding("UTF-8") changes this;
            // the latter only applies to the request body (POST).
            // Therefore, recover the original bytes and decode them as UTF-8.
            String value = request.getParameter(name);
            if (value == null)
                  return null;
            return new String(value.getBytes("ISO-8859-1"), "UTF-8");
      }

      /**
       * Gets all values of the request parameter "name", assuming the browser sent
       * them as UTF-8 but the server decoded them as ISO-8859-1.
       * @param name Name of the parameter
       * @return The values, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String[] getParameterValues(String name) throws UnsupportedEncodingException {
            // Same ISO-8859-1 to UTF-8 conversion as getParameter(), applied to each value.
            String[] values = request.getParameterValues(name);
            if (values == null)
                  return null;
            // Decode into a new array rather than modifying the one returned by the container.
            String[] decoded = new String[values.length];
            for (int i = 0; i < values.length; i++)
                  decoded[i] = new String(values[i].getBytes("ISO-8859-1"), "UTF-8");
            return decoded;
      }
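To see why the round trip through ISO-8859-1 works, and when it stops working, the plain-Java sketch below simulates a submission from the question's `method="get"` form. It assumes Java 7+ and uses `URLEncoder`/`URLDecoder` as stand-ins for the browser and the container; the class and method names are hypothetical. For GET requests, Tomcat percent-decodes the query string with the Connector's `URIEncoding` attribute (ISO-8859-1 by default before Tomcat 8), while `request.setCharacterEncoding()` only affects the request body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryStringDecodingDemo {

    // Browser side: the form page is UTF-8, so the GET field value is
    // percent-encoded from its UTF-8 bytes.
    public static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Server side: percent-decode the query string with a configured charset
    // (in Tomcat, the connector's URIEncoding governs this for GET requests).
    public static String serverDecode(String query, String uriEncoding) {
        try {
            return URLDecoder.decode(query, uriEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + uriEncoding, e);
        }
    }

    // The workaround from the thread: recover the bytes via ISO-8859-1 and decode as UTF-8.
    // Only correct when the server really decoded the bytes as ISO-8859-1.
    public static String redecode(String value) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "\u00AE\u00B1\u00B2"; // registered sign, plus-minus, superscript two
        String query = browserEncode(input);
        System.out.println(query); // %C2%AE%C2%B1%C2%B2

        String asLatin1 = serverDecode(query, "ISO-8859-1"); // garbled: A-circumflex before each char
        String asUtf8 = serverDecode(query, "UTF-8");        // already correct

        System.out.println(redecode(asLatin1).equals(input)); // true: the workaround repairs it
        System.out.println(asUtf8.equals(input));             // true: no workaround needed
        System.out.println(redecode(asUtf8).equals(input));   // false: the workaround corrupts it
    }
}
```

The last line is the main caveat: if the connector decodes with UTF-8 (for example with `URIEncoding="UTF-8"` set on the Connector in Tomcat's `server.xml`), the re-decoding turns correct text into U+FFFD replacement characters. Configuring `URIEncoding="UTF-8"`, or switching the form to `method="post"` and calling `request.setCharacterEncoding("UTF-8")` before any parameter is read, avoids the workaround altogether.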

Accepted Solution

by: CetusMOD
ID: 18059546
Closed, 500 points refunded.
CetusMOD
Community Support Moderator