Using UTF-8 with JSP

Posted on 2006-11-09
Last Modified: 2009-02-19
Hello experts,

I cannot get my JSP page to receive UTF-8 strings and output UTF-8 data, even after going through a tutorial. I made the very simple XHTML page below, which has a text field and a submit button. The submitted text is displayed in the web browser. This works fine for ASCII data, but when I enter ®±² (registered trademark, plus-minus, superscript two) I get the output Â®Â±Â² (a capital A with a circumflex precedes each of the three characters).

Does anyone know what I am doing incorrectly? I've tried doing this with and without encoding the special HTML characters by the way. Also, when I view the encoding on both IE and Firefox, both state that the encoding is in UTF-8.

<%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%!
      
      // Input: ®±² (registered trademark, plus-minus, superscript two)
      // Output on IE and Firefox: Â®Â±Â² (a capital A with a circumflex precedes each of the three characters)

    /**
     * Replaces instances of ", <, >, and & with their respective &# equivalents.
     * Hence, this method actually encodes XML as well. Note that this method does
     * not encode a single quote (').
     */
    String encodeHTML(String s) {
        if (s != null) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '<' || c == '>' || c == '&') {
                    out.append("&#" + (int) c + ";");
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
        return null;
    }
%>
<%
      request.setCharacterEncoding("UTF-8");
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Unicode</title>
</head>
<body>
      <form method="get" action="this.jsp">
      <input type="text" name="unicode-text" />
      <input type="submit" value="Submit" />
      </form>
      <strong><%=encodeHTML(request.getParameter("unicode-text"))%></strong>
</body>
</html>

Thanks,
Joe
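
The symptom in the question is consistent with UTF-8 bytes being decoded as ISO-8859-1. Because the form uses method="get", the browser sends the text percent-encoded in the query string, and the container decides how to decode it. The sketch below reproduces the effect outside any servlet container. The class name QueryDecodeDemo and the helper decode are illustrative, and the assumption is that the browser submitted the three characters as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodeDemo {
    /** Percent-decodes a query-string value using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of U+00AE U+00B1 U+00B2, percent-encoded (assumed browser output)
        String sent = "%C2%AE%C2%B1%C2%B2";

        String asLatin1 = decode(sent, "ISO-8859-1"); // every byte becomes one character
        String asUtf8 = decode(sent, "UTF-8");        // every byte pair becomes one character

        System.out.println(asLatin1.length()); // 6
        System.out.println(asUtf8.length());   // 3
    }
}
```

Decoding as ISO-8859-1 turns each 0xC2 lead byte into U+00C2, which is the extra accented capital A seen in the output.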
Question by:jmiller239
Expert Comment

by:Dushan De Silva
ID: 17912187
Can you try with "ISO-8859-1" instead of "UTF-8"?

BR Dushan

Author Comment

by:jmiller239
ID: 17919578
Dushan,

I'd really prefer to use "UTF-8" over "ISO-8859-1". I understand that ISO-8859-1 will work and will accept inputs in their &# ; forms, but I don't want to store those &# ; references in my UTF-8 database.

Also, I think that when I call encodeHTML() (to stop JavaScript injection and to let data containing symbols such as < and > display correctly) it will turn the &# ; into &#38;# ; (38 being the character code for &).
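
The double-escaping concern above can be checked with a small standalone sketch. It copies the logic of the encodeHTML method from the question into a hypothetical class named EscapeDemo, using StringBuilder in place of StringBuffer.

```java
class EscapeDemo {
    /** Same escaping logic as the encodeHTML method in the question. */
    static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '<' || c == '>' || c == '&') {
                // Emit a numeric character reference such as &#38;
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A numeric reference submitted by the browser gets its ampersand escaped again
        System.out.println(encodeHTML("&#174;")); // &#38;#174;
        System.out.println(encodeHTML("<b>"));    // &#60;b&#62;
    }
}
```

The browser would then show the literal text &#174; rather than the registered trademark sign, which supports keeping the data as real UTF-8 characters.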

Has anyone gotten UTF-8 to work with JSP?

-Joe

Author Comment

by:jmiller239
ID: 18029337
I found the solution.

<%=encodeHTML(new String(request.getParameter("unicode-text").getBytes("ISO-8859-1"),"UTF-8"))%>
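
One caveat with the one-liner above: request.getParameter returns null when the page is first loaded without the parameter, so calling getBytes on it would throw a NullPointerException. Below is a minimal null-safe sketch of the same round trip, assuming Java 7 or later for StandardCharsets. The class name Recode and the method latin1ToUtf8 are illustrative names, not part of any servlet API.

```java
import java.nio.charset.StandardCharsets;

class Recode {
    /**
     * Re-interprets a string that was wrongly decoded as ISO-8859-1 as UTF-8.
     * Returns null for null input, so a missing parameter does not throw.
     */
    static String latin1ToUtf8(String value) {
        if (value == null) {
            return null;
        }
        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes,
        // so this step recovers the bytes the browser originally sent.
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the container handed back for the input from the question
        String garbled = "\u00C2\u00AE\u00C2\u00B1\u00C2\u00B2";
        System.out.println("\u00AE\u00B1\u00B2".equals(latin1ToUtf8(garbled))); // true
        System.out.println(latin1ToUtf8(null)); // null
    }
}
```

The conversion should be applied only when the container really did decode the value as ISO-8859-1. Applied to text that was already decoded correctly, it would corrupt the non-ASCII characters.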

I also created these two methods and I am providing them since they may be useful to others.

I was using Tomcat as my server. For the methods to work, request must be the current HttpServletRequest; you might want to pass it in as a parameter.

      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The value, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support ISO-8859-1 or UTF-8
       */
      public String getParameter(String name) throws UnsupportedEncodingException  {
            // By default, the Tomcat version used here decodes these parameters as ISO-8859-1, even when
            // the form includes a hidden "_charset_" field and request.setCharacterEncoding("UTF-8") is called
            // (setCharacterEncoding covers only the request body, not a GET query string).
            // Therefore, recover the original bytes and decode them as UTF-8.
            String value = request.getParameter(name);
            if (value == null)
                  return null;
            return new String(value.getBytes("ISO-8859-1"), "UTF-8");
      }
      
      /**
       * Gets all request values of name "name", assuming they are in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The values, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support ISO-8859-1 or UTF-8
       */
      public String[] getParameterValues(String name) throws UnsupportedEncodingException {
            // Same conversion as in getParameter, applied to every value.
            String[] values = request.getParameterValues(name);
            if (values == null)
                  return null; // getParameterValues returns null for a missing parameter
            for (int i = 0; i < values.length; i++)
                  values[i] = new String(values[i].getBytes("ISO-8859-1"), "UTF-8");
            return values;
      }

Accepted Solution

by:
CetusMOD earned 0 total points
ID: 18059546
Closed, 500 points refunded.
CetusMOD
Community Support Moderator