Using UTF-8 with JSP

Posted on 2006-11-09
Last Modified: 2009-02-19
Hello experts,

I cannot get my JSP page to receive UTF-8 strings and output UTF-8 data, even after going through a tutorial. I made a very simple XHTML page below, which has a text field and a submit button. The text that is submitted is displayed in the web browser. This works fine for ASCII data, but when I try the input ®±² (registered trademark, plus or minus, squared) I receive the output Â®Â±Â² (a capital A with a circumflex accent preceding each of the three previously mentioned characters).

Does anyone know what I am doing incorrectly? I've tried this both with and without encoding the special HTML characters. Also, when I check the page encoding in both IE and Firefox, both report UTF-8.
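The symptom described above (an extra Â in front of each character) is what one would expect if the UTF-8 bytes sent by the browser were decoded as ISO-8859-1 somewhere along the way. A minimal sketch that reproduces the effect outside of any servlet container; the class and method names are hypothetical, and `StandardCharsets` requires Java 7 or later:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original page.
class MojibakeDemo {

    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String input = "\u00AE\u00B1\u00B2"; // the three characters from the question
        String garbled = misdecode(input);

        // Each input character occupies two bytes in UTF-8 (0xC2 plus one more),
        // so the mis-decoded string has twice as many characters.
        System.out.println(garbled.length()); // 6
        System.out.println(garbled.equals("\u00C2\u00AE\u00C2\u00B1\u00C2\u00B2")); // true
    }
}
```

U+00C2 is the capital A with a circumflex, which matches the output reported in the question.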

<%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%!
      
      // Input: ®±² (registered trademark, plus or minus, squared)
      // Output on IE and Firefox: Â®Â±Â² (a capital A with a circumflex preceding each of the three previously mentioned characters)

    /**
     * Replaces instances of ", <, >, and & with their respective &# equivalents.
     * Hence, this method actually encodes XML as well. Note that this method does
     * not encode a single quote (').
     */
    String encodeHTML(String s) {
        if (s != null) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '<' || c == '>' || c == '&') {
                    out.append("&#" + (int) c + ";");
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
        return null;
    }
%>
<%
      request.setCharacterEncoding("UTF-8");
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Unicode</title>
</head>
<body>
      <form method="get" action="this.jsp">
      <input type="text" name="unicode-text" />
      <input type="submit" value="Submit" />
      </form>
      <strong><%=encodeHTML(request.getParameter("unicode-text"))%></strong>
</body>
</html>

Thanks,
Joe
Question by:jmiller239
5 Comments

Expert Comment

by:Dushan911
Can you try "ISO-8859-1" instead of "UTF-8"?

BR Dushan

Author Comment

by:jmiller239
BR Dushan,

I'd really prefer to use "UTF-8" over "ISO-8859-1". Even though I understand ISO-8859-1 will work and will accept inputs in their &# ; forms, I don't want to store the &# ; references in my UTF-8 database.

Also, I think that when I call encodeHTML( ) (to stop JavaScript injection and to let data containing symbols such as < and > display correctly) it will turn the &# ; into &amp;# ; (or rather &#38;# ;, since encodeHTML emits the numeric equivalent of &amp;).
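The double-encoding concern above can be checked directly. The sketch below copies the encodeHTML method from the question into a hypothetical standalone class (using StringBuilder instead of StringBuffer, which does not change the result) and feeds it a numeric character reference:

```java
// Hypothetical demo class wrapping a copy of encodeHTML from the question.
class EncodeHtmlDemo {

    /** Replaces ", <, > and & with their numeric character references. */
    static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '<' || c == '>' || c == '&') {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The leading & of an existing reference is itself encoded (& is code 38),
        // so a browser would show the literal text "&#174;" rather than the symbol.
        System.out.println(encodeHTML("&#174;")); // &#38;#174;
        System.out.println(encodeHTML("<b>"));    // &#60;b&#62;
    }
}
```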

Has anyone gotten UTF-8 to work with JSP?

-Joe

Author Comment

by:jmiller239
I found the solution.

<%=encodeHTML(new String(request.getParameter("unicode-text").getBytes("ISO-8859-1"),"UTF-8"))%>
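The one-liner above works by turning the wrongly decoded string back into its original bytes (ISO-8859-1 maps every character U+0000 to U+00FF to exactly one byte) and then decoding those bytes as UTF-8. Note that, as written, it throws a NullPointerException when the parameter is absent, for example on the first page load. A null-safe sketch of the same round trip, with a hypothetical class and method name and assuming Java 7 or later for `StandardCharsets`:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the original post.
class Utf8Repair {

    /**
     * Recovers text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1.
     * Returns null when given null, so a missing request parameter is harmless.
     */
    static String reinterpretAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "\u00C2\u00AE\u00C2\u00B1\u00C2\u00B2"; // what the page displayed
        String repaired = reinterpretAsUtf8(garbled);
        System.out.println(repaired.equals("\u00AE\u00B1\u00B2")); // true
        System.out.println(reinterpretAsUtf8(null) == null);       // true
    }
}
```

This repair is only appropriate when the input really was mis-decoded as ISO-8859-1; applying it to a string that was already decoded correctly will corrupt its non-ASCII characters.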

I also created these two methods and I am providing them since they may be useful to others.

I was using Tomcat as my server. The methods assume that a variable named request, holding the current HttpServletRequest, is in scope. You might want to pass it in as a parameter instead.

      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * Requires: import java.io.UnsupportedEncodingException;
       * @param name Name of the parameter
       * @return The value, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String getParameter(String name) throws UnsupportedEncodingException {
            //In my setup, Tomcat decoded the form data (submitted via GET) as ISO-8859-1, even when the browser
            //specified a hidden "_charset_" field and even when request.setCharacterEncoding("UTF-8"); was called
            //(that call only affects the request body, not the query string).
            //Therefore, we turn the value back into its raw bytes via ISO-8859-1 and decode those bytes as UTF-8.
            String value = request.getParameter(name);
            if(value==null)
                  return null;
            return new String(value.getBytes("ISO-8859-1"),"UTF-8");
      }
      
      /**
       * Gets all request data values of name "name", assuming they are in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The values, or null if the parameter is absent
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String[] getParameterValues(String name) throws UnsupportedEncodingException {
            //Same situation as in getParameter: the values were decoded as ISO-8859-1,
            //so we turn each one back into its raw bytes and decode those bytes as UTF-8.
            String[] values = request.getParameterValues(name);
            if(values==null)
                  return null;
            //Decode into a new array so the array returned by the container is left untouched.
            String[] decoded = new String[values.length];
            for(int i = 0; i < values.length; i++)
                  decoded[i] = new String(values[i].getBytes("ISO-8859-1"),"UTF-8");
            return decoded;
      }
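A likely explanation for the behaviour described in the comments is that the form uses method="get": request.setCharacterEncoding only applies to the request body, while the query string is decoded by the container using its own setting, which in the Tomcat versions current at the time of this thread defaulted to ISO-8859-1. The sketch below imitates that decoding step with java.net.URLDecoder; the class and method names are hypothetical, and the percent-encoded value is what a browser would send for ®±² from a UTF-8 page:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it imitates query-string decoding, it is not Tomcat code.
class QueryDecodingDemo {

    /** Percent-decodes a query-string value using the named charset. */
    static String decodeQueryValue(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C2%AE%C2%B1%C2%B2"; // UTF-8 bytes of the three characters

        String asLatin1 = decodeQueryValue(raw, "ISO-8859-1");
        String asUtf8 = decodeQueryValue(raw, "UTF-8");

        // Decoding with the wrong charset yields six characters, the right one three.
        System.out.println(asLatin1.length()); // 6
        System.out.println(asUtf8.length());   // 3
        System.out.println(asUtf8.equals("\u00AE\u00B1\u00B2")); // true
    }
}
```

If this is indeed the cause, two alternatives to re-decoding every parameter would be to set the URIEncoding="UTF-8" attribute on the Connector element in Tomcat's server.xml, or to submit the form with method="post" so that request.setCharacterEncoding("UTF-8") takes effect.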

Accepted Solution

by:
CetusMOD earned 0 total points
Closed, 500 points refunded.
CetusMOD
Community Support Moderator