Using UTF-8 with JSP

Posted on 2006-11-09
Last Modified: 2009-02-19
Hello experts,

I cannot get my JSP page to receive UTF-8 strings and output UTF-8 data, even after going through a tutorial. I made a very simple XHTML page (below) with a text field and a submit button; the submitted text is displayed in the web browser. This works fine for ASCII data, but when I enter ®±² (registered trademark, plus-or-minus, superscript two) I receive the output Â®Â±Â² (a capital A with a circumflex preceding each of the three characters).

Does anyone know what I am doing incorrectly? I've tried doing this with and without encoding the special HTML characters by the way. Also, when I view the encoding on both IE and Firefox, both state that the encoding is in UTF-8.

<%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%!
      
      // Input: ®±² (registered trademark, plus-or-minus, superscript two)
      // Output on IE and Firefox: Â®Â±Â² (a capital A with a circumflex precedes each of the three characters)

    /**
     * Replaces instances of ", <, >, and & with their respective &# equivalents.
     * Hence, this method actually encodes XML as well. Note that this method does
     * not encode a single quote (').
     */
    String encodeHTML(String s) {
        if (s != null) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '<' || c == '>' || c == '&') {
                    out.append("&#" + (int) c + ";");
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
        return null;
    }
%>
<%
      request.setCharacterEncoding("UTF-8");
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Unicode</title>
</head>
<body>
      <form method="get" action="this.jsp">
      <input type="text" name="unicode-text" />
      <input type="submit" value="Submit" />
      </form>
      <strong><%=encodeHTML(request.getParameter("unicode-text"))%></strong>
</body>
</html>

Thanks,
Joe
Question by:jmiller239

Expert Comment

by:Dushan De Silva
ID: 17912187
Can you try "ISO-8859-1" instead of "UTF-8"?

BR Dushan

Author Comment

by:jmiller239
ID: 17919578
Dushan,

I'd really prefer to use "UTF-8" over "ISO-8859-1". I understand that ISO-8859-1 will work and will accept input in its &# ; form, but I don't want to store the &# ; references in my UTF-8 database.

Also, I think that when I call encodeHTML( ) (to stop JavaScript injections and simply allow data with symbols such as < and > to show) it will turn the &# ; into &amp;# ; (or whatever the number equivalent is for &amp;).
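The double-encoding concern can be checked with a small standalone sketch. It assumes the same logic as the encodeHTML method from the question; the class name EncodeHtmlDemo and the sample inputs are made up for illustration, and StringBuilder stands in for StringBuffer.

```java
final class EncodeHtmlDemo {
    // Same logic as the question's encodeHTML: replaces ", <, > and &
    // with decimal numeric character references.
    static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '<' || c == '>' || c == '&') {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input that is already a numeric reference: the leading & gets encoded again.
        System.out.println(encodeHTML("&#174;")); // &#38;#174;
        // Markup is neutralized as intended.
        System.out.println(encodeHTML("<b>"));    // &#60;b&#62;
    }
}
```

So a value that arrives as &#174; would be shown to the user literally as "&#174;" rather than as the registered trademark sign, which is one more reason to keep the data as real UTF-8 characters.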

Has anyone gotten UTF-8 to work with JSP?

-Joe

Author Comment

by:jmiller239
ID: 18029337
I found the solution.

<%=encodeHTML(new String(request.getParameter("unicode-text").getBytes("ISO-8859-1"),"UTF-8"))%>
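Why the conversion above works can be shown outside the servlet container. The following is a minimal sketch, not Tomcat code: the class and method names are hypothetical, and it uses StandardCharsets (Java 7 and later) instead of charset name strings. It simulates UTF-8 bytes being decoded as ISO-8859-1 and then reverses the mistake. The reversal is lossless because ISO-8859-1 maps every byte value to exactly one character.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Simulates a container that decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three test characters from the question, written as escapes
        // so the source file encoding does not matter.
        String input = "\u00AE\u00B1\u00B2";
        String garbled = misdecode(input);

        // Each character became two: U+00C2 followed by the original character.
        System.out.println(garbled.length());              // 6
        System.out.println(garbled.charAt(0) == '\u00C2'); // true
        System.out.println(repair(garbled).equals(input)); // true
    }
}
```

Note that the repair step is only correct when the value really was mis-decoded; applying it to a string that was already decoded as UTF-8 would corrupt it.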

I also created these two methods and I am providing them since they may be useful to others.

I was using Tomcat as my server. For the methods to work, request must refer to the current HttpServletRequest; you may want to pass it in as a parameter instead.

      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The value
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String getParameter(String name) throws UnsupportedEncodingException  {
            //Tomcat decoded this form data as ISO-8859-1, even though the browser sent it as UTF-8 (with or
            //without a hidden "_charset_" field) and request.setCharacterEncoding("UTF-8"); was called.
            //The form uses GET, and setCharacterEncoding only applies to the request body, not the query string.
            //Therefore, we re-encode the value as ISO-8859-1 bytes and decode those bytes as UTF-8.
            String value = request.getParameter(name);
            if(value==null)
                  return null;
            return new String(value.getBytes("ISO-8859-1"),"UTF-8");
      }
      
      /**
       * Gets the request data of name "name", assuming it is in UTF-8 encoding.
       * @param name Name of the parameter
       * @return The values
       * @throws UnsupportedEncodingException Thrown when the JVM does not support UTF-8
       */
      public String[] getParameterValues(String name) throws UnsupportedEncodingException {
            //As in getParameter above, the values were decoded as ISO-8859-1, so we re-encode each one
            //as ISO-8859-1 bytes and decode those bytes as UTF-8.
            String[] values = request.getParameterValues(name);
            if(values==null)
                  return null;
            //Build a new array so the array returned by the container is left unmodified.
            String[] decoded = new String[values.length];
            for(int i = 0; i < values.length; i++)
                  decoded[i] = new String(values[i].getBytes("ISO-8859-1"),"UTF-8");
            return decoded;
      }
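A likely explanation for the behavior these methods work around: the form in the question uses method="get", so the text travels percent-encoded in the query string, and request.setCharacterEncoding only covers the request body. As far as I know, Tomcat versions of that era decode the query string with the connector's URIEncoding setting, which defaulted to ISO-8859-1, so setting URIEncoding="UTF-8" on the connector or switching the form to POST may make the conversion unnecessary. The sketch below only imitates that decoding step with java.net.URLDecoder; the class name and helper are hypothetical, and it is not how Tomcat is implemented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryDecodingDemo {
    // Decodes a percent-encoded value with the given charset, wrapping the
    // checked exception so callers stay simple.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the three test characters when the page is UTF-8.
        String onTheWire = "%C2%AE%C2%B1%C2%B2";

        String asLatin1 = decode(onTheWire, "ISO-8859-1");
        String asUtf8 = decode(onTheWire, "UTF-8");

        // Decoding as ISO-8859-1 yields one character per byte (the garbled output).
        System.out.println(asLatin1.length()); // 6
        // Decoding as UTF-8 yields the three original characters.
        System.out.println(asUtf8.length());   // 3
    }
}
```

If the container is configured to decode the query string as UTF-8, the getBytes("ISO-8859-1") conversion should be removed, since applying it to correctly decoded text would corrupt it.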

Accepted Solution

by:
CetusMOD earned 0 total points
ID: 18059546
Closed, 500 points refunded.
CetusMOD
Community Support Moderator