
HTML Form posting contains weird characters when using single quote, double quote or dash

I am using the following form in my web page:
<form name="addreview" method="POST" action="savereview.jsp">
Title (less than 50 characters) <input type="text" size="50" name="title"><input type="submit" value="Upload Review"><br>
Review (less than 3500 characters)<br><textarea rows="35" cols="100" name="description"> </textarea><br>
</form>

If I place a single quote ('), double quote ("), or dash/hyphen (-) in the textarea control and click submit, I get funny characters being sent to my Tomcat 5.5 server. If I have the following line in a .jsp page, I get the results shown below it.

System.out.println (request.getParameter ("description").trim());

gives me:

Hello, I?m Sharron, the last of the truly ?sick romantics? ? or so some of my best friends have dubbed me ? and frankly they?re absolutely right!  But I?m also passionate about food so think about that combination ? food and romance ? orgasmic!  With this in mind, I?m on a constant quest to find a spot to dine in the very best of romantic settings with the very best of cuisine!

As you can see, the apostrophes are replaced with something else. Is there a way to get exactly what was entered?
1 Solution
Did you happen to paste this text from a word processor?  If so, the text may not be exactly what you think it is.

Many modern word processors replace the generic apostrophe and quotation marks with their Unicode equivalents which are facing either left or right.  Similarly, they'll replace hyphens with Unicode dashes.  Your system's console may not be able to display Unicode characters, which is why it prints a ? instead.
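
One way to check what actually arrived is to print the text with every non-ASCII character written out as an escape, so the console never has to render it. The following is only a minimal sketch: the class name UnicodeDump and the method escapeNonAscii are made up for illustration, and the sample string merely imitates text pasted from a word processor.

```java
// Diagnostic sketch: make non-ASCII characters visible on a console that
// cannot render them. Class and method names are illustrative only.
class UnicodeDump {

    // Returns the input with each non-ASCII char replaced by a
    // backslash-u escape followed by four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);                                  // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c));  // e.g. U+2019
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: curly apostrophe, curly double quotes, en dash.
        String pasted = "I\u2019m \u201csick\u201d \u2013 really";
        System.out.println(escapeNonAscii(pasted));
    }
}
```

In the JSP you would pass request.getParameter("description") to the method instead of the sample string. If the dump shows escapes such as \u2019 or \u201c, the curly characters arrived intact and only the console display is at fault.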

Luckily, Java's Strings use Unicode under the hood (the char primitive is a 16-bit Unicode code unit), so it should be simple enough to replace the offending characters in this string with their good ol' ASCII equivalents.  The following link has the information you'll need to write a method to perform this task.


If you need some help writing this method, please feel free to ask.
Why don't you replace them with the relevant HTML entities? For example, replace " with &quot; and so on.
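
A rough sketch of that entity approach, assuming the text will be written back into an HTML page. The class name HtmlEntities and the method escape are hypothetical; the sketch uses numeric character references for anything outside ASCII rather than a table of named entities.

```java
// Sketch of HTML escaping. Names are illustrative, not from any library.
class HtmlEntities {

    // Escapes HTML-significant ASCII characters and writes every
    // non-ASCII code point as a decimal numeric character reference.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);
                    } else {
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }
}
```

For example, HtmlEntities.escape("I\u2019m \"ok\"") returns I&#8217;m &quot;ok&quot;, which a browser renders with the original curly apostrophe.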
To clarify my answer, I believe the form is submitting the characters correctly, but the console is displaying them incorrectly (as question marks).
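
If you would rather see the real characters than replace them, you can try writing through a PrintStream that encodes as UTF-8. This is only a sketch with made-up names (Utf8Console, utf8), and it assumes the terminal or log viewer reading the output actually decodes UTF-8; if it does not, you will still see garbage.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Sketch: wrap an output stream so text is written as UTF-8 bytes.
// Names are illustrative only.
class Utf8Console {

    static PrintStream utf8(OutputStream target) {
        try {
            // autoFlush = true so println output appears immediately
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Assumed sample: curly apostrophe, curly double quotes, en dash.
        out.println("I\u2019m \u201csick\u201d \u2013 really");
    }
}
```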

Try replacing the Unicode quotation characters with ASCII ones by using this method:

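private static String convertToASCII(String str) {
    char single = '\'';
    char doubleQuote = '"';   // "double" is a reserved word, so it cannot be a variable name
    char dash = '-';
    return str.replace((char) 0x2018, single)        // left single quotation mark
            .replace((char) 0x2019, single)          // right single quotation mark (apostrophe)
            .replace((char) 0x201c, doubleQuote)     // left double quotation mark
            .replace((char) 0x201d, doubleQuote)     // right double quotation mark
            .replace((char) 0x2010, dash)            // hyphen
            .replace((char) 0x2011, dash)            // non-breaking hyphen
            .replace((char) 0x2012, dash)            // figure dash
            .replace((char) 0x2013, dash)            // en dash
            .replace((char) 0x2014, dash)            // em dash
            .replace((char) 0x2015, dash)            // horizontal bar
            .replace((char) 0x2212, dash);           // minus sign
}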

The second answer I gave (#20050807) is more complete.
