• Status: Solved

Bad characters from HttpServletRequest.getParameter

When I copy and paste text from Outlook/Word into a text area, some characters get translated into something unreadable. For example, copy and paste the following string from Outlook:
This is a test - dash, comma ) paren } curly brace ' single quote " double quote > greater than ? question mark . period & amber % percent $ dollar + plus

After reading it from the httpservletrequest it looks like this:
This is a test – dash, comma ) paren } curly brace ‘ single quote “ double quote > greater than ? question mark . period & amber % percent $ dollar + plus

The application uses JSPs with Spring 1.1.4 to call the appropriate class/method in Java (1.5.15). The text looks fine in the JSP before the form is submitted.

Does anyone have any ideas?
Thanks
2 Solutions
 
objects Commented:
Use an appropriate encoding for your form (e.g. UTF-8), and make sure the server side decodes the request with the same encoding.
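
To illustrate what a client/server encoding mismatch looks like, here is a minimal sketch in plain Java. It assumes the browser sends UTF-8 and the server decodes as ISO-8859-1; that pairing is only an assumption for the demo, not something confirmed in this thread. The class and method names are hypothetical, and it uses `StandardCharsets`, so it needs a newer JDK than the 1.5 mentioned in the question.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Simulates a server that reads UTF-8 form bytes as if they were ISO-8859-1.
    static String decodeWithWrongCharset(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }
}
```

An en dash (U+2013) passed through `decodeWithWrongCharset` comes back as three characters instead of one, which is the typical symptom. `repair` only works when the wrong charset was a lossless single-byte one such as ISO-8859-1; the cleaner fix is to decode correctly in the first place.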

 
tgilley42 Author Commented:
I'm using UTF-8 encoding in my JSP and within Tomcat (embedded within JBoss).

Is there also a character set encoding within Spring that I need to configure somewhere?



JSP:
 
<%@ page
language="java"
contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"
%>
 
 
server.xml (Tomcat, within JBoss):
 
<Connector port="80" address="${jboss.bind.address}"
         maxThreads="250" strategy="ms" maxHttpHeaderSize="8192"
         emptySessionPath="true"
         enableLookups="false" redirectPort="8443" acceptCount="100"
         connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>

 
 
tgilley42 Author Commented:
I sniffed the network to capture the packets leaving the browser. This is not related to anything on the server; it is an issue with Word, or with Outlook when it uses Word to view and create emails. Word uses UTF-16, so the chars are being replaced with 3 bytes instead of the normal two bytes. Is there a convenience method to translate UTF-16 to UTF-8 in JavaScript, or some other way to strip off the excess formatting that Word leaves on the copied text?
Thanks
 
objects Commented:
What browser?

 
CEHJ Commented:
>>word uses utf-16 so the chars are being replaced with 3 bytes instead of the normal two bytes.

UTF-16 doesn't use more than two bytes for these characters. It's UTF-8 that encodes them in three.

In fact, this has nothing to do with either UTF-16 or UTF-8. Have a look at this:

http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
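
The byte counts are easy to check from Java. The sketch below uses the en dash (U+2013) as the sample character and assumes the JVM provides the `windows-1252` charset, which is the usual Windows code page behind this punctuation; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DashBytes {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian variant, so no byte order mark is added).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Decodes raw bytes as windows-1252 (assumed to be available in this JVM).
    static String fromWindows1252(byte... bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }
}
```

For the en dash, `utf8Length` gives 3 and `utf16Length` gives 2, while the single byte 0x96 decodes to the same en dash under windows-1252.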
 
objects Commented:
If you have to deal with Microsoft characters, try using 8859 as the encoding,
or add a filter to map them when they are received.
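
A rough sketch of the mapping idea in Java: it replaces the most common Word "smart" punctuation with plain ASCII equivalents. The class name, method name, and the particular set of characters are assumptions chosen for illustration; extend the list to whatever your users actually paste.

```java
class WordPunctuation {

    // Replaces common Word/Outlook typographic characters with ASCII equivalents.
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    out.append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    out.append("...");
                    break;
                case '\u00A0': // non-breaking space
                    out.append(' ');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

In a servlet application this could be applied to the value returned by `request.getParameter(...)` before it is stored. It only helps once the request is decoded correctly, since it matches on the real Unicode characters, not on their garbled form.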


 
tgilley42 Author Commented:
This article, which includes a sample script, explains the issue as well:

http://jonathanhedley.com/articles/2008/03/convert-microsoft-word-to-plain-text
