tgilley42

asked on

Bad characters from HttpServletRequest.getParameter

When I copy and paste characters from Outlook/Word into a text area, they get translated into something unreadable. For example, copy and paste the following string from Outlook:
This is a test - dash, comma ) paren } curly brace ' single quote " double quote > greater than ? question mark . period & amber % percent $ dollar + plus

After reading it from the HttpServletRequest, it looks like this:
This is a test – dash, comma ) paren } curly brace ‘ single quote “ double quote > greater than ? question mark . period & amber % percent $ dollar + plus

The application uses JSPs with Spring 1.1.4 to call the appropriate class/method in Java (1.5.15). The text looks OK in the JSP before the form is submitted.

Does anyone have any ideas?
Thanks
Mick Barry

Use an appropriate encoding for your form (e.g. UTF-8), and also make sure you are using the correct encoding on the server side.

tgilley42

ASKER

I'm using UTF-8 encoding in my JSP and within Tomcat (embedded within JBoss).

Is there also a character set encoding within Spring I need to configure somewhere?



JSP:
 
<%@ page
language="java"
contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"
%>
 
 
server.xml (Tomcat, within JBoss):
 
<Connector port="80" address="${jboss.bind.address}"
         maxThreads="250" strategy="ms" maxHttpHeaderSize="8192"
         emptySessionPath="true"
         enableLookups="false" redirectPort="8443" acceptCount="100"
         connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>
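
One thing that may be worth checking against the configuration above: `URIEncoding` on the connector controls how the request URI and query string are decoded, whereas the body of a POSTed form is decoded with the request's character encoding, which defaults to ISO-8859-1 under the Servlet specification when the browser sends no charset. The sketch below deliberately avoids the servlet API and only uses `java.net.URLDecoder` to imitate both outcomes for a submitted en dash. The class and method names are made up for illustration, and it assumes the form is sent as a POST from a UTF-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of the asker's application.
class FormDecodingDemo {

    // Decodes a percent-encoded form value with the given charset name,
    // roughly what a container does before getParameter() returns.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // An en dash (U+2013) as a browser submits it from a UTF-8 page.
        String submitted = "test+%E2%80%93+dash";

        // Correct decoding: the three bytes become one character.
        System.out.println(decode(submitted, "UTF-8").length());      // 11

        // Wrong decoding: each byte becomes its own (unreadable) character.
        System.out.println(decode(submitted, "ISO-8859-1").length()); // 13
    }
}
```

If the longer, unreadable form is what `getParameter` returns, the usual remedy is to call `request.setCharacterEncoding("UTF-8")` before the first parameter is read. Spring ships `org.springframework.web.filter.CharacterEncodingFilter` for this purpose; whether it is configured in this application cannot be told from the thread.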


I sniffed the network to capture the packets leaving the browser. This is not related to anything on the server; it is an issue with Word, or with Outlook if it uses Word to view and create emails. Word uses UTF-16, so the characters are being replaced with three bytes instead of the normal two bytes. Is there a convenience method to translate UTF-16 to UTF-8 in JavaScript, or some other way to strip off the excess formatting that Word leaves on the copied text?
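
A side note on the byte counts: three bytes per character is what UTF-8 itself produces for Word's typographic punctuation (en dash, curly quotes), so the capture may simply be showing correctly encoded UTF-8 rather than leftover UTF-16. The small sketch below, with made-up class and method names and assuming Java 7 or later for `StandardCharsets`, prints the encoded sizes so they can be compared with the sniffed packets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class ByteLengthDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // En dash, left single quote, left double quote, then a plain ASCII hyphen.
        String[] samples = {"\u2013", "\u2018", "\u201C", "-"};
        for (String s : samples) {
            System.out.printf("U+%04X utf8=%d utf16=%d%n",
                    (int) s.charAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```

The typographic characters come out as three bytes in UTF-8 and two in UTF-16, while the ASCII hyphen is one byte in UTF-8. Seeing three bytes on the wire is therefore consistent with the browser doing its job, which would point back at how the server decodes the request.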
Thanks  
What browser?

SOLUTION
CEHJ

If you have to deal with Microsoft characters, try using 8859 as the encoding.
Or add a filter to map them when they are received.
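
The mapping idea could look something like the sketch below. It is only an illustration: the class name, method name, and the choice of which characters to replace are assumptions, not anything posted in the thread. It also assumes the request has already been decoded correctly, so that the typographic characters arrive as single Unicode characters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the names and the mapping table are illustrative only.
class SmartPunctuation {

    private static final Map<Character, String> REPLACEMENTS =
            new HashMap<Character, String>();

    static {
        REPLACEMENTS.put('\u2013', "-");    // en dash
        REPLACEMENTS.put('\u2014', "-");    // em dash
        REPLACEMENTS.put('\u2018', "'");    // left single quote
        REPLACEMENTS.put('\u2019', "'");    // right single quote
        REPLACEMENTS.put('\u201C', "\"");   // left double quote
        REPLACEMENTS.put('\u201D', "\"");   // right double quote
        REPLACEMENTS.put('\u2026', "...");  // horizontal ellipsis
    }

    // Returns the input with typographic punctuation replaced by ASCII
    // equivalents; all other characters are copied through unchanged.
    static String toAscii(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pasted = "test \u2013 dash, \u2018single\u2019 and \u201Cdouble\u201D";
        System.out.println(toAscii(pasted));
    }
}
```

In a web application this would typically be applied to each parameter value after it is read. Note that it discards the typographic characters rather than preserving them, so it is a workaround; fixing the decoding keeps the original text intact.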

