Unicode / UTF-8 Weirdness with JSP / Servlets on Tomcat

Posted on 2009-05-07
Last Modified: 2013-11-24
My code actually works, but I do not understand how, and I need to in order to fix another part of the system.

I have a JSP page containing a contenteditable div that is used (IE only) as a simple HTML editor. The page's charset is UTF-8. For the purposes of this example, I copy and paste a Greek beta character into the div, and nothing else, and submit the contents to the server. I then intercept the submitted parameter on the server like this:

String html = request.getParameter("dochtml");

Here is what I don't understand. The beta character is not represented, as I would expect, as U+03B2, but instead as two characters, U+00CE and U+00B2. These are 'Latin Capital Letter I with Circumflex' (Î) and 'Superscript Two' (²); see http://www.unicode.org/charts/PDF/U0080.pdf and the attached image.
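The symptom can be reproduced outside the servlet container. This is a minimal sketch, assuming (as the answers below suggest) that the UTF-8 bytes sent by the browser were decoded with a single-byte charset such as ISO-8859-1; the class `MisdecodeDemo` is hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Encode a string as UTF-8, then (wrongly) decode those bytes as ISO-8859-1,
    // which maps each byte 0x00-0xFF to the code point with the same value.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render each char as U+XXXX for easy inspection.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String beta = "\u03B2"; // Greek small letter beta, UTF-8 bytes CE B2
        System.out.println(codePoints(beta));            // U+03B2
        System.out.println(codePoints(misdecode(beta))); // U+00CE U+00B2
    }
}
```

Each UTF-8 byte becomes one character, which is exactly the pair seen in the debugger.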

So what is going on? Why am I seeing these characters?

To confuse matters even more, this code works: if I save the HTML in a MySQL database (using utf8) and then pull it back out into the browser, sure enough the Greek beta character is there again. How is this happening?
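One plausible explanation, sketched below, is that the two mistakes cancel out. This assumes something the question does not confirm: that somewhere on the way back to the browser the garbled string is turned into bytes with ISO-8859-1 again, which reproduces the original UTF-8 bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // What getParameter() is assumed to return: UTF-8 bytes read as ISO-8859-1.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Assumption: the output path (DB connection or response writer) uses ISO-8859-1.
    static byte[] writeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String beta = "\u03B2";
        String garbled = garble(beta);

        // The ISO-8859-1 output bytes are identical to the UTF-8 bytes of beta,
        // so a browser reading the page as UTF-8 shows beta again.
        byte[] out = writeAsLatin1(garbled);
        System.out.println(Arrays.equals(out, beta.getBytes(StandardCharsets.UTF_8))); // true

        // Had the garbled string been written as UTF-8 instead, each of its two
        // chars would take two bytes: the text would be double-encoded.
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

If any link in the chain is later switched to UTF-8 alone, the cancellation stops and the page shows Î² instead of β.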

Thanks in advance for any pointers.


Attachment: debug.png
Question by:emsttam

Expert Comment

by:Gibu George
ID: 24328278
What is happening is that the String html was decoded using the default OS encoding, which I think is not UTF-8 but most probably cp1252. The beta character is two bytes in UTF-8 (0xCE 0xB2), and a single-byte charset turns those two bytes into two separate characters. When the content is written back out as those same two bytes and loaded into the browser, the browser decodes them as UTF-8, because you have set the content encoding to UTF-8, and the beta is shown properly.
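Whether the single-byte charset is cp1252 (as guessed here) or ISO-8859-1 makes no visible difference for β, because both map bytes 0xCE and 0xB2 to the same characters. They differ in the 0x80 to 0x9F range, which UTF-8 uses for continuation bytes. A small comparison sketch; it assumes the JDK ships the `windows-1252` charset, which common JDKs do although the Java specification does not require it. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VsLatin1 {
    // Decode the UTF-8 bytes of 'text' with the given (wrong) charset.
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // Assumption: this JDK provides windows-1252 (not a guaranteed charset).
        Charset cp1252 = Charset.forName("windows-1252");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        String beta = "\u03B2"; // UTF-8: CE B2
        String euro = "\u20AC"; // UTF-8: E2 82 AC

        // Same result for beta: both charsets map 0xCE and 0xB2 identically.
        System.out.println(misdecode(beta, cp1252).equals(misdecode(beta, latin1))); // true

        // Different result for the euro sign: 0x82 is U+201A in cp1252
        // but the C1 control character U+0082 in ISO-8859-1.
        System.out.println(misdecode(euro, cp1252).equals(misdecode(euro, latin1))); // false
    }
}
```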

Author Comment

by:emsttam
ID: 24328349
Gibu George,

Where and when do you think this 'conversion' is happening? Java uses Unicode internally, so why would any OS-related conversion happen at all?

Expert Comment

by:Gibu George
ID: 24328448
If you mean cp1252: it is the usual Windows charset for Western European and US locales.

Accepted Solution

by:Gibu George
ID: 24328463
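Java uses the OS (platform default) encoding whenever it converts between bytes and Strings without an explicit charset. If you want the string decoded as proper Unicode, re-encode it with the single-byte charset that produced it and decode the resulting bytes as UTF-8:
new String(oldString.getBytes("ISO-8859-1"), "UTF-8")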
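A sketch of that repair as a reusable method, assuming the parameter was mis-decoded as ISO-8859-1. Naming the charset on both sides avoids depending on the platform default, which varies between machines. `RepairDemo.repair` is a hypothetical helper, not part of any servlet API.

```java
import java.nio.charset.StandardCharsets;

class RepairDemo {
    // Undo an ISO-8859-1 mis-decode: recover the original bytes, then decode them as UTF-8.
    // Only valid if the input really was produced that way; applied to an
    // already-correct string, non-Latin-1 characters become '?' and are lost.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromRequest = "\u00CE\u00B2"; // what getParameter() returned in the question
        System.out.println(repair(fromRequest).equals("\u03B2")); // true
    }
}
```

Because the repair is destructive on correctly decoded input, it has to be applied exactly once, and only where the mis-decode is known to happen.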

Author Closing Comment

by:emsttam
ID: 31579108
Thanks, you got me pretty much there. I think that, rather than the default platform charset, it may be using ISO-8859-1 (see the second entry here: http://www.jguru.com/faq/view.jsp?EID=137049), but that's splitting hairs.
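The ISO-8859-1 reading matches the Servlet specification, under which the container falls back to ISO-8859-1 when the request does not declare a character encoding. The sketch below imitates that with `java.net.URLDecoder` on the percent-encoded value a browser would send for β (`%CE%B2`); it is an illustration, not Tomcat's actual decoding code, and the class name is hypothetical. In a real servlet the usual remedy is to call `request.setCharacterEncoding("UTF-8")` before the first `getParameter` call, often from a filter (this covers POSTed form data; Tomcat decodes query-string parameters according to the connector's `URIEncoding` attribute instead).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class ParamDecodeDemo {
    // Decode a form value using the given charset to turn the %XX bytes into characters.
    static String decode(String formValue, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(formValue, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String sent = "%CE%B2"; // beta, percent-encoded as UTF-8 by the browser
        // Fallback when no encoding is declared: two chars, U+00CE U+00B2.
        System.out.println(decode(sent, "ISO-8859-1").equals("\u00CE\u00B2")); // true
        // Decoding as UTF-8 yields the single char U+03B2.
        System.out.println(decode(sent, "UTF-8").equals("\u03B2")); // true
    }
}
```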