My code actually works, but I do not understand how it is working, and I need to in order to fix another part of the system.
I have a JSP page containing a contenteditable div that is used (IE only) as a simple HTML editor. The page charset is UTF-8. For the purposes of this example, I copy and paste a single Greek beta character into the div, and nothing else, and submit the contents to the server. I then read the submitted parameter on the server like this:
String html = request.getParameter("doch
Here is what I don't understand. The beta character does not arrive as U+03B2, as I would expect, but as two characters, U+00CE and U+00B2. These are 'Latin capital letter I with circumflex' and 'superscript two' (see http://www.unicode.org/charts/PDF/U0080.pdf and the attached image).
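The two characters can be reproduced outside the servlet. The sketch below is only a guess at the cause: it assumes the browser sends the beta as UTF-8 bytes and that something on the server decodes those bytes as ISO-8859-1, one character per byte. The class and method names are made up for illustration, and nothing here shows what the container is actually configured to do.

```java
import java.nio.charset.StandardCharsets;

public class BetaMisdecode {
    // Hypothetical helper: encode the text as UTF-8, then decode the same
    // bytes as ISO-8859-1, which maps each byte to exactly one character.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Greek small letter beta, U+03B2, is the two bytes 0xCE 0xB2 in UTF-8.
        String seen = decodeUtf8BytesAsLatin1("\u03B2");
        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+00CE, then U+00B2
        }
    }
}
```

If that assumption holds, the string returned by request.getParameter would contain exactly the pair of characters described above.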
So what is going on? Why am I seeing these characters?
To confuse matters even more, this code works: if I save the HTML in a MySQL database (using utf8) and then pull it back out into the browser, sure enough, there is the Greek beta character again. How is this happening?
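A possible reason the round trip appears to work is that the mis-decoding is reversible. The sketch below assumes the two characters are at some point written back out one byte per character (as ISO-8859-1 would do) and that the browser then reads those bytes as UTF-8. Whether the JDBC connection, MySQL, or the response writer really does this is an assumption, not something shown here, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BetaRoundTrip {
    // Hypothetical helper: turn each character back into a single byte
    // (ISO-8859-1), then interpret the resulting bytes as UTF-8.
    static String reinterpretLatin1AsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00CE U+00B2 become the bytes 0xCE 0xB2, which are UTF-8 for U+03B2.
        String restored = reinterpretLatin1AsUtf8("\u00CE\u00B2");
        System.out.println(restored.equals("\u03B2")); // prints true
    }
}
```

Under that assumption the original UTF-8 bytes survive the whole trip unchanged, even though the Java String in between holds the wrong characters.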
Thanks in advance for any pointers.