How to re-encode UTF-8 to Russian or something else

Posted on 2004-10-29
Last Modified: 2008-01-09

My problem is the following.
I get a UTF-8 string, which was typed in Russian, from the following JSP:

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8" %>
<%@page language="java"%>

In the called JSP I tried to split this up into bytes with byte[] bytessplitted = abvaluea.getBytes( "UTF-8" )

Called JSP:

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8" %>
<%@page language="java"%>

I tried Russian:

I entered: фыва
Convert to bytes: from HTTP (UTF-8), split with byte[] bytessplitted = abvaluea.getBytes( "UTF-8" );
Re-encode: <%= "UTF8 to Russian Cp1251:"+new String(bytessplitted, "Cp1251")+"<br>"%><%
%><%= "UTF8 to Russian ISO8859_5:"+new String(bytessplitted, "ISO8859_5")+"<br>"%><%
%><%= "UTF8 to Russian Cp1025:"+new String(bytessplitted, "Cp1025")+"<br>"%><%
%><%= "UTF8 to Russian Cp855:"+new String(bytessplitted, "Cp855")+"<br>"%><%
%><%= "UTF8 to Russian Cp866:"+new String(bytessplitted, "Cp866")+"<br>"%><%
%><%= "UTF8 to Russian KOI8_R:"+new String(bytessplitted, "KOI8_R")+"<br>"%>

result on the new jsp:
UTF8 to Russian Cp1251:&#1043;‘&#1042;„&#1043;‘&#1042;‹&#1043;&#1106;&#1042;&#1030;&#1043;&#1106;&#1042;°
UTF8 to Russian ISO8859_5:&#1059;‘&#1058;„&#1059;‘&#1058;‹&#1059;&#1058;&#1042;&#1059;&#1058;&#1040;
UTF8 to Russian Cp1025:CjBdCjB&#1077;C&#1081;B&#1079;C&#1081;B&#1100;
UTF8 to Russian Cp855:&#9500;&#1033;&#9516;&#1105;&#9500;&#1033;&#9516;&#1030;&#9500;&#1113;&#9516;&#9619;&#9500;&#1113;&#9516;&#9617;
UTF8 to Russian Cp866:&#9500;&#1057;&#9516;&#1044;&#9500;&#1057;&#9516;&#1051;&#9500;&#1056;&#9516;&#9619;&#9500;&#1056;&#9516;&#9617;
UTF8 to Russian KOI8_R:&#1094;&#9618;&#1073;&#9492;&#1094;&#9618;&#1073;&#9600;&#1094;&#9617;&#1073;&#9569;&#1094;&#9617;&#1073;&#9567;

correct display of the characters:

When I used it with a German character, e.g. "string", it works!
So to display this in a JSP I tried the following:
<%= "UTF8 to German:"+new String(bytessplitted , "ISO8859_1")+"<br>"%>

Result is:

So this is fine.

Is my handling of Russian wrong?

Please help me!
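
The Cp1251 result line in the question is byte-for-byte what you would get if the parameter had already been decoded as ISO-8859-1 before the JSP saw it, and was then turned into UTF-8 bytes and read as Cp1251. The following is a minimal sketch of that reading, not a confirmed diagnosis: the class `MojibakeDemo` and its methods are hypothetical, and the ISO-8859-1 default is an assumption about the container setup that the thread does not state.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the original JSP. */
final class MojibakeDemo {

    /** Simulates a container decoding UTF-8 request bytes as ISO-8859-1 (assumed default). */
    static String misdecode(String typed) {
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Undoes the mis-decoding: recover the raw bytes, then decode them as UTF-8. */
    static String repair(String received) {
        byte[] raw = received.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** What the question's code does: UTF-8 bytes of the received string, read as another charset. */
    static String reinterpret(String received, String charsetName) {
        byte[] bytes = received.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The four Cyrillic letters from the question, written as escapes
        // so the source file encoding does not matter.
        String typed = "\u0444\u044b\u0432\u0430";
        String received = misdecode(typed);

        // Prints whether the repaired string equals what was typed.
        System.out.println(repair(received).equals(typed));

        // Reproduces the garbled Cp1251 line, if that charset is installed.
        if (Charset.isSupported("windows-1251")) {
            System.out.println(reinterpret(received, "windows-1251"));
        }
    }
}
```

If this reading is right, calling getBytes( "UTF-8" ) on the received string cannot help, because the string is already wrong before any bytes are taken from it. The repair has to go back through the charset the container used.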

Question by:gramesg
    LVL 2

    Expert Comment

    It is not a safe assumption that the characters are coming to you in UTF-8; in fact they are likely not to be. This code seems to work on Windows with IE and Mozilla. I would have thought you could at least convert the chars to UTF-16, but I suspect there could be bugs in Tomcat (which I am testing on) and/or the browsers.

    <%@ page pageEncoding="UTF-8" %>
    <%
    // Charset used to decode the incoming parameter and to send the page.
    String charset_in="KOI8_R";String charset_out="KOI8_R";
    response.setHeader("Content-Type","text/html; charset="+charset_out); %>
    <TITLE>Form page</TITLE>
    <meta http-equiv="Content-Type" content="text/html;charset=<%= charset_out %>" >
    <%
    String param=request.getParameter("data");
    // Note: getBytes() without an argument uses the platform default charset.
    if (param!=null)
    param=new String(param.getBytes(),charset_in);
    %>
    <%= param %>
    <form action="./russian_encode.jsp" method="get" >
    <input type="text" name="data" value="<%= param %>"/>
    <input  type="submit"  value="GO"/>
    </form>
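
The conversion step in the JSP above calls getBytes() with no argument, so its result depends on the platform default charset of the server. A hedged sketch of the same step with both charsets spelled out follows. `ParamRecoder.recode` is a hypothetical helper, and using ISO-8859-1 as the charset the container applied is an assumption, not something the reply states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the name and signature are illustrative only. */
final class ParamRecoder {

    /**
     * Re-decodes a parameter that the container decoded with one charset
     * although the browser actually sent it in another.
     */
    static String recode(String param, Charset assumedByContainer, Charset sentByBrowser) {
        if (param == null) {
            return null;
        }
        return new String(param.getBytes(assumedByContainer), sentByBrowser);
    }

    public static void main(String[] args) {
        // The no-argument getBytes() in the reply uses this platform-dependent value.
        System.out.println("Platform default: " + Charset.defaultCharset());

        Charset koi8 = Charset.forName("KOI8-R");
        String typed = "\u0444\u044b\u0432\u0430";

        // Simulate the container reading KOI8-R bytes as ISO-8859-1.
        String received = new String(typed.getBytes(koi8), StandardCharsets.ISO_8859_1);

        // Prints whether the recoded parameter equals what was typed.
        System.out.println(recode(received, StandardCharsets.ISO_8859_1, koi8).equals(typed));
    }
}
```

Naming the charsets explicitly keeps the behaviour the same on every server, whereas the original line may behave differently if the default charset changes.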

    Author Comment


    Thank you for your fast response!

    I tried it out now and it is working, but only when I set the encoding in my browser to Cyrillic by hand!
    Before I did this it was set to Automatic and Western European, and the result was that I only saw wrong characters!

    How can I do this via the source?

    My second point: I now want to display a static text in the same JSP page in a different language (Arabic, for example).
    Is this possible?

    Thanks, Gernot
    LVL 2

    Expert Comment

    Yeah, I'm not sure. It showed up automatically on mine, so I'm not sure why you had to change it. I'm not sure whether there are bugs in the browsers, but it didn't seem to work the first time, only after I typed Russian characters in. Not sure why that is, but I imagine that all Russians will have their keyboards set for Russian, so it should show up straight away.

    Since this doesn't use UTF-8 or UTF-16, the only option seems to be to embed your static text of different character sets in iframes, so that you can change the character set for the document in each iframe.
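
To illustrate why a page sent as KOI8-R cannot also carry Arabic text directly, here is a small sketch that asks each charset whether it can represent a given string. The class `CharsetCoverage` and the sample strings are assumptions made for illustration; only standard `java.nio.charset` calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only. */
final class CharsetCoverage {

    /** Returns true if every character of the text can be encoded in the charset. */
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String russian = "\u0444\u044b\u0432\u0430";      // sample Cyrillic text
        String arabic = "\u0645\u0631\u062d\u0628\u0627"; // sample Arabic text

        // UTF-8 covers both scripts.
        System.out.println("UTF-8 Russian: " + canRepresent(russian, StandardCharsets.UTF_8));
        System.out.println("UTF-8 Arabic:  " + canRepresent(arabic, StandardCharsets.UTF_8));

        // A single-byte Cyrillic charset covers only one of them.
        if (Charset.isSupported("KOI8-R")) {
            Charset koi8 = Charset.forName("KOI8-R");
            System.out.println("KOI8-R Russian: " + canRepresent(russian, koi8));
            System.out.println("KOI8-R Arabic:  " + canRepresent(arabic, koi8));
        }
    }
}
```

A single-byte charset such as KOI8-R has room for only one extra script, which is why the iframe workaround is needed as long as the page is not sent in a Unicode encoding.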

    Author Comment

    Thanks, that was it!

    Have a nice day!

    Greetings from Austria!
    LVL 2

    Expert Comment

    No probs - maybe you could assign the points if you are happy.

    Author Comment

    How can I give you the points now?

    Sorry, I am new!
    LVL 2

    Accepted Solution

    Hmmm, not sure, as I haven't posted a question. Maybe there should be an "accept answer" button or something? I'm guessing that when you are logged in as the asker, there should be some button to indicate the question is resolved.
