  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 292

HTML encoding string

Hello!

In a JSP page, assume I have UTF-8 encoded text, or any text, that I want to encode in this format:
&#1587;&#1585;&#1670; (the browser displays this as سرچ)

How is this possible? How can these characters be created? In FrontPage, for example, if you type Arabic and save the file, it encodes the Arabic text in this format automatically...

Thanks in advance!
Asked by: CSecurity
1 Solution
 
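Ajay-Singh Commented:
> How is this possible? How can these characters be created?
They are XML-escaped: each character is written as a numeric character reference of the form &#NNNN;, where NNNN is the character's Unicode code point in decimal.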
 
 
ksivananth Commented:
You will have to generate it manually!

The code below is from some editor:

    // Decodes decimal references such as &#1587; back into characters.
    private String loadConvert(String theString) {
        int len = theString.length();
        StringBuilder outBuffer = new StringBuilder(len);
        for (int x = 0; x < len;) {
            char aChar = theString.charAt(x++);
            if (aChar != '&' || x >= len || theString.charAt(x) != '#') {
                // Ordinary character, or an '&' that does not start "&#".
                outBuffer.append(aChar);
                continue;
            }
            x++; // skip '#'
            // Collect up to 10 characters of the reference, stopping at ';'.
            StringBuilder buff = new StringBuilder();
            boolean terminated = false;
            while (x < len && buff.length() < 10) {
                aChar = theString.charAt(x++);
                if (aChar == ';') {
                    terminated = true;
                    break;
                }
                buff.append(aChar);
            }
            try {
                outBuffer.append((char) Integer.parseInt(buff.toString()));
            } catch (NumberFormatException e) {
                // Not a decimal number: emit the text unchanged.
                outBuffer.append("&#").append(buff).append(terminated ? ";" : "");
            }
        }
        return outBuffer.toString();
    }

 
CSecurity (Author) Commented:
Thanks a lot, ksivananth... I need exactly the reverse of this function: I want to encode a normal string into the &#...; format!

Thanks a lot
 
CSecurity (Author) Commented:
I saw that you got it from java.util.Properties (Properties.java), and the comment above the function you pasted here says:

 /*
     * Converts encoded \\uxxxx to unicode chars
     * and changes special saved chars to their original forms
     */
I want exactly the reverse of it: I want to convert Unicode chars to the \\uxxxx encoded style

Thanks a lot
 
Ajay-Singh Commented:
This should convert to char:

public char toChar(String encoded) {
   return Character.fromDigit(Integer.parseInt(encoded.substring(2, encoded.length()-1),10);
}
 
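Ajay-Singh Commented:
Sorry, this is the correct code:

// encoded is a single token such as "&#1587;":
// strip the leading "&#" and the trailing ";" and parse what is left.
public char toChar(String encoded) {
    return (char) Integer.parseInt(encoded.substring(2, encoded.length() - 1));
}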
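toChar above handles exactly one token. For a whole string that mixes plain text with several &#NNNN; tokens, one possible approach is a regular expression, sketched below. This is not code from the thread: the class name NumericEntityDecoder and the choice to leave malformed or out-of-range references untouched are assumptions made for illustration, and only decimal references are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes decimal references such as &#1587; inside a longer string.
final class NumericEntityDecoder {
    // "&#", 1 to 7 decimal digits, then ";". Seven digits always fit in an int.
    private static final Pattern DECIMAL_REFERENCE = Pattern.compile("&#(\\d{1,7});");

    static String decode(String text) {
        Matcher m = DECIMAL_REFERENCE.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one.
            out.append(text, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Assumption: keep references outside the Unicode range as they are.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the four Arabic letters if the console encoding supports them.
        System.out.println(decode("&#1587;&#1604;&#1575;&#1605;"));
    }
}
```

Anything that does not match the pattern, for example &amp; or a reference without the closing semicolon, passes through unchanged.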
 
CSecurity (Author) Commented:
What does this code do?

I want to convert an Arabic string like this:
'DA
to &#5739;&#5930; and so on...

Does this code do that?
 
CSecurity (Author) Commented:
EE doesn't support Arabic typing... Anyway, I wrote some Arabic text and I want to convert it to that style
 
Ajay-Singh Commented:
> What does this code do?
This converts tokens like &#5739; to characters.
 
CSecurity (Author) Commented:
As I said, I want exactly the reverse of it... I want to convert characters to the &#5739; kind
 
ksivananth Commented:
You have to take the number in it and convert it to a char. For example,

&#5739; is converted as follows.
Here the Unicode value is

int unicode = 5739;
// and the Unicode char is
char unicodeChar = (char) unicode;
 
ksivananth Commented:
If you want the reverse:

char unicodeChar = '\u0633'; // any char; here ARABIC LETTER SEEN
int unicode = unicodeChar; // 1587
String unicodeString = "&#" + unicode + ";"; // "&#1587;"
 
Ajay-Singh Commented:
Then you can use this:

    public String encode(String decoded) {
        StringBuffer builder = new StringBuffer();
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            builder.append("&#");
            builder.append(String.valueOf(c));
            builder.append(";");
        }

        return builder.toString();
    }
 
Ajay-Singh Commented:
> builder.append(String.valueOf(c));

Change that to

builder.append(String.valueOf((int) c));

so that the numeric code of the character is appended instead of the character itself.
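With that correction applied, a minimal sketch of the encoder might look like the following. It is not code from the thread: the class name NumericEntityEncoder is hypothetical, and two choices are assumptions rather than anything the posters asked for. ASCII characters are left untouched, and the loop walks code points rather than chars, so a character outside the Basic Multilingual Plane becomes one reference instead of two surrogate values.

```java
// Hypothetical helper: writes every non-ASCII character as a decimal reference.
final class NumericEntityEncoder {
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                // Assumption: plain ASCII is passed through unchanged.
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 char, or by 2 for a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The \u escapes keep the example independent of the source file encoding.
        System.out.println(encode("\u0633\u0644\u0627\u0645")); // prints &#1587;&#1604;&#1575;&#1605;
    }
}
```

This sketch does not escape &, < or >; if the output is going into HTML, those would need separate handling.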
 
CSecurity (Author) Commented:
Thanks for the code... It converts to the right style but contains the wrong data...

I debugged and found that when I send one Arabic character, it shows up as 2 ASCII characters...

So if I send an Arabic word of length 4, your for loop runs 8 times and converts it to codes which do not work... Why does this happen?
 
CSecurity (Author) Commented:
For example, for an Arabic word with 4 chars I got this from your code:
&#216;&#179;&#217;&#132;&#216;&#167;&#217;&#133;
 
CSecurity (Author) Commented:
And the result I got from your code is not right... It should be different; in FrontPage I get it like this:

(for example)
&#2421;&#1234;&#1234;&#1234;

Between &# and ; I have 4 chars... It looks like hex... I dunno
 
CSecurity (Author) Commented:
In the right example, what I wrote in Arabic, shown in the format I asked about, is this:
&#1587;&#1604;&#1575;&#1605;

That is what I wrote in Arabic, but I got this from your code:
&#216;&#179;&#217;&#132;&#216;&#167;&#217;&#133;

What do you think?
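The two outputs above are consistent with an encoding mismatch rather than a bug in encode. The values 216 179, 217 132, 216 167, 217 133 are the UTF-8 bytes (D8 B3, D9 84, D8 A7, D9 85) of the four characters 1587, 1604, 1575, 1605. That is the pattern you would expect if the UTF-8 request data were decoded as ISO-8859-1, which produces one char per byte. It would also explain why out.print(buffer) can still look right: writing those chars back out as ISO-8859-1 reproduces the original bytes, and the browser reads them as UTF-8. The sketch below reproduces the effect in a standalone program under that assumed diagnosis; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of UTF-8 bytes that were wrongly decoded as ISO-8859-1.
final class MisdecodedParameterDemo {
    /** Recovers the original bytes and decodes them as UTF-8. */
    static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Same idea as encode in the thread: one decimal reference per char. */
    static String toDecimalReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append("&#").append((int) text.charAt(i)).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The eight chars reported in the thread: 216 179 217 132 216 167 217 133.
        String received = "\u00D8\u00B3\u00D9\u0084\u00D8\u00A7\u00D9\u0085";
        System.out.println(toDecimalReferences(received));
        // prints &#216;&#179;&#217;&#132;&#216;&#167;&#217;&#133;
        System.out.println(toDecimalReferences(redecodeAsUtf8(received)));
        // prints &#1587;&#1604;&#1575;&#1605;
    }
}
```

In a servlet or JSP, the usual way to avoid the mismatch for POSTed forms is to call request.setCharacterEncoding("UTF-8") before the first getParameter call. Whether that is sufficient here depends on how the form is submitted and how the container is configured.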
 
ksivananth Commented:
How do you send the string?
 
CSecurity (Author) Commented:
Via an HTTP request, from an HTML form.

I read it like this:
String buffer = request.getParameter("T1");

And I send buffer to your function!
I'm sure I got it in the right format, because I also call out.print(buffer) and I see the Arabic word
 
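ksivananth Commented:
You can't reliably hard-code Arabic text in the program unless the compiler is told the source file's encoding or you use \uXXXX escapes!

Otherwise, either read it from a file or read it from the input console!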
 
CSecurity (Author) Commented:
I know. It comes from HTTP, and I tested it with out.print and I see the right text
 
ksivananth Commented:
>> and I read it like this:
String buffer = request.getParameter("T1");

And I send buffer to your function!
I'm sure I got it in the right format, because I also call out.print(buffer) and I see the Arabic word
>>

Make sure the parameter is in Unicode!
 
ksivananth Commented:
I meant Unicode-encoded!
 
CSecurity (Author) Commented:
>> I meant Unicode-encoded!
How? What is that? When I print out what I have in buffer I see Arabic text, because the form is in a UTF-8 encoded HTML page, and I see the right result... Is that wrong?
 
ksivananth Commented:
The code below works fine!
 
ksivananth Commented:
public class TestHTMLEncodeUnicodeString {

    // Decodes the references, then prints the decoded text and its re-encoded form.
    private void printString(String encodedString) {
        String data = loadConvert(encodedString);
        System.out.println(data);
        System.out.println(encode(data));
    }

    // Decodes decimal references such as &#1587; back into characters.
    private String loadConvert(String theString) {
        int len = theString.length();
        StringBuilder outBuffer = new StringBuilder(len);
        for (int x = 0; x < len;) {
            char aChar = theString.charAt(x++);
            if (aChar != '&' || x >= len || theString.charAt(x) != '#') {
                // Ordinary character, or an '&' that does not start "&#".
                outBuffer.append(aChar);
                continue;
            }
            x++; // skip '#'
            // Collect up to 10 characters of the reference, stopping at ';'.
            StringBuilder buff = new StringBuilder();
            boolean terminated = false;
            while (x < len && buff.length() < 10) {
                aChar = theString.charAt(x++);
                if (aChar == ';') {
                    terminated = true;
                    break;
                }
                buff.append(aChar);
            }
            try {
                outBuffer.append((char) Integer.parseInt(buff.toString()));
            } catch (NumberFormatException e) {
                // Not a decimal number: emit the text unchanged.
                outBuffer.append("&#").append(buff).append(terminated ? ";" : "");
            }
        }
        return outBuffer.toString();
    }

    public static void main(String[] args) {
        String data = "&#1587;&#1604;&#1575;&#1605;";
        new TestHTMLEncodeUnicodeString().printString(data);
    }

    // Encodes every char as a decimal reference such as &#1587;.
    public String encode(String decoded) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            builder.append("&#");
            builder.append((int) c);
            builder.append(";");
        }
        return builder.toString();
    }
}
 
CSecurity (Author) Commented:
But in the JSP I got this:
&#211;&#225;&#199;&#227;

For a string which should be:
&#1587;&#1604;&#1575;&#1605;
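This time there are four values rather than eight, so the data does not look like UTF-8. The numbers 211, 225, 199, 227 match the windows-1256 (Arabic) bytes D3, E1, C7, E3 for the same four letters, which suggests the form was submitted in windows-1256 and the container decoded it as ISO-8859-1. That is an inference from the numbers only, not something the thread confirms. The sketch below checks it in a standalone program; the class and method names are hypothetical, and windows-1256 is an optional charset that a minimal Java runtime may not include, so the code tests for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check: were these chars windows-1256 bytes decoded as ISO-8859-1?
final class Windows1256Check {
    /** Recovers the original bytes and decodes them with the given charset. */
    static String redecode(String misdecoded, Charset actual) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    /** One decimal reference per char, as in the thread's encode method. */
    static String toDecimalReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append("&#").append((int) text.charAt(i)).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four chars reported in the thread: 211, 225, 199, 227.
        String received = "\u00D3\u00E1\u00C7\u00E3";
        if (Charset.isSupported("windows-1256")) {
            String repaired = redecode(received, Charset.forName("windows-1256"));
            System.out.println(toDecimalReferences(repaired));
        } else {
            System.out.println("windows-1256 is not available in this runtime");
        }
    }
}
```

If the diagnosis holds, the more robust fix is to make the page that contains the form and the code that reads the request agree on one encoding, rather than re-decoding after the fact.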
 
ksivananth Commented:
Here is an explanation that might be pertinent to you:

http://java.sun.com/developer/EJTechTips/2005/tt1220.html#1
