• Status: Solved

Unicode to SJIS

How do we get the SJIS equivalent value of a Unicode character in Java?
Asked by: gravindrababu
 
sgomsCommented:
String myString = "<initialize to the string you want to convert>";
byte[] SJISBytes = myString.getBytes("SJIS"); // specify the encoding

// get your SJIS string back using:

String SJISString = new String(SJISBytes, "SJIS");

-sgoms
 
gravindrababuAuthor Commented:
sgoms, I have done up to that. But what I actually need is the 'int' value of the SJIS code. If I cast each character of the resulting SJISString to int, I expected to get the int equivalent of that character's SJIS code, but it still gives me only the Unicode value. Can you please suggest something?
 
sgomsCommented:
Remember that the length of any conversion is not necessarily the same as the length of the source. For example, when converting the SJIS encoding to Unicode, sometimes one byte will convert into a single Unicode character, and sometimes two bytes will.

-sgoms
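
To make the length point concrete, here is a minimal sketch that is not part of the original replies. It assumes the JDK in use ships the Shift_JIS charset (full JDK builds normally do, and "SJIS" is accepted as an alias for it); the class and method names are made up for illustration. It also shows why casting characters of a decoded String still yields Unicode values: a Java String always holds UTF-16 code units, whatever encoding the bytes came from.

```java
import java.nio.charset.Charset;

final class SjisLengthDemo {
    // Assumption: the Shift_JIS charset is available in this JDK.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Hypothetical helper: number of bytes the string occupies in SJIS.
    static int sjisByteLength(String s) {
        return s.getBytes(SJIS).length;
    }

    // Hypothetical helper: encode to SJIS and decode straight back.
    static String roundTrip(String s) {
        return new String(s.getBytes(SJIS), SJIS);
    }

    public static void main(String[] args) {
        String ascii = "A";       // a single-byte character in SJIS
        String kanji = "\u6f22";  // a kanji, which takes two bytes in SJIS

        System.out.println(sjisByteLength(ascii)); // 1
        System.out.println(sjisByteLength(kanji)); // 2

        // Decoding gives back a normal Java String, so charAt() is Unicode again.
        String decoded = roundTrip(kanji);
        System.out.println(decoded.length());                          // 1
        System.out.println(Integer.toHexString(decoded.charAt(0)));    // 6f22
    }
}
```

The SJIS value therefore has to be read from the byte array, not from the characters of the decoded String.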
 
sgomsCommented:
Can you post your code so that we can take it from there?

-sgoms
 
gravindrababuAuthor Commented:
sgoms, thanks for the reply. Please go through the code:

import java.util.StringTokenizer;
import java.io.*;

public class KanjiRangeCheck
{
    public static void kanjiRangeCheck(String str)
    {
        try
        {
            byte[] byteArray = str.getBytes();
            String strSJISString = new String(byteArray, "SJIS");
            for (int i = 0; i < strSJISString.length(); i++)
            {
                int iChar = (int) strSJISString.charAt(i);
                System.out.println("Int Equivalent of SJIS Char " + iChar);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        if (args.length == 0)
        {
            System.err.println("Usage : \n" +
                "java KanjiRangeCheck Kanji  ");
            return;
        }
        String sKanji = args[0];
        KanjiRangeCheck.kanjiRangeCheck(sKanji);
    }
}
 
heyhey_Commented:
what about

byte[] byteArray = str.getBytes("SJIS");
for(int i = 0;i < byteArray.length; i++)
{
  System.out.println("Int Equivalent of SJIS Char "+ byteArray[i]);
}
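
Note that Java bytes are signed, so the loop above prints negative numbers for SJIS lead and trail bytes of 0x80 and higher. If what is wanted is one int per character, a possible approach is to encode each character separately and combine its one or two bytes. The following is only a sketch under assumptions: the helper name `sjisCode` is hypothetical, the Shift_JIS charset is assumed to be available in the JDK, characters outside SJIS are silently encoded as '?' (0x3f) by `getBytes`, and surrogate pairs are not handled.

```java
import java.nio.charset.Charset;

final class SjisCodeDemo {
    // Assumption: the Shift_JIS charset is available in this JDK.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Hypothetical helper: SJIS code of a single char as an int,
    // e.g. 0x41 for a one-byte character or 0x8abf for a two-byte one.
    static int sjisCode(char c) {
        byte[] bytes = String.valueOf(c).getBytes(SJIS);
        int code = 0;
        for (byte b : bytes) {
            // Mask with 0xff so a negative byte does not sign-extend.
            code = (code << 8) | (b & 0xff);
        }
        return code;
    }

    public static void main(String[] args) {
        String text = "A\u6f22\u5b57"; // 'A' followed by two kanji
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            System.out.println("U+" + Integer.toHexString(c)
                    + " -> SJIS 0x" + Integer.toHexString(sjisCode(c)));
        }
        // Prints:
        // U+41 -> SJIS 0x41
        // U+6f22 -> SJIS 0x8abf
        // U+5b57 -> SJIS 0x8e9a
    }
}
```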
               
 
sgomsCommented:
Check out,
import java.util.StringTokenizer;
import java.io.*;

public class KanjiRangeCheck
{
    public static void kanjiRangeCheck(String str)
    {
        try
        {
            byte[] byteArray = str.getBytes("SJIS");
            byte[] defaultBytes = str.getBytes();
            String strSJISString = new String(byteArray, "SJIS");
            System.out.println("SJIS Str--" + strSJISString);
            for (int k = 0; k < byteArray.length; k++) {
                System.out.println("SJIS" + "[" + k + "] = " + "0x" + UnicodeFormatter.byteToHex(byteArray[k]));
            }
            for (int k = 0; k < defaultBytes.length; k++) {
                System.out.println("Default" + "[" + k + "] = " + "0x" + UnicodeFormatter.byteToHex(defaultBytes[k]));
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String sKanji = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
        KanjiRangeCheck.kanjiRangeCheck(sKanji);
    }
}

class UnicodeFormatter {

    static public String byteToHex(byte b) {
        // Returns hex String representation of byte b
        char hexDigit[] = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }

    static public String charToHex(char c) {
        // Returns hex String representation of char c
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }

} // class


I have used the UnicodeFormatter class to display the hex value of each byte; that gives you the accurate value of the byte. If you cast the byte to char instead, you get very different data: byte is signed and char is an unsigned 16-bit type, so a negative byte is sign-extended and the upper eight bits come out as 1s.

-sgoms
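
The effect of the cast can be checked with a small sketch (not from the original reply; the class and helper names are illustrative only). It takes 0x8a, a typical SJIS lead byte, and compares printing it directly, casting it to char, and masking it with 0xff.

```java
final class ByteCastDemo {
    // Hypothetical helper: the byte's value read as unsigned (0..255).
    static int unsignedValue(byte b) {
        return b & 0xff;
    }

    // Hypothetical helper: what a plain cast to char produces, as an int.
    static int castToChar(byte b) {
        return (char) b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x8a;

        // A byte is signed, so 0x8a prints as a negative number.
        System.out.println(b);                                  // -118

        // The cast sign-extends first, so the high byte becomes 0xff.
        System.out.println(Integer.toHexString(castToChar(b))); // ff8a

        // Masking keeps only the low eight bits.
        System.out.println(Integer.toHexString(unsignedValue(b))); // 8a
    }
}
```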
 
gravindrababuAuthor Commented:
Thanks sgoms, I found the problem, and the solution you provided solved it. I was trying different ways to convert the byte to a hex value, but I could not succeed. I am not familiar with bitwise operations and am trying to understand your code. If possible, could you please explain the bitwise operations you are doing?
 
sgomsCommented:
Let's consider a byte value of 63.
Its binary representation is
0011 1111

If we convert it to hex, it will be

0011 1111
----  ----
  3     f

so the hex value is 0x3f.

That's the manual conversion. Programmatically, we first need to fetch the higher-order 4 bits (i.e. 0011) and get the equivalent hex digit.

So by shifting the byte right by 4 bits, what we are doing is

0011 1111 >> 4 = 0000 0011

i.e. you push the lower-order 4 bits out of the picture.
>> is a signed shift. So if your byte has a negative value, say -63, it will look like
1100 0001
and when you >> 4 you'll get
1111 1100
You'll notice that the higher-order bits are filled with 1s instead of 0s. That's because >> is a signed shift: it copies the sign bit into the vacated positions as it shifts.

OK, once you've shifted the bits, in order to keep the lower-order 4 bits alone you AND the result with 0000 1111 (0x0f).

So you'll get
0000 0011 (&)
0000 1111
------------
0000 0011 (3)

In case your number is negative, this removes the higher-order 1s:
1111 1100 (&)
0000 1111
------------
0000 1100 (12)

So now you've got 3 as the first digit (for 63).

In the same way, fetch the lower-order 4 bits of the original byte by simply ANDing it with 0x0f:

0011 1111 (&)
0000 1111
------------
0000 1111 (15)

This gets you 15, which is fetched as f from the array.

So now you've got 3 and f;
concatenate them to get
0x3f.

-sgoms
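
The walkthrough above can be replayed in code. This is a small sketch rather than the original UnicodeFormatter class; the names `highNibble`, `lowNibble` and `NibbleDemo` are made up for illustration. It runs the same two examples, 63 and -63.

```java
final class NibbleDemo {
    static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Signed shift right by 4, then mask off any sign-extended 1s.
    static int highNibble(byte b) {
        return (b >> 4) & 0x0f;
    }

    // Keep only the lower-order 4 bits.
    static int lowNibble(byte b) {
        return b & 0x0f;
    }

    // Look both nibbles up in the digit table and join them.
    static String byteToHex(byte b) {
        char[] digits = { HEX_DIGITS[highNibble(b)], HEX_DIGITS[lowNibble(b)] };
        return new String(digits);
    }

    public static void main(String[] args) {
        byte positive = 63;   // 0011 1111
        byte negative = -63;  // 1100 0001

        System.out.println(highNibble(positive)); // 3
        System.out.println(lowNibble(positive));  // 15
        System.out.println(byteToHex(positive));  // 3f

        System.out.println(highNibble(negative)); // 12
        System.out.println(lowNibble(negative));  // 1
        System.out.println(byteToHex(negative));  // c1
    }
}
```

Strictly speaking, Java promotes the byte to a 32-bit int before shifting, so the eight-bit patterns shown in the explanation are the low byte of that int; the mask with 0x0f discards everything else.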
 
gravindrababuAuthor Commented:
Thanks sgoms. Understood it.
 
sgomsCommented:
great
-sgoms