Unicode to SJIS

How do we get the SJIS equivalent value of a Unicode character in Java?
gravindrababu Asked:

sgoms Commented:
String myString = "<initialize to the string you want>";
byte[] sjisBytes = myString.getBytes("SJIS"); // specify the encoding

// Get your SJIS string back using:

String sjisString = new String(sjisBytes, "SJIS");

-sgoms
gravindrababu Author Commented:
sgoms, I have got that far. But what I actually need is the 'int' value of the SJIS code. I expected that casting each character of the resulting SJISString to int would give me the int equivalent of that character's SJIS code, but it still gives me the Unicode value. Can you please suggest how to do this?

sgoms Commented:
Remember that the length of any conversion is not necessarily the same as the length of the source. For example, when converting the SJIS encoding to Unicode, sometimes one byte will convert into a single Unicode character, and sometimes two bytes will.

-sgoms
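A minimal sketch of the length difference described above. The class name `SjisLengthDemo` and the sample strings are illustrative, not from the thread, and the sketch assumes the running JDK provides the Shift_JIS charset (for which "SJIS" is an alias).

```java
import java.nio.charset.Charset;

class SjisLengthDemo {
    // Assumption: the JDK in use ships the Shift_JIS charset.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Number of bytes the string occupies once encoded as Shift_JIS.
    static int sjisLength(String s) {
        return s.getBytes(SJIS).length;
    }

    public static void main(String[] args) {
        // ASCII letter, half-width katakana, hiragana, and a mixed string.
        String[] samples = { "A", "\uff71", "\u3042", "A\u3042" };
        for (String s : samples) {
            System.out.println(s.length() + " char(s) -> " + sjisLength(s) + " byte(s)");
        }
    }
}
```

The ASCII letter and the half-width katakana each take one byte, the hiragana takes two, so the mixed two-character string takes three bytes.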
sgoms Commented:
Can you post your code so that we can take it from there?

-sgoms
gravindrababu Author Commented:
sgoms, thanks for the reply. Please go through the code:

import java.util.StringTokenizer;
import java.io.*;

public class KanjiRangeCheck
{
    public static void kanjiRangeCheck(String str)
    {
        try
        {
            // Encodes with the platform default charset, then decodes those bytes as SJIS.
            byte[] byteArray = str.getBytes();
            String strSJISString = new String(byteArray, "SJIS");
            for (int i = 0; i < strSJISString.length(); i++)
            {
                // charAt() returns a UTF-16 code unit, so this prints the Unicode value.
                int iChar = (int) strSJISString.charAt(i);
                System.out.println("Int Equivalent of SJIS Char " + iChar);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        if (args.length == 0)
        {
            System.err.println("Usage : \n" +
                "java KanjiRangeCheck Kanji  ");
            return;
        }
        String sKanji = String.valueOf(args[0]);
        KanjiRangeCheck.kanjiRangeCheck(sKanji);
    }
}
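heyhey_ Commented:
What about

byte[] byteArray = str.getBytes("SJIS");
for (int i = 0; i < byteArray.length; i++)
{
  // Mask with 0xff so bytes above 0x7f print as 128..255 rather than as negative values.
  System.out.println("Int Equivalent of SJIS Char " + (byteArray[i] & 0xff));
}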
sgoms Commented:
Check this out:
import java.util.StringTokenizer;
import java.io.*;

public class KanjiRangeCheck
{
    public static void kanjiRangeCheck(String str)
    {
        try
        {
            // Encode the same string as SJIS and with the platform default charset.
            byte[] byteArray = str.getBytes("SJIS");
            byte[] defaultBytes = str.getBytes();
            String strSJISString = new String(byteArray, "SJIS");
            System.out.println("SJIS Str--" + strSJISString);
            // Print every encoded byte as two hex digits.
            for (int k = 0; k < byteArray.length; k++) {
                System.out.println("SJIS" + "[" + k + "] = " + "0x" + UnicodeFormatter.byteToHex(byteArray[k]));
            }
            for (int k = 0; k < defaultBytes.length; k++) {
                System.out.println("Default" + "[" + k + "] = " + "0x" + UnicodeFormatter.byteToHex(defaultBytes[k]));
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String sKanji = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";
        KanjiRangeCheck.kanjiRangeCheck(sKanji);
    }
}

class UnicodeFormatter {

    // Returns the hex String representation of byte b
    static public String byteToHex(byte b) {
        char[] hexDigit = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        // High nibble first, then low nibble; the mask discards sign-extended bits.
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }

    // Returns the hex String representation of char c
    static public String charToHex(char c) {
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }

} // class


I have used the UnicodeFormatter class to display the hex value of each byte, which gives you the accurate value of your byte. If you cast a byte straight to char or int, any byte with its high bit set is sign-extended (for example, 0x82 becomes 0xff82 as a char), resulting in very different data.

-sgoms
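For the original goal of one int per character, a possible sketch is to encode each character on its own and pack its one or two SJIS bytes into an int. The class `SjisCodeDemo` and the method `sjisCode` are hypothetical names, not part of the thread's code, and the sketch assumes the JDK provides the Shift_JIS charset.

```java
import java.nio.charset.Charset;

class SjisCodeDemo {
    // Assumption: the JDK in use ships the Shift_JIS charset.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Encodes a single character and packs its SJIS bytes into one int,
    // first byte in the higher-order position.
    static int sjisCode(char c) {
        byte[] bytes = String.valueOf(c).getBytes(SJIS);
        int code = 0;
        for (byte b : bytes) {
            // Mask with 0xff so a byte above 0x7f is not sign-extended.
            code = (code << 8) | (b & 0xff);
        }
        return code;
    }

    public static void main(String[] args) {
        String text = "A\u3042";
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // 'A' prints 41 and U+3042 prints 82a0.
            System.out.println("U+" + Integer.toHexString(c) + " -> 0x" + Integer.toHexString(sjisCode(c)));
        }
    }
}
```

Working char by char keeps the sketch short, but it does not handle supplementary characters that need a surrogate pair, and characters the charset cannot represent are encoded as its replacement byte rather than raising an error.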
gravindrababu Author Commented:
Thanks sgoms, I understand the problem now, and the solution you provided solved it. I had been trying different ways to convert the byte to a hex value, but could not succeed. I am not familiar with bitwise operations and am trying to understand your code. If possible, could you please explain the bitwise operations you are doing?
sgoms Commented:
Let's consider a byte value of 63.
Its binary representation is
0011 1111

If we convert it to hex, it'll be

0011 1111
---- ----
  3    f

So the hex value is 0x3f.

That's the manual conversion. Programmatically, we first need to fetch the higher-order 4 bits (i.e. 0011) and get their equivalent hex digit.

So by shifting the byte right by 4 bits, what we are doing is

0011 1111 >> 4 = 0000 0011

i.e. you move the lower-order 4 bits out of the picture.
>> is a signed shift. So if your byte has a negative value, say -63, it'll be
1100 0001
and when you >> 4 you'll get
1111 1100
You'll notice that the higher-order bits are filled with 1s instead of 0s. That's because >> is a signed shift: it copies the sign bit into the positions vacated on the left.

OK. Once you've shifted the bits, in order to keep only the lower-order 4 bits you AND the result with 0000 1111 (0x0f).

So you'll get
0000 0011(&)
0000 1111
------------
0000 0011(3)

In case your number is negative, this removes the higher-order 1s:
1111 1100(&)
0000 1111
------------
0000 1100(12)

So now you've got 3 as the first digit.

In the same way, fetch the lower-order 4 bits of the original byte by simply ANDing it with 0x0f:

0011 1111(&)
0000 1111
------------
0000 1111(15)

This gets you 15, which is fetched as f from the array.

So now you've got 3 and f.
Concatenate them to get
0x3f.

-sgoms
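The shift-and-mask steps walked through above can be sketched as follows. `NibbleDemo` and its method names are illustrative and not from the thread. Note that Java promotes the byte to int before shifting, so the sign bit fills all the upper bits, and the 0x0f mask discards every one of them.

```java
class NibbleDemo {
    static final char[] HEX = "0123456789abcdef".toCharArray();

    // Higher-order 4 bits: signed shift right, then mask off sign-extended bits.
    static int highNibble(byte b) {
        return (b >> 4) & 0x0f;
    }

    // Lower-order 4 bits: mask only.
    static int lowNibble(byte b) {
        return b & 0x0f;
    }

    // Two hex digits for one byte, high nibble first.
    static String toHex(byte b) {
        return "" + HEX[highNibble(b)] + HEX[lowNibble(b)];
    }

    public static void main(String[] args) {
        System.out.println(toHex((byte) 63));   // prints 3f
        System.out.println(toHex((byte) -63));  // prints c1
    }
}
```

For -63 (1100 0001) the high nibble is 1100, which is 12 or c, and the low nibble is 0001, giving c1.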
gravindrababu Author Commented:
Thanks, sgoms. I understood it.
sgoms Commented:
Great.
-sgoms