  • Status: Solved
  • Priority: Medium
  • Security: Public

Experts Plz help | Reading actual unicode from File

Hi All,
For a project requirement, we store Unicode escape sequences in a text file. An example is given below.
\u0B85
\u0B86
\u0B87
\u0B88
\u0B89
\u0B8A
\u0B8E
\u0B8F
\u0B90
\u0B92

All the above values are saved literally (that is, \u0B85 is stored in the .txt file as the six characters \u0B85, not as the character it represents).

Now we want to read this \u0B85 using Java and write it to another file as the real character (the one \u0B85 represents). I tried the following code, but it still writes \u0B85 and not the actual character to the output. Is there any way to achieve this?

        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("c:\\tamilUnicode.txt"), "UTF8"));
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:\\test.txt"), "UTF8"));
            while (br.ready()) {
                String line = br.readLine();
                String output = new String(line.getBytes(), "UTF-8");
                bw.write(output);
                bw.newLine();
            }
            bw.flush();
            bw.close();
            br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }

~Rajesh.B
 
rajesh_bala (Author) Commented:
But my question is different.
I have \u5639 as the text in a file. Please note that it is stored as the literal text \u5639 and not as the character it represents.

I want to read this line (which is \u5639) from the text file and convert it to the actual character, written out as UTF-8. But when I read it and write it to another file, it is stored as \u5639 again (rather than as the character it represents).
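
To make the distinction concrete: the file holds six ASCII characters (a backslash, the letter u, and four hex digits), so choosing a different charset on the reader or writer cannot turn them into one character; the hex digits have to be parsed. The following is a minimal sketch of that idea, assuming the input is exactly one escape; the class and method names are made up for illustration.

```java
final class EscapeVsCharacter {
    // Assumes the argument is exactly: backslash, 'u', four hex digits.
    static char decode(String escaped) {
        // Skip the two-character prefix and parse the rest as a base-16 number.
        return (char) Integer.parseInt(escaped.substring(2), 16);
    }

    public static void main(String[] args) {
        // In Java source, a doubled backslash yields the same six characters the file contains.
        String asStoredInFile = "\\u5639";
        char decoded = decode(asStoredInFile);
        System.out.println("stored length:  " + asStoredInFile.length());
        System.out.println("decoded length: " + String.valueOf(decoded).length());
        System.out.println("decoded code:   " + Integer.toHexString(decoded));
    }
}
```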
 
CEHJ Commented:
try {
      BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("c:/tamilUnicode.txt"), "UTF8"));
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:/test.txt"), "UTF8"));
      String line = null;
      // Each input line is expected to hold one escape: backslash, 'u', four hex digits
      while ((line = br.readLine()) != null) {
            char output = fromUnicodeEscaped(line);
            bw.write(output);
            //bw.newLine(); Why this?
      }
      bw.close();
      br.close();
} catch (Exception e) {
      e.printStackTrace();
}

..................
// Parses the hex digits that follow the backslash-u prefix into a single char
public static char fromUnicodeEscaped(String escaped) {
      escaped = escaped.toLowerCase();
      return (char) Integer.parseInt(escaped.substring(escaped.indexOf("\\u") + 2), 16);
}
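
If a line can contain ordinary text or more than one escape, a regular expression is one way to replace every escape in place. This is only a sketch under that assumption, not part of the code above; `UnicodeUnescaper` and `unescape` are hypothetical names, and only a lowercase `u` followed by exactly four hex digits is recognised.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {
    // Regex: a literal backslash, the letter u, then exactly four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded backslash or dollar sign from being
            // treated as a special character in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out); // copy whatever follows the last escape
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = unescape("a\\u0B85b\\u0B86");
        System.out.println(decoded.length()); // number of chars after decoding
    }
}
```

Text without any escape passes through unchanged, so the same method can be applied to every line read from the file.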
 
cpa199 Commented:
Try this:

        String encoding = "UTF-8";
        String inputfilename = "c:\\test.txt";
        String outputfilename = "c:\\result.txt";

        BufferedReader reader = null;
        BufferedWriter writer = null;

        try {
            reader = new BufferedReader (
                         new InputStreamReader (
                             new FileInputStream (inputfilename), "ISO-8859-1"));
            writer = new BufferedWriter (
                         new OutputStreamWriter (
                             new FileOutputStream (outputfilename), encoding));

            String line = null;
            outer: while ((line = reader.readLine ()) != null) {
                int index = line.indexOf ("\\u");
                while (index > -1) {
                    writer.write (line.substring (0, index));
                    // if fewer than four chars remain, substring throws StringIndexOutOfBoundsException
                    String temp = line.substring (index + 2, index + 6);
                    // Integer.valueOf throws NumberFormatException if the four chars are not hex digits
                    writer.write ((char) Integer.valueOf (temp, 16).intValue());
                    if (index + 6 > line.length ()) {
                        // end of line
                        writer.write ("\n");
                        continue outer;
                    }
                    line = line.substring (index + 6);
                    index = line.indexOf ("\\u");
                }
                writer.write (line);
                writer.write ("\n");
            }
        }
        catch (FileNotFoundException ex) {
           System.out.println ("File Not Found Exception:");
           ex.printStackTrace ();
        }
        catch (UnsupportedEncodingException ex) {
           System.out.println ("Unsupported Encoding Exception:");
           ex.printStackTrace ();
        }
        catch (IOException ex) {
           System.out.println ("IO Exception:");
           ex.printStackTrace ();
        }
        catch (Exception ex) {
           System.out.println ("General exception: ");
           ex.printStackTrace ();
        }
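        finally {
            // close each stream separately so a null or a failure on one does not skip the other
            try {
                if (reader != null) reader.close ();
            }
            catch (IOException ex) {
            }
            try {
                if (writer != null) writer.close ();
            }
            catch (IOException ex) {
            }
        }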
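
On Java 7 or later, the same read, decode, and write loop can be written with try-with-resources, which closes both streams automatically. The sketch below is an illustration under that assumption rather than a drop-in replacement: `EscapedFileConverter`, `convert`, and `decodeLine` are hypothetical names, the paths come from the command line, and a malformed escape is copied through unchanged instead of raising an exception.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class EscapedFileConverter {
    // Copies input to output as UTF-8, decoding escapes line by line.
    static void convert(Path input, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(decodeLine(line));
                writer.newLine();
            }
        }
    }

    // Replaces each backslash + 'u' + four hex digits with the char it names.
    static String decodeLine(String line) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            if (line.startsWith("\\u", i) && i + 6 <= line.length() && isHex(line, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(line.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(line.charAt(i++)); // ordinary text or an incomplete escape
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java EscapedFileConverter <input file> <output file>
        if (args.length == 2) {
            convert(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```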
 
Webstorm Commented:
Hi rajesh_bala,

Try this:

  FileInputStream fis=new FileInputStream("c:/tamilUnicode.txt");
  FileOutputStream fos=new FileOutputStream("c:/test.txt");

  byte[] buff=new byte[8192];
  int sz=0,state=0,code=0;

  while ((sz=fis.read(buff,0,buff.length))>=0)
  {
       for (int i=0;i<sz;i++) // only the sz bytes actually read, not the whole buffer
       {
             byte b=buff[i];
             switch(state)
             {
                  case 0:
                       if (b=='\\') state=1;
                       else fos.write(b);
                       break;
                  case 1:
                       if (b=='u')
                       { code=0; state=2; }
                       else
                       {
                           fos.write('\\');
                           fos.write(b);
                           state=0; // not an escape: back to normal copying
                       }
                       break;
                  default:
                       code=(code<<4)|hexv(b);
                       if (++state>=6)
                       {
                           if ((code>=1)&&(code<128))
                               fos.write(code);
                           else if (code<0x800)
                           {
                               fos.write(0xC0|(code>>>6));
                               fos.write(0x80|(code&0x3F));
                           }
                           else
                           {
                               fos.write(0xE0|(code>>>12));
                               fos.write(0x80|((code>>>6)&0x3F));
                               fos.write(0x80|(code&0x3F));
                           }
                       }
             }
       }
  }

  fis.close();
  fos.close();


static int hexv(byte b)
{
     if (b<'A') return (int)b&0x0F;
     else return ((int)b&0x0F)+9;
}
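
The three write branches above are the one-, two-, and three-byte forms of UTF-8. As a sanity check, the same bit arithmetic can be compared against the standard library encoder. This is a minimal sketch with hypothetical names (`Utf8BmpEncoder`, `encode`); it assumes a single code point in the range 0x0000 to 0xFFFF that is not a surrogate, and unlike the code above it encodes 0 as one byte, as standard UTF-8 does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8BmpEncoder {
    // Encodes one non-surrogate code point below 0x10000 as UTF-8 bytes.
    static byte[] encode(int code) {
        if (code < 0x80) {
            // 0xxxxxxx
            return new byte[] { (byte) code };
        } else if (code < 0x800) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (code >>> 6)),
                (byte) (0x80 | (code & 0x3F)) };
        } else {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (code >>> 12)),
                (byte) (0x80 | ((code >>> 6) & 0x3F)),
                (byte) (0x80 | (code & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        byte[] manual = encode(0x0B85);
        byte[] library = String.valueOf((char) 0x0B85).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(manual, library));
    }
}
```

Working with chars and an OutputStreamWriter, as in the other answers, leaves this encoding step to the library.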

 
Webstorm Commented:
I forgot this line:

                      if (++state>=6)
                      {
                             state=0; // <---
 
cpa199 Commented:
Of course I would like the points, or some of the points, from this question, as I am sure every other expert would ;-)
All I know is that my code worked when I tried it, but as long as whatever was put here was helpful, the credit should go to the solution you consider most appropriate.
Just writing this so you know that I am interested in its final disposition ;-)

Carl
 
Webstorm Commented:
I suggest splitting the points: CEHJ, cpa199, Webstorm
 
rajesh_bala (Author) Commented:
Extremely sorry for not logging in for a long time after getting the comments. Thanks a lot, guys, for helping out.
 
Webstorm Commented:
rajesh_bala,

the other solutions also work.
 
rajesh_bala (Author) Commented:
Hi Venabili,
I tried the solution of cpa199 first and allotted the points. The solutions of CEHJ and Webstorm are also working. Please split the points equally if possible. Sorry for the inconvenience.

~Rajesh.B