Solved

Experts, please help | Reading actual Unicode characters from a file

Posted on 2004-09-16
Last Modified: 2008-02-01
Hi all,
For a project requirement, we store Unicode escape sequences in a text file. An example is given below.
\u0B85
\u0B86
\u0B87
\u0B88
\u0B89
\u0B8A
\u0B8E
\u0B8F
\u0B90
\u0B92

All the above values are saved as is (that is, \u0B85 is stored as the literal text \u0B85 in the .txt file).

Now we want to read this \u0B85 using Java and write it to another file as the real character (the one that \u0B85 represents). I tried the following code, but it still prints \u0B85 and not the actual character in the output. Is there any way to achieve this?

        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("c:\\tamilUnicode.txt"), "UTF8"));
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:\\test.txt"), "UTF8"));
            while (br.ready()) {
                String line = br.readLine();
                String output = new String(line.getBytes(), "UTF-8");
                bw.write(output);
                bw.newLine();
            }
            bw.flush();
            bw.close();
            br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }

~Rajesh.B
0
Comment
Question by:rajesh_bala
12 Comments
 
LVL 18

Expert Comment

by:armoghan
ID: 12081934
0
 
LVL 10

Author Comment

by:rajesh_bala
ID: 12082161
But my question is different.
I have \u5639 as the text in a file. Please note that it is stored as the literal text \u5639 and not as the character it represents.

I want to read this line from a text file (which is \u5639) and convert it to the actual character, written out as UTF-8. But when I read it and write it to another file, it is stored as \u5639 itself (rather than as the character it represents).
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12082813
try {
      BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("c:/tamilUnicode.txt"), "UTF8"));
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:/test.txt"), "UTF8"));
      String line = null;
      while ((line = br.readLine()) != null) {
            char output = fromUnicodeEscaped(line);
            bw.write(output);
            //bw.newLine(); Why this?
      }
      bw.close();
      br.close();
} catch (Exception e) {
      e.printStackTrace();
}

..................
public static char fromUnicodeEscaped(String escaped) {
      escaped = escaped.toLowerCase();
      return (char) Integer.parseInt(escaped.substring(escaped.indexOf("\\u") + 2), 16);
}
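
A possible variation on the helper above, for lines that contain more than one escape or that mix escapes with ordinary text. This is only a sketch: the class name `UnicodeUnescaper` and the method `unescape` are made up for illustration and are not from the thread. It accepts both `\u` and `\U`, mirroring the `toLowerCase()` call above, and assumes every escape has exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class UnicodeUnescaper {
    // A backslash, then 'u' or 'U', then exactly four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\[uU]([0-9a-fA-F]{4})");

    // Replaces every escape in the text with the character it names;
    // everything else is copied through unchanged.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("x\\u0041y\\u0042")); // prints xAyB
    }
}
```

With this, the loop could write `unescape(line)` as a String instead of a single char, which would also make it reasonable to keep `bw.newLine()`.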
0
 
LVL 1

Accepted Solution

by:
cpa199 earned 125 total points
ID: 12082884
Try this :

        String encoding = "UTF-8";
        String inputfilename = "c:\\test.txt";
        String outputfilename = "c:\\result.txt";

        BufferedReader reader = null;
        BufferedWriter writer = null;

        try {
            reader = new BufferedReader (
                         new InputStreamReader (
                             new FileInputStream (inputfilename), "ISO-8859-1"));
            writer = new BufferedWriter (
                         new OutputStreamWriter (
                             new FileOutputStream (outputfilename), encoding));

            String line = null;
            outer: while ((line = reader.readLine ()) != null) {
                int index = line.indexOf ("\\u");
                while (index > -1) {
                    writer.write (line.substring (0, index));
                    // if there are not enough chars left, an exception will be thrown
                    String temp = line.substring (index + 2, index + 6);
                    // exceptions could be thrown if the conversion failed.
                    writer.write ((char) Integer.valueOf (temp, 16).intValue());
                    if (index + 6 > line.length ()) {
                        // end of line
                        writer.write ("\n");
                        continue outer;
                    }
                    line = line.substring (index + 6);
                    index = line.indexOf ("\\u");
                }
                writer.write (line);
                writer.write ("\n");
            }
        }
        catch (FileNotFoundException ex) {
           System.out.println ("File Not Found Exception:");
           ex.printStackTrace ();
        }
        catch (UnsupportedEncodingException ex) {
           System.out.println ("Unsupported Encoding Exception:");
           ex.printStackTrace ();
        }
        catch (IOException ex) {
           System.out.println ("IO Exception:");
           ex.printStackTrace ();
        }
        catch (Exception ex) {
           System.out.println ("General exception: ");
           ex.printStackTrace ();
        }
        finally {
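            // close each stream separately so that a null or failing reader
            // cannot prevent the writer from being flushed and closed
            try {
                if (reader != null) reader.close ();
            }
            catch (IOException ex) {
            }
            try {
                if (writer != null) writer.close ();
            }
            catch (IOException ex) {
            }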
        }
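
For anyone on a newer Java (7 or later, which is an assumption; the code above predates it), the same idea can be sketched more compactly with try-with-resources. The class name `EscapeFileConverter` and its methods are invented for this example. Like the code above, it reads the input as ISO-8859-1 and writes UTF-8; unlike it, an incomplete or non-hex escape is copied through as plain text instead of throwing.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class; names are illustrative, not from the thread.
final class EscapeFileConverter {
    // Decodes each backslash-u escape that is followed by four hex digits.
    static String decodeLine(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            if (line.startsWith("\\u", i) && i + 6 <= line.length() && isHex(line, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(line.substring(i + 2, i + 6), 16));
                i += 6; // skip the whole six-character escape
            } else {
                out.append(line.charAt(i)); // ordinary text, copied as is
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }

    // Both streams are closed automatically, even if an exception is thrown.
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(decodeLine(line));
                writer.newLine(); // platform line separator
            }
        }
    }
}
```

It would be called as something like `EscapeFileConverter.convert(Paths.get("c:/test.txt"), Paths.get("c:/result.txt"))`, reusing the file names from the code above.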
0
 
LVL 13

Expert Comment

by:Webstorm
ID: 12082933
Hi rajesh_bala,

Try this:

  FileInputStream fis=new FileInputStream("c:/tamilUnicode.txt");
  FileOutputStream fos=new FileOutputStream("c:/test.txt");

  byte[] buff=new byte[8192];
  int sz=0,state=0,code=0;

  while ((sz=fis.read(buff,0,buff.length))>=0)
  {
       for (int i=0;i<sz;i++) // only the sz bytes actually read, not the whole buffer
       {
             byte b=buff[i];
             switch(state)
             {
                  case 0:
                       if (b=='\\') state=1;
                       else fos.write(b);
                       break;
                  case 1:
                       if (b=='u')
                       { code=0; state=2; }
                       else
                       {
                           fos.write('\\');
                           fos.write(b);
                           state=0; // not an escape after all: back to plain text
                       }
                       break;
                  default:
                       code=(code<<4)|hexv(b);
                       if (++state>=6)
                       {
                           state=0; // back to plain text (the line noted in the follow-up comment below)
                           if ((code>=1)&&(code<128))
                               fos.write(code);
                           else if (code<0x800)
                           {
                               fos.write(0xC0|(code>>>6));
                               fos.write(0x80|(code&0x3F));
                           }
                           else
                           {
                               fos.write(0xE0|(code>>>12));
                               fos.write(0x80|((code>>>6)&0x3F));
                               fos.write(0x80|(code&0x3F));
                           }
                       }
             }
       }
  }

  fis.close();
  fos.close();


static int hexv(byte b)
{
     if (b<'A') return (int)b&0x0F;
     else return ((int)b&0x0F)+9;
}
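
To see what the shifts and masks above produce, here is a small sketch that encodes one code value by hand and compares it with the JDK's own UTF-8 encoder. The class name `Utf8ByHand` is made up for the example, it assumes Java 7+ for `StandardCharsets`, and it only covers values up to 0xFFFF that are not surrogates, which is all a four-digit escape can express. For \u0B85 the three bytes work out to E0 AE 85.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the code above.
final class Utf8ByHand {
    // Encodes one 16-bit code value (not a surrogate) as 1 to 3 UTF-8 bytes.
    static byte[] encode(int code) {
        if (code < 0x80) {
            return new byte[] { (byte) code };                 // 0xxxxxxx
        }
        if (code < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (code >>> 6)),                  // 110xxxxx
                (byte) (0x80 | (code & 0x3F))                  // 10xxxxxx
            };
        }
        return new byte[] {
            (byte) (0xE0 | (code >>> 12)),                     // 1110xxxx
            (byte) (0x80 | ((code >>> 6) & 0x3F)),             // 10xxxxxx
            (byte) (0x80 | (code & 0x3F))                      // 10xxxxxx
        };
    }

    public static void main(String[] args) {
        int code = 0x0B85;
        byte[] manual = encode(code);
        byte[] library = String.valueOf((char) code).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(manual, library)); // prints true
    }
}
```

In practice, wrapping the FileOutputStream in an OutputStreamWriter with "UTF-8" and writing chars would let the JDK do this encoding instead.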



0
 
LVL 13

Expert Comment

by:Webstorm
ID: 12082942
I forgot this line:

                      if (++state>=6)
                      {
                             state=0; // <---
0
 
LVL 1

Expert Comment

by:cpa199
ID: 12285679
Of course I would like the points, or some of the points, from this question, as I am sure every other expert would ;-)
All I know is that my code worked when I tried it, but as long as whatever was posted here was helpful, the credit should go to the solution you consider most appropriate.
Just writing this so you know that I am interested in its final disposition ;-)

Carl
0
 
LVL 13

Expert Comment

by:Webstorm
ID: 12286990
I suggest splitting the points: CEHJ, cpa199, Webstorm
0
 
LVL 10

Author Comment

by:rajesh_bala
ID: 12296644
Extremely sorry for not logging in for a long time after getting the comments. Thanks a lot, guys, for helping out.
0
 
LVL 13

Expert Comment

by:Webstorm
ID: 12296921
rajesh_bala,

The other solutions also work.
0
 
LVL 10

Author Comment

by:rajesh_bala
ID: 12296952
Hi Venabili,
I tried the solution of cpa199 first and allotted the points. The solutions of CEHJ and Webstorm also work. Please split the points equally if possible. Sorry for the inconvenience.

~Rajesh.B
0
