Solved

Java to convert bytes into String

Posted on 2011-02-22
Last Modified: 2012-05-11
Hi,

My Java application has read a Chinese web page into a byte[] and has detected the page's encoding.

byte[] input = readfileintobyte(file);
String enc = detectEncoding(input);

How can I convert the input into a "utf-8" encoded string?

I have tried the following, but it doesn't work:

String newinput = new String(input, "utf-8");

It seems that the enc variable should be used in the conversion to a UTF-8 string. But how?
Question by:wsyy
12 Comments
 

Expert Comment

by:objects
ID: 34958606
String newinput = new String(input, enc);

Expert Comment

by:objects
ID: 34958617
> How can I convert the input into a "utf-8" encoded string?

There's no such thing in Java; there is only a UTF-8 encoded byte array.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
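
A minimal sketch of the two steps above as one self-contained program, assuming Java 7 or later for StandardCharsets. The class name TranscodeSketch and the helper toUtf8 are hypothetical names chosen for illustration, and "UTF-16BE" only stands in for whatever detectEncoding returns (for a Chinese page that would more likely be a name such as "GBK" or "Big5", if the runtime supports it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Hypothetical helper: decode the bytes with the detected charset,
    // then re-encode the resulting String as UTF-8.
    static byte[] toUtf8(byte[] input, String enc) {
        String text = new String(input, Charset.forName(enc));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the page bytes and the detected encoding name.
        String enc = "UTF-16BE";
        byte[] input = "\u4e2d\u6587".getBytes(Charset.forName(enc)); // two Chinese characters

        byte[] utf8 = toUtf8(input, enc);

        // Same text, different byte representation.
        System.out.println(input.length + " bytes before, " + utf8.length + " bytes after");
        // prints "4 bytes before, 6 bytes after"
    }
}
```

Charset.forName throws an unchecked exception if the name is not supported, so a name coming from a detector may be worth validating with Charset.isSupported first.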

Author Comment

by:wsyy
ID: 34958642
objects: the first solution works.

I just wonder why "utf-8" is not used in the solution?

In addition, what if I want to convert newinput back into the original encoding enc?

Expert Comment

by:objects
ID: 34958672
I explained that in the second comment.
Java strings do not really have an encoding; it's the byte array that has a specific encoding,
i.e. the byte array contains the string encoded with a specific charset.

Expert Comment

by:objects
ID: 34958679
> In addition, what if I want to convert newinput back into the original encoding enc?

byte[] original = newinput.getBytes(enc);
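
As a hedged illustration of the line above: encoding the String back with the same charset should reproduce the original bytes when the input was well-formed, while a charset that cannot represent the characters loses them. RestoreEncodingSketch and encode are hypothetical names, and "UTF-16BE" again stands in for the detected enc.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RestoreEncodingSketch {
    // Hypothetical helper: encode a String with the named charset.
    static byte[] encode(String text, String enc) {
        return text.getBytes(Charset.forName(enc));
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for the detected encoding
        byte[] input = "\u4e2d\u6587".getBytes(Charset.forName(enc));

        String newinput = new String(input, Charset.forName(enc));
        byte[] original = encode(newinput, enc);
        System.out.println(Arrays.equals(input, original)); // true

        // ISO-8859-1 has no Chinese characters, so getBytes(Charset)
        // substitutes the charset's replacement byte for each of them.
        byte[] lossy = encode(newinput, "ISO-8859-1");
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // ??
    }
}
```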

Expert Comment

by:CEHJ
ID: 34960418
>>
objects: the first solution works.

I just wonder why "utf-8" is not used in the solution?
>>

Because UTF-8 is not the encoding of the input bytes. The encoding passed to the String constructor must be the one the bytes are actually in, which is the original encoding.
It's not clear what your goal is, but if it's to get a byte array with UTF-8 encoding, then you need to transcode.

Author Comment

by:wsyy
ID: 34961997
CEHJ, I do want to save the contents in UTF-8. Will the following code be OK?

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
String utf8_newinput = new String(utf8, "UTF8");

If the code is good, can I make it simpler?


Author Comment

by:wsyy
ID: 34962129
I just checked, and my code above doesn't work.

CEHJ, how can I do the transcoding?

Expert Comment

by:objects
ID: 34964497
> I do want to save the contents in UTF-8. Will the following code be OK?

No, as I already explained above.
If you want to save the contents in UTF-8, then you need to save the byte array, *not* a string.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");

// save utf8 encoded byte array to a file
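
To illustrate why the extra new String(utf8, "UTF8") step in the question adds nothing, here is a small sketch (hypothetical names NoOpRoundTripSketch and throughUtf8, assuming Java 7 or later): for a well-formed String, encoding to UTF-8 and decoding again yields an equal String, so only the byte array ever "is" UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class NoOpRoundTripSketch {
    // Hypothetical helper mirroring the third line of the question's code:
    // encode to UTF-8 bytes, then decode those bytes straight back.
    static String throughUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String newinput = "\u4e2d\u6587"; // stand-in for the decoded page text
        String utf8_newinput = throughUtf8(newinput);

        // The round trip gives back an equal String; nothing was "converted".
        System.out.println(newinput.equals(utf8_newinput)); // true
    }
}
```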

Author Comment

by:wsyy
ID: 34965697
objects, when I save the UTF-8 encoded byte array to a file, I don't need to specify any encoding, right?

Do you have a quick example of saving a UTF-8 byte array to a file?

In addition, if I want to work with the string built from the original byte array (encoded in enc), what do I need to do so that the Chinese characters in it are handled properly?

Thanks

Accepted Solution

by:
objects earned 125 total points
ID: 34965758
> objects, when I save the UTF-8 encoded byte array to a file, I don't need to specify any encoding, right?

Right, you don't. It's just a byte array (the contents of the byte array are already encoded).

> Do you have a quick example of saving a UTF-8 byte array to a file?

FileOutputStream out = new FileOutputStream(outputFile); // outputFile: the File (or file name) to write to
out.write(utf8);
out.close();
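
A fuller, hedged version of the same idea, assuming Java 7 or later so that try-with-resources can close the stream even if the write fails. SaveUtf8Sketch, save, and the temporary file are hypothetical and only there to make the example runnable.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SaveUtf8Sketch {
    // Hypothetical helper: write already-encoded bytes to a file as they are.
    // No charset is involved here because the bytes are already UTF-8.
    static void save(byte[] utf8, String fileName) throws IOException {
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            out.write(utf8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // A temporary file stands in for the real output location.
        Path tmp = Files.createTempFile("utf8-sketch", ".txt");
        save(utf8, tmp.toString());

        // Reading the file back and decoding as UTF-8 recovers the text.
        String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.println(readBack.equals("\u4e2d\u6587")); // true
        Files.delete(tmp);
    }
}
```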

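> If I want to work with the string built from the original byte array (encoded in enc), what do I need to do so that the Chinese characters in it are handled properly?

There's nothing special you need to do. Once the bytes have been decoded with the correct encoding (new String(input, enc)), the Chinese characters are handled like any other characters in the String.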
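
A short sketch of what that looks like in practice, with hypothetical names (DecodedStringSketch, decode) and "UTF-16BE" standing in for the detected enc: after decoding, the usual String methods operate on characters, not on bytes.

```java
import java.nio.charset.Charset;

final class DecodedStringSketch {
    // Hypothetical helper: decode bytes with the detected charset name.
    static String decode(byte[] input, String enc) {
        return new String(input, Charset.forName(enc));
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for the detected encoding
        // Four Chinese characters as the stand-in page content.
        byte[] input = "\u4e2d\u6587\u7f51\u9875".getBytes(Charset.forName(enc));

        String text = decode(input, enc);
        System.out.println(text.length());                 // 4
        System.out.println(text.contains("\u7f51\u9875")); // true
        System.out.println(text.indexOf('\u7f51'));        // 2
    }
}
```

One caveat: characters outside the Basic Multilingual Plane occupy two char values each, so length() counts those twice; codePointCount is the alternative when that matters.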

Author Closing Comment

by:wsyy
ID: 34965861
Excellent!

Question has a verified solution.
