Java: converting bytes into a String

Hi,

My Java application has read a Chinese web page into a byte[] and has detected the encoding of the page:

byte[] input = readfileintobyte(file);
String enc = detectEncoding(input);

How can I convert the input into a "utf-8" encoded string?

I have tried the following but it doesn't work:

String newinput = new String(input, "utf-8");

It seems the enc variable should be used to convert the input to a UTF-8 string. But how?
Asked by wsyy

objects commented (accepted solution):
> objects, when I save the UTF-8 encoded byte array to a file, I don't need to specify any encoding, right?

No, it's just a byte array; its contents are already encoded.

> Do you have a quick example of saving a UTF-8 byte array to a file?

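// "file" is the destination File; the bytes are already UTF-8, so no charset is needed
try (FileOutputStream out = new FileOutputStream(file)) {
    out.write(utf8);
} // try-with-resources closes the stream, even if write() throws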
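For reference, a complete and runnable version of that snippet, as a minimal sketch. The class and method names (SaveUtf8Bytes, save) and the temporary output file are illustrative assumptions; the point is that the byte array is written verbatim and no charset appears anywhere in the writing step. Files.write(path, bytes) from java.nio.file would be an equivalent one-liner.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveUtf8Bytes {

    // Writes already-encoded bytes to the file as-is; no charset is involved here.
    static void save(byte[] utf8, Path file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(utf8);
        } // try-with-resources closes the stream even if write() throws
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters, encoded as UTF-8 (3 bytes each).
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Hypothetical destination: a temp file, deleted when the JVM exits.
        Path file = Files.createTempFile("page", ".html");
        file.toFile().deleteOnExit();

        save(utf8, file);
        System.out.println(Files.size(file)); // 6
    }
}
```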

> If I want to do something with the string based on the original byte array (encoded in enc), how can I make sure the Chinese characters inside the byte array are handled properly?

There's nothing you need to do: once the bytes have been decoded with new String(input, enc), the String holds the Chinese characters as characters.
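To illustrate why nothing special is needed: once the bytes have been decoded, ordinary String methods such as length(), indexOf() and substring() work on characters rather than bytes. This is a sketch under the assumption that the detected enc is UTF-16BE, chosen only because every JVM must support it (a real Chinese page is more likely GBK, GB2312 or Big5); decodePage is a hypothetical helper name. One caveat: length() counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WorkWithDecodedText {

    // Decode once with the detected charset; afterwards work only with the String.
    static String decodePage(byte[] input, String enc) {
        return new String(input, Charset.forName(enc));
    }

    public static void main(String[] args) {
        // Demo input: four Chinese characters ("Chinese web page"), encoded as UTF-16BE.
        byte[] input = "\u4e2d\u6587\u7f51\u9875".getBytes(StandardCharsets.UTF_16BE);
        String text = decodePage(input, "UTF-16BE"); // assumption: detected enc is UTF-16BE

        // String methods now operate on the 4 characters, not on the 8 raw bytes.
        System.out.println(text.length());                            // 4
        System.out.println(text.indexOf('\u6587'));                   // 1
        System.out.println(text.substring(2).equals("\u7f51\u9875")); // true
    }
}
```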

objects commented:
String newinput = new String(input, enc);
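A small sketch of why the detected charset matters: the same bytes decode correctly with enc and come out as garbage with a different charset, which is what happened with "utf-8" in the question. The demo assumes enc is UTF-16BE (guaranteed to exist on every JVM) and uses a hypothetical decode helper; Charset.forName throws UnsupportedCharsetException if the JVM does not know the detected name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeWithDetectedCharset {

    // Decodes raw page bytes using the charset name that was detected for them.
    static String decode(byte[] input, String enc) {
        return new String(input, Charset.forName(enc));
    }

    public static void main(String[] args) {
        // Assumption for the demo: the page bytes are two Chinese characters in UTF-16BE.
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        String enc = "UTF-16BE";

        String right = decode(input, enc);     // uses the detected charset
        String wrong = decode(input, "UTF-8"); // what the question tried first

        System.out.println(right.equals("\u4e2d\u6587")); // true
        System.out.println(wrong.equals("\u4e2d\u6587")); // false: bytes misread as UTF-8
    }
}
```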

objects commented:
> How can i convert the input into a "utf-8" encoded string?

There's no such thing in Java as a UTF-8 encoded String; there is only a UTF-8 encoded byte array.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
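Put together, transcoding is just decode-then-encode. A minimal sketch, assuming a hypothetical toUtf8 helper and again using UTF-16BE as a stand-in for the detected enc. StandardCharsets.UTF_8 (Java 7+) does the same as the "UTF8" name but avoids the checked UnsupportedEncodingException that getBytes(String) declares.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeToUtf8 {

    // Transcoding: decode with the source charset, then encode with UTF-8.
    static byte[] toUtf8(byte[] input, String enc) {
        String text = new String(input, Charset.forName(enc)); // bytes in enc -> chars
        return text.getBytes(StandardCharsets.UTF_8);          // chars -> UTF-8 bytes
    }

    public static void main(String[] args) {
        // Demo input: two Chinese characters encoded as UTF-16BE.
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = toUtf8(input, "UTF-16BE");
        System.out.println(Arrays.toString(utf8)); // [-28, -72, -83, -26, -106, -121]
    }
}
```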

wsyy (author) commented:
objects: the first solution works.

I just wonder why "utf-8" is not used in the solution?

In addition, what if I want to convert newinput back into the original encoding enc?

objects commented:
I explained that in my second comment.
Java strings don't really have an encoding (internally a String is a sequence of UTF-16 char values); it's the byte array that has a specific encoding,
i.e. the byte array contains the string encoded with a specific charset.
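A short sketch of this point: one String, two charsets, two different byte arrays. UTF-8 and UTF-16BE are used because every JVM supports them; encode is only an illustrative helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringHasNoEncoding {

    // The same String produces different bytes depending on the charset chosen.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);     // 3 bytes per character here
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE); // 2 bytes per character here

        System.out.println(text.length()); // 2
        System.out.println(utf8.length);   // 6
        System.out.println(utf16.length);  // 4

        // Decoding each array with the charset it was encoded with gives back an equal String.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));     // true
        System.out.println(new String(utf16, StandardCharsets.UTF_16BE).equals(text)); // true
    }
}
```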

objects commented:
> in addition, what if i want to convert newinput into the original encoding enc?


byte[] original = newinput.getBytes(enc);
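A minimal sketch of the round trip, again using UTF-16BE as a stand-in for enc (reencode is a hypothetical helper). The round trip is exact only when the original bytes were valid in enc: malformed sequences are replaced with U+FFFD during decoding, so they cannot be restored, as the second check shows.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackToOriginalEncoding {

    // Encode the decoded text with the same charset it was decoded from.
    static byte[] reencode(String newinput, String enc) {
        return newinput.getBytes(Charset.forName(enc));
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // assumption: stands in for the detected charset
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);

        String newinput = new String(input, Charset.forName(enc));
        byte[] original = reencode(newinput, enc);
        System.out.println(Arrays.equals(input, original)); // true: input was valid in enc

        // A truncated UTF-8 sequence is replaced by U+FFFD when decoded,
        // so re-encoding cannot reproduce the original bytes.
        byte[] broken = {(byte) 0xE4, (byte) 0xB8};
        String decoded = new String(broken, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(broken, reencode(decoded, "UTF-8"))); // false
    }
}
```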

CEHJ commented:
>>
objects: the first solution works.

I just wonder why "utf-8" is not used in the solution?
>>

Because UTF-8 is not being used as the encoding; the encoding being used is the original one, enc.
It's not clear what your goal is, but if it's to get a byte array with UTF-8 encoding, then you need to transcode: decode the bytes with enc, then encode the resulting String as UTF-8.

wsyy (author) commented:
CEHJ, I do want to save the contents in UTF-8. Will the following code be OK?

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
String utf8_newinput = new String(utf8, "UTF8");


If the code is good, can I make it simpler?


wsyy (author) commented:
I just checked, and my code above doesn't work.

CEHJ, how can I do the transcoding?

objects commented:
> I do want to save the contents in UTF-8. Will the following code be OK?

No, as I already explained above.
If you want to save the contents in UTF-8, then you need to save the byte array, *not* a String.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");

// save utf8 encoded byte array to a file
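An end-to-end sketch of the steps above: read the original file, decode with enc, encode as UTF-8 and write the bytes. The method name transcode, the temporary files and the UTF-16BE input are assumptions for the demo; in the question enc would come from detectEncoding. There is no need to turn the UTF-8 bytes back into a String (as the earlier attempt did with new String(utf8, "UTF8")) before saving.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscodeFileToUtf8 {

    // Reads raw bytes, decodes them with enc, and writes the text back out as UTF-8 bytes.
    static void transcode(Path source, Path target, String enc) throws IOException {
        byte[] input = Files.readAllBytes(source);
        String newinput = new String(input, Charset.forName(enc));
        byte[] utf8 = newinput.getBytes(StandardCharsets.UTF_8);
        Files.write(target, utf8); // writes the bytes as-is; no charset argument needed
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input/output files standing in for the downloaded page.
        Path source = Files.createTempFile("page-in", ".html");
        Path target = Files.createTempFile("page-out", ".html");
        Files.write(source, "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE));

        transcode(source, target, "UTF-16BE");
        System.out.println(Files.size(target)); // 6

        Files.delete(source);
        Files.delete(target);
    }
}
```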

wsyy (author) commented:
objects, when I save the UTF-8 encoded byte array to a file, I don't need to specify any encoding, right?

Do you have a quick example of saving a UTF-8 byte array to a file?

In addition, if I want to do something with the string based on the original byte array (encoded in enc), how can I make sure the Chinese characters inside the byte array are handled properly?

Thanks

wsyy (author) commented:
Excellent!