wsyy
asked on
Java to convert bytes into String
Hi,
My Java application has read a Chinese web page into a byte[] and has detected the encoding of the web page.
byte[] input = readfileintobyte(file);
String enc = detectEncoding(input);
How can I convert the input into a "utf-8" encoded string?
I have tried the following, but it doesn't work:
String newinput = new String(input, "utf-8");
It seems that the enc variable should be used to convert to a utf-8 string. But how?
String newinput = new String(input, enc);
> How can i convert the input into a "utf-8" encoded string?
There's no such thing in Java; there is only a utf8 encoded byte array:
String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
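For reference, here is a minimal sketch of those two steps wrapped in one helper. The class name TranscodeSketch and the method toUtf8 are made up for illustration, and "UTF-16BE" only stands in for whatever detectEncoding returns (for a Chinese page it would more likely be something like GBK or Big5, provided your runtime ships those charsets). The text is written with Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Hypothetical helper: decode with the detected charset, then re-encode as UTF-8.
    public static byte[] toUtf8(byte[] input, String enc) {
        String text = new String(input, Charset.forName(enc)); // bytes -> chars
        return text.getBytes(StandardCharsets.UTF_8);          // chars -> UTF-8 bytes
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for detectEncoding(input)
        // Two Chinese characters, standing in for the bytes read from the page.
        byte[] input = "\u4e2d\u6587".getBytes(Charset.forName(enc));

        byte[] utf8 = toUtf8(input, enc);
        System.out.println(utf8.length + " UTF-8 bytes");

        // The original attempt decodes with the wrong charset, so the text is damaged.
        String wrong = new String(input, StandardCharsets.UTF_8);
        System.out.println(wrong.equals("\u4e2d\u6587"));
    }
}
```

The second println shows why new String(input, "utf-8") did not work: the bytes were never UTF-8 in the first place, so decoding them as UTF-8 produces different characters.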
ASKER
objects: the first solution works.
I just wonder why "utf-8" is not used in the solution?
In addition, what if I want to convert newinput back into the original encoding enc?
I explained that in the 2nd comment.
Java strings do not really have an encoding. It's the byte array that has a specific encoding,
i.e. the byte array contains the string encoded with a specific charset.
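A small sketch of that point, under the assumption that UTF-8 and UTF-16BE are good enough stand-ins for any pair of charsets. The class and method names are hypothetical. The same String yields different byte arrays depending on the charset, and both byte arrays decode back to equal Strings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringHasNoEncoding {
    // Hypothetical helper: how many bytes the text needs in a given charset.
    public static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters

        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] asUtf16 = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(asUtf8.length);  // 6: three bytes per character
        System.out.println(asUtf16.length); // 4: two bytes per character

        // Decoding each array with its own charset gives the same String back.
        String a = new String(asUtf8, StandardCharsets.UTF_8);
        String b = new String(asUtf16, StandardCharsets.UTF_16BE);
        System.out.println(a.equals(b));    // true
    }
}
```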
> in addition, what if i want to convert newinput into the original encoding enc?
byte[] original = newinput.getBytes(enc);
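As a hedged illustration of the round trip: if the input bytes really are valid in enc, decoding and re-encoding with the same charset should reproduce them. If the detected encoding was wrong, malformed bytes are replaced during decoding and the round trip no longer matches. The class RoundTripSketch and the method roundTrips are hypothetical names, and the charsets used in main are only stand-ins.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class RoundTripSketch {
    // True when decoding with enc and re-encoding with enc reproduces the input bytes.
    public static boolean roundTrips(byte[] input, String enc) {
        Charset cs = Charset.forName(enc);
        String newinput = new String(input, cs);
        byte[] original = newinput.getBytes(cs);
        return Arrays.equals(input, original);
    }

    public static void main(String[] args) {
        byte[] valid = "\u4e2d\u6587".getBytes(Charset.forName("UTF-16BE"));
        System.out.println(roundTrips(valid, "UTF-16BE")); // true

        // 0xFF can never appear in UTF-8, so it is replaced while decoding.
        byte[] notUtf8 = { (byte) 0xFF };
        System.out.println(roundTrips(notUtf8, "UTF-8"));  // false
    }
}
```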
>>
objects: the first solution works.
i just wonder why "utf-8" is not used in the solution?
>>
Because utf-8 is not being used as the encoding. The encoding being used is the original encoding.
It's not clear what your goal is, but if it's to get a byte array with utf-8 encoding, then you need to transcode.
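One possible way to transcode, sketched here with streams rather than a single in-memory String: wrap the source in a Reader that decodes with the original encoding and copy into a Writer that encodes as UTF-8. This is just one approach, not necessarily the one meant above; the class StreamTranscodeSketch and its method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamTranscodeSketch {
    // Reads characters decoded with enc and writes them back out encoded as UTF-8.
    public static void transcode(InputStream in, String enc, OutputStream out) {
        try (Reader reader = new InputStreamReader(in, Charset.forName(enc));
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "UTF-16BE" stands in for the detected encoding of the page.
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(input), "UTF-16BE", out);
        System.out.println(out.size() + " UTF-8 bytes written");
    }
}
```

For a file, the same method could be given a FileInputStream and a FileOutputStream, which avoids holding the whole page in memory.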
ASKER
CEHJ, I do want to save the contents in utf-8. Will the following code be OK?
String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
String utf8_newinput = new String(utf8, "UTF8");
If the code is good, can I make it simpler?
ASKER
I just checked, and my code above doesn't work.
CEHJ, how can I do the transcoding?
> I do want to save the contents in utf-8. Will the following code be OK?
No, as I already explained above.
If you want to save the contents in utf8, then you need to save the byte array, *not* a string:
String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
// save utf8 encoded byte array to a file
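A minimal sketch of the saving step, assuming java.nio.file is available (Java 7 or later). The class SaveUtf8Sketch, the method saveAsUtf8, and the temp-file name are made up for illustration. Because the array already holds UTF-8 encoded bytes, writing it needs no charset argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SaveUtf8Sketch {
    // Decodes input with enc, re-encodes as UTF-8 and writes the raw bytes to target.
    public static void saveAsUtf8(byte[] input, String enc, Path target) {
        String newinput = new String(input, Charset.forName(enc));
        byte[] utf8 = newinput.getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(target, utf8); // bytes are written as-is, no encoding involved
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "UTF-16BE" stands in for the detected encoding of the page.
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        Path target = Files.createTempFile("page-utf8", ".txt");
        saveAsUtf8(input, "UTF-16BE", target);
        System.out.println(Files.size(target) + " bytes saved to " + target);
        Files.delete(target);
    }
}
```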
ASKER
objects, when I save the utf8 encoded byte array to a file, I don't need to specify any encoding, right?
Do you have a quick example of saving a utf8 byte array to a file?
In addition, if I want to do something with the string that is based on the original byte array (encoded in enc), how can I make sure the Chinese characters inside the byte array are handled properly?
Thanks
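On the last question, a sketch based on what was said earlier in the thread: once the bytes are decoded with the right charset, the String holds plain characters, so ordinary String operations work on the Chinese text, and an encoding only matters again when turning the result back into bytes. The class and method names below are hypothetical and "UTF-16BE" is only a stand-in for enc.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodedTextSketch {
    // Counts characters (code points) rather than bytes.
    public static int countChars(byte[] input, String enc) {
        String text = new String(input, Charset.forName(enc));
        return text.codePointCount(0, text.length());
    }

    // Edits the decoded text, then encodes the result as UTF-8.
    public static byte[] replaceAndEncode(byte[] input, String enc, String from, String to) {
        String text = new String(input, Charset.forName(enc));
        String edited = text.replace(from, to);
        return edited.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(countChars(input, "UTF-16BE")); // 2 characters, although 4 bytes
        byte[] utf8 = replaceAndEncode(input, "UTF-16BE", "\u6587", "\u56fd");
        System.out.println(utf8.length);                   // 6 bytes in UTF-8
    }
}
```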
ASKER CERTIFIED SOLUTION
ASKER
Excellent!