Solved

Java to convert bytes into String

Posted on 2011-02-22
Last Modified: 2012-05-11
Hi,

My Java application has read a Chinese web page into a byte[] and has detected the encoding of the web page.

byte[] input = readfileintobyte(file);
String enc = detectEncoding(input);

How can I convert the input into a "utf-8" encoded string?

I have tried the following, but it doesn't work:

String newinput = new String(input, "utf-8");

It seems that the enc variable should be used to convert to a utf-8 string. But how?
Question by:wsyy
12 Comments
 
LVL 92

Expert Comment

by:objects
ID: 34958606
String newinput = new String(input, enc);
 
LVL 92

Expert Comment

by:objects
ID: 34958617
> How can i convert the input into a "utf-8" encoded string?

There's no such thing in Java; there is only a utf8 encoded byte array.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
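
The two lines above can be wrapped into a small helper. This is only a sketch: the class and method names are made up for illustration, and UTF-16BE stands in for whatever charset name detectEncoding returned (for a Chinese page it would more likely be something like GBK or Big5, provided the JVM supports that charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Decode the raw bytes with the detected charset, then encode the text as UTF-8.
    // Charset.forName throws an unchecked exception if the JVM does not know the name.
    static byte[] toUtf8(byte[] input, String enc) {
        String text = new String(input, Charset.forName(enc)); // bytes -> characters
        return text.getBytes(StandardCharsets.UTF_8);          // characters -> UTF-8 bytes
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for the result of detectEncoding(input)
        // Two Chinese characters (U+4E2D U+6587), written as escapes to keep the source ASCII
        byte[] input = "\u4e2d\u6587".getBytes(Charset.forName(enc));
        byte[] utf8 = toUtf8(input, enc);
        System.out.println(input.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The String in the middle is only the intermediate step; the UTF-8 result is the byte array that comes out.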
 

Author Comment

by:wsyy
ID: 34958642
objects: the first solution works.

I just wonder why "utf-8" is not used in the solution?

In addition, what if I want to convert newinput back into the original encoding enc?
 
LVL 92

Expert Comment

by:objects
ID: 34958672
I explained that in the 2nd comment.
Java strings do not really have an encoding. It's the byte array that has a specific encoding,
i.e. the byte array contains the string encoded with a specific charset.
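
A small illustration of that point, as a sketch only (the class and method names are hypothetical, and UTF-16BE is just a convenient second charset): one string produces different byte arrays depending on the charset, and both arrays decode back to the same string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingLivesInBytes {

    // True when both byte arrays decode to the same text under their respective charsets.
    static boolean sameText(byte[] a, Charset charsetA, byte[] b, Charset charsetB) {
        return new String(a, charsetA).equals(new String(b, charsetB));
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, U+4E2D U+6587

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(Arrays.equals(utf8, utf16)); // false: the byte arrays differ
        System.out.println(sameText(utf8, StandardCharsets.UTF_8,
                                    utf16, StandardCharsets.UTF_16BE)); // true: same string
    }
}
```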
 
LVL 92

Expert Comment

by:objects
ID: 34958679
> in addition, what if i want to convert newinput into the original encoding enc?


byte[] original = newinput.getBytes(enc);
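
As a rough check of that line, the sketch below decodes some bytes and encodes them again with the same charset. The class and method names are invented for the example, and UTF-16BE again stands in for the detected encoding.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class BackToOriginal {

    // Encode text with the named charset; this is the inverse of new String(bytes, enc).
    static byte[] encode(String text, String enc) {
        return text.getBytes(Charset.forName(enc));
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for the detected encoding
        byte[] input = encode("\u4e2d\u6587", enc);

        String newinput = new String(input, Charset.forName(enc));
        byte[] original = encode(newinput, enc);

        // The arrays match here because input was well-formed in enc. Malformed bytes
        // are replaced during decoding, so they would not survive the round trip.
        System.out.println(Arrays.equals(input, original));
    }
}
```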
 
LVL 86

Expert Comment

by:CEHJ
ID: 34960418
>>
objects: the first solution works.

i just wonder why "utf-8" is not used in the solution?
>>

Because utf-8 is not being used as the encoding. The encoding being used is the original encoding.
It's not clear what your goal is, but if it's to get a byte array with utf-8 encoding, then you need to transcode: decode the bytes with the original encoding, then encode the resulting string as utf-8.
 

Author Comment

by:wsyy
ID: 34961997
CEHJ, I do want to save the contents in utf-8. Will the following code be OK?

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");
String utf8_newinput = new String(utf8, "UTF8");

If the code is good, can I make it simpler?
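
One way to see what the third line of this snippet does is the sketch below (class and method names are hypothetical). Encoding a string to UTF-8 and decoding it again gives back an equal string, so utf8_newinput holds nothing that newinput did not already hold. The UTF-8 data is the byte array, not the second string.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Mirrors the second and third lines of the snippet: String -> UTF-8 bytes -> String.
    static String throughUtf8(String newinput) {
        byte[] utf8 = newinput.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String newinput = "\u4e2d\u6587 page"; // sample text with two Chinese characters
        String utf8Newinput = throughUtf8(newinput);
        System.out.println(newinput.equals(utf8Newinput)); // true: nothing changed
    }
}
```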

 

Author Comment

by:wsyy
ID: 34962129
I just checked, and my code above doesn't work.

CEHJ, how can I do the transcoding?
 
LVL 92

Expert Comment

by:objects
ID: 34964497
> i do want to save the contents in utf-8. will the following code be ok?

No, as I already explained above.
If you want to save the contents in utf8, then you need to save the byte array, *not* a string.

String newinput = new String(input, enc);
byte[] utf8 = newinput.getBytes("UTF8");

// save utf8 encoded byte array to a file
 

Author Comment

by:wsyy
ID: 34965697
objects, when I save the utf8 encoded byte array to a file, I don't need to specify any encoding, right?

Do you have a quick example of saving a utf8 byte array to a file?

In addition, if I want to work with the string that is based on the original byte array (encoded in enc), what do I need to do so that the Chinese characters inside the byte array are handled properly?

Thanks
 
LVL 92

Accepted Solution

by:
objects earned 125 total points
ID: 34965758
> objects, when i save the utf8 encoded byte array to a file, i don't need to specify any encoding, right?

No, it's just a byte array (it's the contents of the byte array that are already encoded).

> do you have a quick example about saving utf8 byte array to a file?

// outputFile is a placeholder for the java.io.File (or path String) to write to
FileOutputStream out = new FileOutputStream(outputFile);
out.write(utf8);
out.close();
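
A slightly fuller version of the same idea, as a sketch: it uses try-with-resources (Java 7 or later) so the stream is closed even if the write fails. The class name, the method name and the temp-file target are assumptions made for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SaveUtf8Bytes {

    // Write the already-encoded bytes as they are; no charset is involved at this point.
    static void save(byte[] utf8, File target) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(utf8);
        } // the stream is closed here even if write() throws
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        File target = File.createTempFile("page", ".txt"); // hypothetical output location
        save(utf8, target);
        System.out.println(target.length() + " bytes written to " + target);
    }
}
```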

>  if i want to do something on the string which is based on the original byte array (encoded in enc), how can I do so that the chinese characters inside the byte array can be properly handled.

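There's nothing special you need to do. Once the bytes have been decoded with new String(input, enc), the string holds the Chinese characters as ordinary Java characters, so the usual String methods work on them.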
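
For instance, the sketch below decodes a byte array and then uses plain String methods on the result. The names are made up, and UTF-16BE stands in for the detected encoding. One caveat worth knowing: length() counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```java
import java.nio.charset.Charset;

public class DecodedTextOps {

    // Decode the raw bytes once, with the charset they were written in.
    static String decode(byte[] input, String enc) {
        return new String(input, Charset.forName(enc));
    }

    public static void main(String[] args) {
        String enc = "UTF-16BE"; // stand-in for the detected encoding
        // Four Chinese characters: U+4E2D U+6587 U+7F51 U+9875
        byte[] input = "\u4e2d\u6587\u7f51\u9875".getBytes(Charset.forName(enc));

        String text = decode(input, enc);
        System.out.println(text.length());                   // 4
        System.out.println(text.indexOf("\u7f51\u9875"));    // 2
        System.out.println(text.startsWith("\u4e2d\u6587")); // true
    }
}
```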
 

Author Closing Comment

by:wsyy
ID: 34965861
Excellent!