Getting web content & Charsets in Java (500pts!)

il68 asked
Last Modified: 2010-03-31
I am using built-in Java libraries as well as Apache's HttpClient to get the content of various web pages. I need to put each page in a string, parse it, manipulate a few things, and then write it to a file.

A lot of these pages are foreign and use various charsets (Chinese, Cyrillic, etc.). What is the best (and easiest) way to handle this so that in the end everything is displayed correctly when the written file is opened with a browser?

Is there one charset I can pass in when converting the bytes to a string, and later when converting the string back into file bytes?

Or do I need to somehow read in the charset of the page and use it throughout? If so, must I use it only for the content and still use ASCII for the tags?

Or is it something completely different?

I am really not a charset expert, and it appears hard to find one. I have to get this working really soon. I have talked to a few people already, but no one could completely explain it to me. So, 500pts!
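
For illustration, the "read in the charset of the page and use it throughout" option could look roughly like the sketch below. It is only a sketch under assumptions, not a confirmed solution: the class and method names are made up for this example, the charset is taken from the Content-Type header value only (a page may also declare it in a meta tag, which this does not inspect), ISO-8859-1 is an assumed fallback, and the output is always written as UTF-8. The written page would then also need to declare UTF-8 for a browser to display it correctly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class CharsetSketch {
    // Finds charset=... in a Content-Type value such as "text/html; charset=windows-1251".
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in the header value, or the fallback
    // when none is declared or the declared name is unknown to this JVM.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall through to the fallback.
                }
            }
        }
        return fallback;
    }

    // Bytes from the server -> String, using the page's own charset.
    // ISO-8859-1 as the fallback is an assumption of this sketch.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType, StandardCharsets.ISO_8859_1));
    }

    // String -> bytes for the output file, always UTF-8 regardless of the source charset.
    static byte[] encodeForFile(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated response: Cyrillic text encoded as windows-1251.
        String original = "<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>";
        byte[] raw = original.getBytes(Charset.forName("windows-1251"));

        String page = decode(raw, "text/html; charset=windows-1251");
        byte[] out = encodeForFile(page);

        System.out.println(page.equals(original));
        System.out.println(out.length + " bytes to write as UTF-8");
    }
}
```

Once the bytes are decoded, a Java String holds Unicode text, so tags and content do not need separate handling inside the program. The charset only matters at the two boundaries: when decoding the downloaded bytes and when encoding the bytes for the file.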