Manish

Converting UTF-8 to ISO-8859-1

Hi,
  I need a good approach to convert a string from UTF-8 to ISO-8859-1, and from ISO-8859-1 to UTF-8. I am reading a UTF-8 string from XML.
karan.
CEHJ

You don't need to convert String - only files. Do you mean a file?
Open the file using UTF-8 and save it using ISO-8859-1. Which part exactly are you having problems with?
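For the file case, a minimal Java sketch of "open as UTF-8, save as ISO-8859-1" might look like the following. The class and method names are made up for illustration, and the sketch assumes the whole file really is valid UTF-8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class FileTranscoder {
    // Decode the source as UTF-8 and re-encode it as ISO-8859-1.
    // A character with no ISO-8859-1 mapping makes the write fail with an
    // IOException rather than being silently replaced.
    static void utf8ToLatin1(Path source, Path target) throws IOException {
        try (Reader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(target, StandardCharsets.ISO_8859_1)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

Note that the conversion happens on the way in and on the way out; the characters in between are just Java chars with no encoding attached.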
Avatar of Manish

ASKER

I am reading an XML file which contains a UTF-8 string. I want to store it as ISO-8859-1. Let's say
‘Citizens for NYC’ Board
is the string in the XML.
I want to store it in a DB whose charset is US7ASCII.
I can't change the database charset.
As I mentioned, you don't need to convert a String. Strings are already portable between encodings.
Try specifying the charset in the JDBC connection string.
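To illustrate the point about Strings being portable: a Java String holds characters, and an encoding only comes into play when it is turned into bytes or built from bytes. A small sketch (the class name is made up for the example):

```java
import java.nio.charset.StandardCharsets;

final class StringVsBytes {
    public static void main(String[] args) {
        String s = "caf\u00e9"; // the last character is U+00E9, which both charsets support

        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);        // U+00E9 becomes 0xC3 0xA9
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // U+00E9 becomes 0xE9

        // Decoding each byte array with the charset that produced it
        // gives back the same String.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length + " vs " + latin1.length); // 5 vs 4
        System.out.println(fromUtf8.equals(fromLatin1));          // true
    }
}
```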
Avatar of Manish

ASKER

If I don't change it,
   in the JSP it looks like ‘Citizens for NYC’

How do I specify the charset in the JDBC connection string?
>>If I don't change it,
   in the JSP it looks like ‘Citizens for NYC’

Those left and right quotes are not supported in ISO-8859-1.
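One way to check this for yourself is to ask a charset's encoder whether it can represent the text. The sketch below uses the standard `CharsetEncoder.canEncode` call; the class and method names are only illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodableCheck {
    // True if every character of text can be represented in the given charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Curly quotes are U+2018 and U+2019.
        String title = "\u2018Citizens for NYC\u2019 Board";
        System.out.println(fits(title, StandardCharsets.ISO_8859_1)); // false
        System.out.println(fits(title, StandardCharsets.US_ASCII));   // false
        System.out.println(fits("'Citizens for NYC' Board", StandardCharsets.US_ASCII)); // true
    }
}
```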
> If I don't change it,
>   in the JSP it looks like ‘Citizens for NYC’

Sounds like you'd be better off handling it when you *read* the data from the database.
Avatar of Manish

ASKER

I am using the following String call to read it and convert it into UTF-8:
new String(string.getBytes(ENCODING_ISO_8859_1), ENCODING_UTF8)
As I just mentioned, shifting encodings won't help - those quotes are not supported in ISO-8859-1. You need to do

s = s.replaceAll("[\u2018\u2019]", "'");

See

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
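A short sketch of why re-encoding alone does not help, compared with the replacement above: `String.getBytes` substitutes the charset's replacement byte (`?` for ISO-8859-1) for anything it cannot map, so the quotes are lost either way. The class and method names here are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class QuoteReplacement {
    // Same replacement as suggested above: curly single quotes become a plain apostrophe.
    static String straightenQuotes(String s) {
        return s.replaceAll("[\u2018\u2019]", "'");
    }

    public static void main(String[] args) {
        String title = "\u2018Citizens for NYC\u2019 Board";

        // Encoding without replacing first: each unmappable quote becomes '?'.
        byte[] latin1 = title.getBytes(StandardCharsets.ISO_8859_1);
        String lossy = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(lossy);                   // ?Citizens for NYC? Board

        // Replacing first leaves only characters the target charset supports.
        System.out.println(straightenQuotes(title)); // 'Citizens for NYC' Board
    }
}
```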
Avatar of Manish

ASKER

new String(string.getBytes("ISO-8859-1"), "UTF-8")
Avatar of Manish

ASKER

>>s = s.replaceAll("[\u2018\u2019]", "'");

When should I use this method: while inserting or while reading?
>>When should I use this method: while inserting or while reading?

Before inserting - it will get rid of the unsupportable characters
Avatar of Manish

ASKER

Then how do I find the characters that are not supported, decide what the equivalent of each should be, and store that in the DB?
  And is there any difference in the output?
>>Then how do I find the characters that are not supported, decide what the equivalent of each should be, and store that in the DB?

That's quite subjective - it would be down to you to find the ones that aren't supported and choose another you like better

>>And is there any difference in the output?

Yes - they're quite different characters
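As a rough sketch of "find the ones that aren't supported and choose another", the code below lists the characters US-ASCII cannot encode and applies a small substitution table. The table is purely hypothetical; which replacements to use is the judgement call described above. It works char by char, so it does not treat characters outside the Basic Multilingual Plane as single units.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class AsciiSanitizer {
    // Hypothetical replacement table; extend or change it to taste.
    static final Map<Character, String> REPLACEMENTS = Map.of(
        '\u2018', "'",   // left single quotation mark
        '\u2019', "'",   // right single quotation mark
        '\u201C', "\"",  // left double quotation mark
        '\u201D', "\"",  // right double quotation mark
        '\u2013', "-"    // en dash
    );

    // Distinct characters that US-ASCII cannot encode, in order of first appearance.
    static Set<Character> unsupported(String text) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        Set<Character> found = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            if (!ascii.canEncode(c)) {
                found.add(c);
            }
        }
        return found;
    }

    // Applies the table; characters without an entry are left untouched,
    // so unsupported() can still report them afterwards.
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            out.append(REPLACEMENTS.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

Running `unsupported` on the sanitized text before inserting shows whether the table still has gaps for your data.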
Avatar of Manish

ASKER

So my steps should be:
read the XML,
  read it char by char, replace unsupported characters with supported ones, and store it in the DB.
  While reading, do I need to convert it to UTF-8 to show it on the JSP?
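The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the element name `name` and the class are made up, the database insert is left out, and only the two curly quotes are replaced. The XML parser decodes the bytes according to the document's encoding declaration, so no manual UTF-8 conversion is needed on the way in. The last check shows a fact that bears on the read side: text that is pure ASCII has identical bytes in US-ASCII and UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlToAscii {
    // Parse the XML, take the root element's text, and replace the curly quotes.
    static String readAndSanitize(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        String text = doc.getDocumentElement().getTextContent();
        return text.replaceAll("[\u2018\u2019]", "'");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input document, encoded as UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>\u2018Citizens for NYC\u2019 Board</name>";
        String stored = readAndSanitize(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(stored); // 'Citizens for NYC' Board

        // Pure ASCII text encodes to the same bytes in US-ASCII and UTF-8.
        System.out.println(Arrays.equals(
                stored.getBytes(StandardCharsets.US_ASCII),
                stored.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```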
ASKER CERTIFIED SOLUTION
CEHJ
This solution is only available to members.
Avatar of Manish

ASKER

Can you give one example, so that I can do the same for all the other characters?
If possible, also a UTF-8 character list and an ISO-8859-1 character list.
Well, I gave you an example at http:Q_21867758.html#16788340

As I mentioned, the replacements are a matter of judgement. If you read that link I posted, for instance, you'll see that in the Unix world, people have the habit of using ` for a left quote - quite different to the replacement I suggested.
SOLUTION
This solution is only available to members.
:-)