openaccount1

asked on

Recode characters to UTF-8

Hi,

Our pages are nominally in ISO-8859-1, but when I looked at them they are not consistently in that character encoding. What I would like to do is convert our pages to UTF-8, regardless of what encoding they were in before, since our code has been messed up and now contains different encoding types.

What we want is a script that can automatically convert our pages to UTF-8 encoding (not just change the declared charset), so that it goes through all pages and converts all characters to valid UTF-8.
dbrunton

ASKER CERTIFIED SOLUTION
v2Media
openaccount1

ASKER

Hi v2Media,

That question is actually a poll question. Anything that has to do with the actual recoding or recoding software should be posted here, so that we have a separate answer for each and I can distribute points properly.

Also note that I am not a programmer, so I may need something that is ready to use. Another thing is that our HTML pages may not be consistently in ISO-8859-1, so we need something that can determine what encoding a file really uses, e.g. it may be a mix of ISO-8859-1, Latin 1, or ASCII.
Sorry, I can't work out the distinction you're trying to make between your questions. I see 4 questions, all on the same issue, just worded differently.
There is no program that will do this out of the box. All of the libraries I'm aware of require a source and a destination charset. Also, Latin-1 is an informal name for ISO-8859-1; they are the same thing.

If you run iconv over the files in a directory with ISO-8859-1 as the source and UTF-8 as the destination, you should get a good result. ASCII is just a subset of ISO-8859-1, I believe.
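For anyone who would rather script this than run iconv by hand, here is a rough Python sketch of the same idea. It is only an illustration under some assumptions: the function names `decode_guess` and `convert_tree` are made up for this example, the pages are assumed to end in `.html` or `.htm`, and any file that is not already valid UTF-8 is assumed to be ISO-8859-1. It cannot sort out a single file that mixes encodings internally, and it does not touch the charset declared in the page's meta tag. Back up the directory before trying anything like this.

```python
from pathlib import Path


def decode_guess(raw):
    """Return (text, encoding_used) for the raw bytes of one file."""
    try:
        # Plain ASCII and valid UTF-8 both decode cleanly here.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is ISO-8859-1. That codec maps every
        # byte to a character, so this fallback never raises.
        return raw.decode("iso-8859-1"), "iso-8859-1"


def convert_tree(root, patterns=("*.html", "*.htm")):
    """Rewrite non-UTF-8 files under root as UTF-8.

    Returns a dict mapping each visited path to the encoding it was read as.
    """
    results = {}
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            raw = path.read_bytes()
            text, used = decode_guess(raw)
            if used != "utf-8":
                # Only files that needed conversion are rewritten.
                path.write_bytes(text.encode("utf-8"))
            results[str(path)] = used
    return results
```

Calling `convert_tree("path/to/site")` (the path is a placeholder) returns a report of which files were treated as ISO-8859-1, which is worth checking by eye. Trying UTF-8 first matters: a file that is already UTF-8 is left alone rather than being converted a second time.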
SOLUTION
We were able to obtain code that can transform special characters to their entity form. However, we are now looking for something that can look at pages and convert them to proper UTF-8, not just to their ISO-8859-1 form.
https://www.experts-exchange.com/questions/25273135/Tranform-Special-characters-to-entity-Name-recursively.html
@dbrunton,

I tried Charco, but it's not doing it correctly. Going from ISO to UTF-8, it adds the incorrect equivalent. Going from UTF-8 to ISO, it removes the special characters.
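One possible explanation for those symptoms, not confirmed anywhere in this thread, is that the source encoding given to the tool did not match what the file really contained. The Python sketch below reproduces both effects on a single character. The function names are invented for the illustration, and it says nothing about how Charco itself works.

```python
def misread_utf8_as_latin1(text):
    """Simulate converting a file that is already UTF-8 as if it were ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")
    # Each byte of a multi-byte UTF-8 sequence becomes its own character.
    return utf8_bytes.decode("iso-8859-1")


def misread_latin1_as_utf8(text):
    """Simulate reading an ISO-8859-1 file as UTF-8 while skipping invalid bytes."""
    latin1_bytes = text.encode("iso-8859-1")
    # Bytes that are not valid UTF-8 are silently dropped.
    return latin1_bytes.decode("utf-8", errors="ignore")


print(misread_utf8_as_latin1("\u00e9"))     # é comes out as two wrong characters
print(misread_latin1_as_utf8("caf\u00e9"))  # the accented letter disappears
```

If the pages really are a mix, some files would show the first effect and others the second, which would fit a tool that applies one fixed source charset to every file.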
Found software that can do this.