openaccount1
Recode characters to UTF-8
Hi,
Our current pages are supposed to be in ISO-8859-1, but when I looked at them they are not consistently in that character encoding. What I would like to do is convert our pages to UTF-8, regardless of what encoding they were in before, since our code has been messed up and now contains different encoding types.
What we want is a script that can automatically convert our pages to UTF-8 (not just change the charset declaration), so that it goes through all pages and changes all characters to their UTF-8 form.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi v2Media,
That question is actually a poll question. Anything to do with the actual recoding or recoding software should be posted here, so that we have a separate answer for each and I can distribute points properly.
Also note that I am not a programmer, so I may need something that is ready to use. Another thing is that our HTML pages may not be consistently in ISO-8859-1, so we need something that can determine what encoding a file really is, e.g. it may be a mix of ISO-8859-1, Latin 1, or ASCII.
Sorry - I can't extrapolate the distinction you're trying to make from your questions. I see 4 questions, all the same issue; just worded differently.
There is no program that will do this out of the box. All of the libraries I'm aware of require a source and a destination charset. Also, Latin1 is an informal name for ISO-8859-1; they are the same thing.
If you fire off iconv at a directory with ISO-8859-1 as source and UTF-8 as destination, you should get a good result. ASCII is just a subset of 8859-1, I believe.
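For anyone who wants to script this rather than run iconv by hand, here is a rough Python sketch of the same idea. It is not what iconv does internally, and the names `to_utf8` and `convert_tree` are made up for illustration. It assumes each file is either already valid UTF-8 (which includes plain ASCII) or else ISO-8859-1, and it assumes the pages match `*.html`. It tries UTF-8 first so that files that were already converted are left alone.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the given bytes re-encoded as UTF-8.

    Assumption: the input is either valid UTF-8 (ASCII included)
    or text in the fallback encoding.
    """
    try:
        # Valid UTF-8, including pure ASCII, decodes cleanly here.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        text = raw.decode(fallback)
    return text.encode("utf-8")


def convert_tree(root: str, pattern: str = "*.html") -> int:
    """Convert every matching file under root in place.

    Returns the number of files whose bytes actually changed.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        raw = path.read_bytes()
        converted = to_utf8(raw)
        if converted != raw:
            path.write_bytes(converted)
            changed += 1
    return changed
```

Run it on a copy of the site first, e.g. `convert_tree("site_copy")`. Two caveats: it does not touch the charset declaration inside the HTML, and a single file that mixes UTF-8 and ISO-8859-1 bytes will be treated as ISO-8859-1 as a whole, so its UTF-8 parts will come out wrong.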
SOLUTION
ASKER
We were able to obtain code that can transform special characters to their entity form. However, we are now looking for something that can look at pages and change them to their UTF-8 form, not just their ISO-8859-1 form.
https://www.experts-exchange.com/questions/25273135/Tranform-Special-characters-to-entity-Name-recursively.html
ASKER
@dbrunton,
Tried Charco, but it's not doing it correctly. Going from ISO to UTF-8, it adds the incorrect equivalent; going from UTF-8 to ISO, it removes the special characters.
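One possible explanation for both symptoms, and this is only a guess since the thread does not show the files or Charco's settings, is that some of the pages were already UTF-8 when they were converted "from ISO". The Python sketch below reproduces that kind of result with the standard codecs; the helper names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate converting a file "from ISO-8859-1 to UTF-8"
    when the file was in fact already UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # The converter misreads each UTF-8 byte as one ISO-8859-1 character...
    misread = utf8_bytes.decode("iso-8859-1")
    # ...and then writes those characters out as UTF-8 again.
    return misread.encode("utf-8")


def drop_unmappable(text: str) -> bytes:
    """Simulate converting to ISO-8859-1 while silently discarding
    characters that have no ISO-8859-1 equivalent."""
    return text.encode("iso-8859-1", errors="ignore")


# "é" is stored as two bytes in UTF-8; misreading them yields "Ã©".
print(double_encode("é").decode("utf-8"))
# The euro sign does not exist in ISO-8859-1, so it disappears.
print(drop_unmappable("5 €"))
```

If the wrong characters on the converted pages look like "Ã©" where "é" was expected, the source encoding given to the tool was probably wrong for that file.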
ASKER
Found software that can do this:
http://en.wikipedia.org/wiki/Iconv