mb_convert_encoding and UTF-8 to GB2312 conversion
Posted on 2004-08-19
I am currently developing a web application that displays all HTML pages in UTF-8 encoding. The application also contains an online form where users can enter the a message and send it out as an email in GB2312 format. However, if I change the online form's encoding to GB2312 so that the text input by the user is encoded with GB2312, the UTF-8 encoded text in the HTML form gets garbled.
Therefore, I decided to keep the online form encoded in UTF-8, and use iconv or mb_convert_encoding to convert UTF-8 encoded text into GB2312 (Simplified Chinese). It seems, however, neither iconv nor mb_convert do a 100% thorough job of converting the UTF-8 text. With iconv, certain special characters such as - or , do not get converted properly. And when iconv encounters a character it doesn't recognise, it tends to stop the conversion right there and then, so I only receive half of the converted text up to the point where the unrecognised character was found.
mb_convert_encoding also has problems recognising certain chinese characters and these characters get garbled during the conversion.
I'm new to all this utf-8 encoding stuff, so I was wondering if there is a way to provide mb_convert or iconv with the most up-to-date charsets in order to ensure all characters are translated correctly without being garbled. Actually, I'm not even sure if obtaining the latest charsets is the correct solution. Has anybody ever experienced this kind of problem with iconv or mb_convert_encoding? And if so, did you find a solution?
Many thanks for your help.