mb_convert_encoding and UTF-8 to GB2312 conversion

I am currently developing a web application that displays all HTML pages in UTF-8 encoding. The application also contains an online form where users can enter a message and send it out as an email in GB2312 format. However, if I change the online form's encoding to GB2312 so that the text input by the user is encoded with GB2312, the UTF-8 encoded text in the HTML form gets garbled.

Therefore, I decided to keep the online form encoded in UTF-8, and use iconv or mb_convert_encoding to convert the UTF-8 encoded text into GB2312 (Simplified Chinese). It seems, however, that neither iconv nor mb_convert_encoding does a 100% thorough job of converting the UTF-8 text. With iconv, certain special characters such as - or , do not get converted properly. And when iconv encounters a character it doesn't recognise, it stops the conversion right there, so I only receive the converted text up to the point where the unrecognised character was found.

mb_convert_encoding also has problems recognising certain Chinese characters, and these characters get garbled during the conversion.
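One way to narrow down which characters are involved is to test each character on its own. The following is only a minimal sketch, not something posted in this thread: `findUnconvertible()` is a hypothetical helper name, and it assumes the mbstring extension is loaded and that the encoding names `EUC-CN` (GB2312), `CP936` (GBK) and `GB18030` are available in your PHP build. It converts every character to the target encoding and back, and reports the ones that do not survive the round trip.

```php
<?php
// Hypothetical helper (not from the thread): return the distinct characters
// of a UTF-8 string that do not survive a round trip through $target.
function findUnconvertible(string $utf8, string $target = 'EUC-CN'): array
{
    // Split into individual UTF-8 characters; preg_split returns false
    // when the input is not valid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $bad = [];
    foreach ($chars as $ch) {
        // mb_convert_encoding substitutes a placeholder for characters the
        // target cannot represent, so the round trip no longer matches.
        $encoded = mb_convert_encoding($ch, $target, 'UTF-8');
        $decoded = mb_convert_encoding($encoded, 'UTF-8', $target);
        if ($decoded !== $ch) {
            $bad[] = $ch;
        }
    }
    return array_values(array_unique($bad));
}

// Sample text (assumes this source file itself is saved as UTF-8).
$message = "中文 message 한";
foreach (['EUC-CN', 'CP936', 'GB18030'] as $target) {
    $bad = findUnconvertible($message, $target);
    echo $target, ': ', $bad ? implode(' ', $bad) : '(all characters convertible)', "\n";
}
```

If the reported characters turn out to be representable in a larger Chinese encoding but not in GB2312 itself, the problem may be the size of the GB2312 repertoire rather than an out-of-date converter. Whether a different target charset is acceptable depends on what the receiving mail clients can display.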

I'm new to all this UTF-8 encoding stuff, so I was wondering if there is a way to provide mb_convert_encoding or iconv with the most up-to-date charsets in order to ensure all characters are translated correctly without being garbled. Actually, I'm not even sure if obtaining the latest charsets is the correct solution. Has anybody ever experienced this kind of problem with iconv or mb_convert_encoding? And if so, did you find a solution?

Many thanks for your help.
Asked by: philippo123
1 Solution
 
fibo Commented:
Hi,
These charset things can be awful at times!
You'll probably need to check first WHICH problem you are experiencing...
1 - A simple test, when you get a web page with "wrong" characters displayed, is to identify exactly what is happening. First, change the character encoding used by your web browser (easy with Netscape and IE, more difficult with Opera) for the page (OR the frame, if your page has frames). Try several encodings to see which characters display fine and which display wrong: this will show you what is happening. A stupid example I experienced: the encoding of my web page was UTF-8 and the chars coming from MySQL were displayed in UTF-8, but some chars I had typed directly into the PHP code were NOT UTF-8. Of course this leads to several crazy variations!
2 - To be 100% sure, you might also display some character strings not only as characters but also in hex, so that you can manually check what is happening.
3 - If you use phpMyAdmin to check values in MySQL, be aware that it has 2 frames and that on some occasions you canNOT get the right encoding in the data (rightmost) frame.
4 - Maybe you have a live link at which we can browse and experiment?
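The hex check in point 2 could look something like the sketch below. It is only an illustration: `hexBytes()` is a hypothetical helper name, and the sample strings are written as byte escapes so that the result does not depend on how the PHP source file itself is saved (the pitfall from point 1). `mb_check_encoding()` requires the mbstring extension.

```php
<?php
// Hypothetical helper (not from the thread): show the raw bytes of a string
// as space-separated hex pairs.
function hexBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// The same character U+4E2D in two encodings, written as explicit bytes.
$asUtf8   = "\xE4\xB8\xAD"; // UTF-8 bytes
$asGb2312 = "\xD6\xD0";     // GB2312 bytes

foreach (['UTF-8 sample' => $asUtf8, 'GB2312 sample' => $asGb2312] as $label => $bytes) {
    // mb_check_encoding reports whether the bytes form valid UTF-8.
    $valid = mb_check_encoding($bytes, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    echo $label, ': ', hexBytes($bytes), ' (', $valid, ")\n";
}
```

Dumping form input, database values and string literals from the PHP source this way makes it possible to see which of them is not in the encoding you expected before any conversion is attempted.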
 
philippo123 (Author) Commented:
Thanks, but I've decided not to use the PHP converters to solve this problem anymore. What I've been doing to work around this conversion problem is to use a pop-up window, encoded in GB2312, for the user to input data. This way, the text is entered directly into the system as GB2312, eliminating the need to convert it from UTF-8. Not a perfect solution, but it will have to do for now.

Thanks for your offer, though.
 
modulo Commented:
PAQed, with points refunded (125)

modulo
Community Support Moderator