Solved

mb_convert_encoding and UTF-8 to GB2312 conversion

Posted on 2004-08-19
Last Modified: 2012-06-21
I am currently developing a web application that displays all HTML pages in UTF-8 encoding. The application also contains an online form where users can enter a message and send it out as an email in GB2312 format. However, if I change the online form's encoding to GB2312 so that the text the user enters is encoded as GB2312, the UTF-8 encoded text in the HTML form gets garbled.

Therefore, I decided to keep the online form encoded in UTF-8, and use iconv or mb_convert_encoding to convert the UTF-8 encoded text into GB2312 (Simplified Chinese). It seems, however, that neither iconv nor mb_convert_encoding does a 100% thorough job of converting the UTF-8 text. With iconv, certain special characters such as - or , do not get converted properly. And when iconv encounters a character it doesn't recognise, it tends to stop the conversion right there, so I only receive the part of the converted text up to the point where the unrecognised character was found.
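For readers hitting the same truncation, here is a minimal sketch (not something from the original thread) of one way to keep the whole message: let mbstring replace any character it cannot map with a placeholder instead of aborting. It assumes PHP 7 or later with the mbstring extension loaded; `to_gb2312()` is a hypothetical helper name, and `'GB2312'` is used as mbstring's alias for EUC-CN.

```php
<?php
// Sketch: convert UTF-8 to GB2312 without losing the text that follows
// an unmappable character. to_gb2312() is a hypothetical helper name.
function to_gb2312(string $utf8): string
{
    // Remember the current setting so the change stays local to this call.
    $previous = mb_substitute_character();
    // Emit '?' (0x3F) for any character the target encoding cannot represent.
    mb_substitute_character(0x3F);
    $converted = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');
    mb_substitute_character($previous);
    return $converted;
}

// "\u{4F60}\u{597D}" is "ni hao" written with escapes, so the example does
// not depend on the encoding the PHP source file itself is saved in.
echo bin2hex(to_gb2312("\u{4F60}\u{597D}")), "\n"; // prints c4e3bac3
```

iconv() also accepts `//TRANSLIT` and `//IGNORE` suffixes on the target encoding name, but what they do with unmappable characters depends on the iconv library PHP was built against, so test them on the actual server before relying on them.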

mb_convert_encoding also has problems recognising certain Chinese characters, and these characters get garbled during the conversion.
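One way to find out which characters are affected is a round-trip check: convert each character to the target encoding and back, and flag the ones that do not come back unchanged. The following is only a sketch under the same assumptions as any mbstring code (PHP 7+, mbstring loaded); `unmappable_chars()` is a hypothetical helper, not a PHP built-in.

```php
<?php
// Sketch: list the characters of a UTF-8 string that do not survive a
// round trip through the target encoding. Hypothetical helper name.
function unmappable_chars(string $utf8, string $target = 'GB2312'): array
{
    // Split into individual UTF-8 characters; preg_split returns false
    // when the input is not valid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $previous = mb_substitute_character();
    mb_substitute_character(0x3F); // unmappable characters become '?'

    $bad = [];
    foreach ($chars as $char) {
        $encoded   = mb_convert_encoding($char, $target, 'UTF-8');
        $roundTrip = mb_convert_encoding($encoded, 'UTF-8', $target);
        if ($roundTrip !== $char) {
            $bad[] = $char;
        }
    }

    mb_substitute_character($previous);
    return array_values(array_unique($bad));
}

// An emoji has no GB2312 equivalent, so it is reported; the two Chinese
// characters around it convert cleanly.
print_r(unmappable_chars("\u{4F60}\u{1F600}\u{597D}"));
```

If the reported characters are ones users legitimately need, the limitation is in the GB2312 repertoire itself rather than in an outdated conversion table.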

I'm new to all this UTF-8 encoding stuff, so I was wondering if there is a way to provide mb_convert_encoding or iconv with the most up-to-date charsets in order to ensure all characters are translated correctly without being garbled. Actually, I'm not even sure if obtaining the latest charsets is the correct solution. Has anybody ever experienced this kind of problem with iconv or mb_convert_encoding? And if so, did you find a solution?

Many thanks for your help.
Question by:philippo123
4 Comments
 
LVL 29

Expert Comment

by:fibo
ID: 11925900
Hi,
These charset things can be awful at times!
You'll probably need to check first WHICH problem you are experiencing...
1 - A simple test, when you get a web page with "wrong" characters displayed, is to identify exactly what is happening. First, change the character encoding used by your web browser (easy with Netscape and IE, more difficult with Opera) for the page (or the frame, if your page has frames). Try several encodings to see which characters display correctly and which do not; this will show you what is happening. A stupid example I ran into: the character encoding for my web page was UTF-8, and the characters coming from MySQL were in UTF-8, but some characters I had typed into the PHP code were NOT UTF-8. Of course this leads to several crazy variations!
2 - To be 100% sure, you might also display some character strings not only as characters but also in hex, so that you can manually check what is happening.
3 - If you use phpMyAdmin to check values in MySQL, be aware that it has two frames and that on some occasions you canNOT get the right encoding in the data (rightmost) frame.
4 - Maybe you have a live link that we can browse and experiment with?
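The hex check suggested in point 2 could look something like the sketch below. `hex_dump_string()` is a hypothetical helper name; bin2hex(), chunk_split() and mb_check_encoding() are standard PHP functions (the last one needs the mbstring extension).

```php
<?php
// Sketch: show the raw bytes of a string so that the encoding can be
// checked by hand. hex_dump_string() is a hypothetical helper name.
function hex_dump_string(string $bytes): string
{
    // bin2hex gives two hex digits per byte; chunk_split separates the bytes.
    return trim(chunk_split(bin2hex($bytes), 2, ' '));
}

// Written with \u{...} escapes so that the bytes are UTF-8 regardless of
// how the PHP source file itself is saved.
$input = "\u{4F60}\u{597D}";

echo hex_dump_string($input), "\n"; // prints e4 bd a0 e5 a5 bd
echo mb_check_encoding($input, 'UTF-8') ? "valid UTF-8" : "not valid UTF-8", "\n";
```

Dumping the same value at each stage (form input, database read, just before sending the mail) shows at which step the bytes stop being what you expect.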
 

Author Comment

by:philippo123
ID: 11970658
Thanks, but I've decided not to use the PHP converters to solve this problem anymore. What I've been doing to work around this conversion problem is to use a pop-up window, encoded in GB2312, for the user to input data. This way, the text is entered directly into the system as GB2312, eliminating the need to convert it from UTF-8. Not a perfect solution, but it will have to do for now.

Thanks for your offer, though.
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12516642
PAQed, with points refunded (125)

modulo
Community Support Moderator
