Solved

mb_convert_encoding and UTF-8 to GB2312 conversion

Posted on 2004-08-19
Last Modified: 2012-06-21
I am currently developing a web application that displays all HTML pages in UTF-8 encoding. The application also contains an online form where users can enter a message and send it out as an email in GB2312 format. However, if I change the online form's encoding to GB2312 so that the text input by the user is encoded with GB2312, the UTF-8 encoded text in the HTML form gets garbled.

Therefore, I decided to keep the online form encoded in UTF-8, and use iconv or mb_convert_encoding to convert the UTF-8 encoded text into GB2312 (Simplified Chinese). It seems, however, that neither iconv nor mb_convert_encoding does a 100% thorough job of converting the UTF-8 text. With iconv, certain special characters such as - or , do not get converted properly. And when iconv encounters a character it doesn't recognise, it tends to stop the conversion right there, so I only receive the converted text up to the point where the unrecognised character was found.

mb_convert_encoding also has problems recognising certain Chinese characters, and these characters get garbled during the conversion.
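
For anyone who wants to reproduce the behaviour described above, here is a minimal sketch (not the application's actual code). It assumes the mbstring and iconv extensions are loaded; the helper names `to_gb2312_mb()` and `to_gb2312_iconv()` and the sample strings are made up for illustration. What iconv does with a character it cannot map depends on the underlying iconv library, so no output is shown here.

```php
<?php
// Illustrative samples only: the first uses characters that exist in GB2312,
// the second contains the euro sign, which GB2312 does not define.
$samples = [
    'plain' => '中文',
    'euro'  => '价格 €5',
];

// Hypothetical helper: convert with mbstring. Characters that cannot be
// represented are replaced by the mbstring substitute character rather
// than aborting the whole conversion.
function to_gb2312_mb(string $utf8): string
{
    return mb_convert_encoding($utf8, 'GB2312', 'UTF-8');
}

// Hypothetical helper: convert with iconv. //TRANSLIT asks iconv to
// approximate unmappable characters; the result is implementation
// dependent and may be false. @ hides the notice so the result can be
// inspected instead.
function to_gb2312_iconv(string $utf8)
{
    return @iconv('UTF-8', 'GB2312//TRANSLIT', $utf8);
}

// Print the converted bytes in hex so both results can be compared
// without relying on how a terminal or browser renders them.
foreach ($samples as $label => $text) {
    $mb = to_gb2312_mb($text);
    $ic = to_gb2312_iconv($text);
    printf(
        "%-6s mb=%s iconv=%s\n",
        $label,
        bin2hex($mb),
        $ic === false ? 'false' : bin2hex($ic)
    );
}
```

If the failing characters turn out to be ones that GB2312 simply does not contain, one thing worth testing is a target with a larger repertoire, such as GBK or GB18030, provided the receiving mail clients can handle it. That is a suggestion to try, not a confirmed fix for this case.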

I'm new to all this UTF-8 encoding stuff, so I was wondering if there is a way to provide mb_convert_encoding or iconv with the most up-to-date charsets in order to ensure all characters are translated correctly without being garbled. Actually, I'm not even sure if obtaining the latest charsets is the correct solution. Has anybody ever experienced this kind of problem with iconv or mb_convert_encoding? And if so, did you find a solution?

Many thanks for your help.
Question by:philippo123
4 Comments
 

Expert Comment

by:fibo
Hi,
These charset things can be awful at times!
You'll probably need to check first WHICH problem you are experiencing...
1 - A simple test, when you get a web page with "wrong" characters displayed, is to identify exactly what is happening. First, change the character encoding used by your web browser (easy with Netscape and IE, more difficult with Opera) for the page (OR the frame, if your page has frames). Try several encodings to see which characters display fine and which display wrong: this will show you what is happening. A stupid example I experienced: the charset for my web page was UTF-8 and the chars coming from MySQL were displayed as UTF-8, but some chars I had typed into the PHP code were NOT UTF-8. Of course this leads to several crazy variations!
2 - To be 100% sure, you might also display some strings not only as characters but also in hex, so that you can manually check what is happening.
3 - If you use phpMyAdmin to check values in MySQL, be aware that it has 2 frames and that on some occasions you canNOT get the right encoding in the data (rightmost) frame.
4 - Maybe you have a live link where we can browse and experiment?
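
Point 2 above could be sketched roughly like this. It assumes the mbstring extension is available; `hex_dump_chars()` is a hypothetical helper name, not an existing PHP function.

```php
<?php
// Hypothetical helper: split a string into characters (according to the
// given encoding) and pair each one with the hex of its raw bytes.
function hex_dump_chars(string $text, string $encoding = 'UTF-8'): array
{
    $rows = [];
    $length = mb_strlen($text, $encoding);
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($text, $i, 1, $encoding);
        $rows[] = [$char, strtoupper(bin2hex($char))];
    }
    return $rows;
}

// Example: one ASCII letter and one Chinese character, both as UTF-8.
$text = 'A中';

// mb_check_encoding() reports whether the bytes are valid in the given
// encoding, which helps spot strings that are not really UTF-8.
echo 'valid UTF-8: ', mb_check_encoding($text, 'UTF-8') ? 'yes' : 'no', "\n";

foreach (hex_dump_chars($text) as [$char, $hex]) {
    echo $char, ' => ', $hex, "\n";
}
```

For the sample string, the letter A shows up as 41 and the Chinese character as E4B8AD, its three-byte UTF-8 form. Text that shows two bytes per Chinese character instead is probably already in GB2312 or a related encoding rather than UTF-8.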
 

Author Comment

by:philippo123
Thanks, but I've decided not to use the PHP converters to solve this problem anymore. To work around the conversion problem, I now use a pop-up window encoded in GB2312 for the user to input data. This way, the text is entered directly into the system as GB2312, eliminating the need to convert it from UTF-8. Not a perfect solution, but it will have to do for now.

Thanks for your offer, though.
 

Accepted Solution

by:
modulo earned 0 total points
PAQed, with points refunded (125)

modulo
Community Support Moderator