Solved

mb_convert_encoding and UTF-8 to GB2312 conversion

Posted on 2004-08-19
Last Modified: 2012-06-21
I am currently developing a web application that displays all HTML pages in UTF-8 encoding. The application also contains an online form where users can enter a message and send it out as an email in GB2312 format. However, if I change the online form's encoding to GB2312 so that the text input by the user is encoded in GB2312, the UTF-8 encoded text in the HTML form gets garbled.

Therefore, I decided to keep the online form encoded in UTF-8, and use iconv or mb_convert_encoding to convert the UTF-8 encoded text into GB2312 (Simplified Chinese). It seems, however, that neither iconv nor mb_convert_encoding does a 100% thorough job of converting the UTF-8 text. With iconv, certain special characters such as - or , do not get converted properly. And when iconv encounters a character it doesn't recognise, it stops the conversion right there, so I only receive the converted text up to the point where the unrecognised character was found.
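For reference, the iconv behaviour described above can be reproduced and softened with a small wrapper. This is only a sketch under assumptions: the function name `utf8_to_gb2312_iconv` is made up for illustration, the scalar type hints need PHP 7.1 or later, and the `//TRANSLIT` and `//IGNORE` suffixes behave differently depending on which iconv implementation PHP was built against, so the results should be checked on the actual server.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): convert UTF-8 text to GB2312 with iconv.
 *
 * $mode is appended to the target charset:
 *   ''           strict; conversion fails on the first unconvertible character
 *   '//TRANSLIT' ask iconv to approximate characters it cannot represent
 *   '//IGNORE'   ask iconv to drop characters it cannot represent
 *
 * Returns null instead of false so callers can tell failure from an empty string.
 */
function utf8_to_gb2312_iconv(string $utf8, string $mode = '', string $target = 'GB2312'): ?string
{
    // "@" hides the notice iconv raises when it meets an unconvertible character.
    $out = @iconv('UTF-8', $target . $mode, $utf8);
    return $out === false ? null : $out;
}

// Usage: try a strict conversion first, then fall back to approximation.
$message = "\xE4\xB8\xAD\xE6\x96\x87"; // the two characters U+4E2D U+6587, as UTF-8 bytes
$gb = utf8_to_gb2312_iconv($message);
if ($gb === null) {
    $gb = utf8_to_gb2312_iconv($message, '//TRANSLIT');
}
```

If characters still fail, one option worth testing is passing `'GBK'` as `$target`. GBK is a superset of GB2312, so it can represent more characters, but whether the receiving mail clients accept it is an assumption to verify.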

mb_convert_encoding also has problems recognising certain Chinese characters, and these characters get garbled during the conversion.
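One way to find out exactly which characters mb_convert_encoding cannot map is a round-trip check: convert each character to the target encoding and back, and flag it if it does not survive. The sketch below assumes the mbstring extension is loaded and PHP 7 or later; `find_unconvertible` is a made-up name, and `EUC-CN` is used here as mbstring's name for the GB2312 encoding.

```php
<?php
/**
 * Hypothetical helper: list the characters in a UTF-8 string that do not
 * survive a round trip through $target (default EUC-CN, i.e. GB2312).
 */
function find_unconvertible(string $utf8, string $target = 'EUC-CN'): array
{
    $bad = [];
    // Split into individual characters; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $ch) {
        $encoded = mb_convert_encoding($ch, $target, 'UTF-8');
        $decoded = mb_convert_encoding($encoded, 'UTF-8', $target);
        // mbstring replaces unmappable characters with a substitute (by default "?"),
        // so a lossy conversion shows up as a mismatch here.
        if ($decoded !== $ch) {
            $bad[] = $ch;
        }
    }
    return $bad;
}

// Usage: report the problem characters before sending the email.
$problem = find_unconvertible("\xE4\xB8\xAD\xE6\x96\x87"); // empty array: both characters exist in GB2312
```

Running this over a message that gets garbled should show whether the trouble is a few characters missing from GB2312 or something broader, such as input that is not valid UTF-8 in the first place.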

I'm new to all this UTF-8 encoding stuff, so I was wondering if there is a way to provide mb_convert_encoding or iconv with the most up-to-date charsets in order to ensure all characters are translated correctly without being garbled. Actually, I'm not even sure if obtaining the latest charsets is the correct solution. Has anybody ever experienced this kind of problem with iconv or mb_convert_encoding? And if so, did you find a solution?

Many thanks for your help.
Question by:philippo123
4 Comments
 
LVL 29

Expert Comment

by:fibo
ID: 11925900
Hi,
These charset things can be awful at times!
You'll probably need to check first WHICH problem you are experiencing...
1 - A simple test, when you get a web page with "wrong" characters displayed, is to identify exactly what is happening. First, change the character encoding used by your web browser (easy with Netscape and IE, more difficult with Opera) for the page (OR the frame if your page has frames). Try several encodings to see which characters display correctly and which do not: this will show you what is happening. A stupid example I experienced was that the encoding for my web page was UTF-8 and the chars coming from MySQL were displayed in UTF-8, but some chars I had entered in the PHP code were NOT UTF-8. Of course this leads to several crazy variations!
2 - To be 100% sure, you might also have some strings displayed not only as characters but also in hex, so that you can manually check what is happening.
3 - If you use phpMyAdmin to check values in MySQL, be aware that you have 2 frames and that on some occasions you canNOT get the right encoding in the data (rightmost) frame.
4 - Maybe you have a live link where we can browse and experiment?
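For point 2, a hex view of a string takes only a couple of lines of PHP. The following is a sketch with made-up function names (`hex_dump_bytes`, `describe_string`); it assumes the mbstring extension is available for the UTF-8 validity check and PHP 7 or later for the type hints.

```php
<?php
/** Hypothetical helper: show the raw bytes of a string as space-separated hex pairs. */
function hex_dump_bytes(string $s): string
{
    if ($s === '') {
        return '';
    }
    return implode(' ', str_split(bin2hex($s), 2));
}

/** Hypothetical helper: hex bytes plus a note on whether they form valid UTF-8. */
function describe_string(string $s): string
{
    $kind = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return hex_dump_bytes($s) . ' (' . $kind . ')';
}

// The character U+4E2D as UTF-8 bytes, then the same character as GB2312 bytes.
echo describe_string("\xE4\xB8\xAD"), "\n"; // e4 b8 ad (valid UTF-8)
echo describe_string("\xD6\xD0"), "\n";     // d6 d0 (NOT valid UTF-8)
```

Printing form input, database values and string literals from the PHP source this way makes it easy to spot which of them is not in the encoding you expected.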
 

Author Comment

by:philippo123
ID: 11970658
Thanks, but I've decided not to use the PHP converters to solve this problem anymore. To work around the conversion problem, I now use a pop-up window encoded in GB2312 for the user to input data. This way, the text is entered directly into the system as GB2312, eliminating the need to convert it from UTF-8. Not a perfect solution, but it will have to do for now.

Thanks for your offer though.
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12516642
PAQed, with points refunded (125)

modulo
Community Support Moderator
