UTF-8 to ANSI before inserting Chinese character to the database

Posted on 2004-11-03
Last Modified: 2013-12-12

My task is to crawl the HTML code of the Chinese webpage "http://emart.ttyy.net/big5.phtml" and insert it into column "codes" of table "testchi" in database "buyingsearch".

The problem I face is that this page is encoded in UTF-8, but the MySQL database doesn't accept Chinese in UTF-8: garbled characters ("ugly code") are shown instead. The database only accepts Chinese text encoded in ANSI or Big5.

Is there any way to convert the encoding from UTF-8 to Big5 in PHP before inserting $contents (the variable that stores the page's HTML) into the database?

Here is my code:

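//fake the browser's user agent so the server returns the page
ini_set('user_agent', 'MSIE 4.0b2;');

$website = "http://emart.ttyy.net/big5.phtml";

//connect to the server & choose the database
$link_id = mysql_connect("localhost", "root", "") or die(mysql_error());
mysql_select_db("buyingsearch", $link_id) or die(mysql_error());
print "Successfully connected.<br>";

//read the whole page; only insert it if the download worked
$contents = file_get_contents($website);
if ($contents !== false && $contents !== '') {
    echo htmlspecialchars($contents);
    //escape quotes in the HTML so they cannot break the SQL statement
    $escaped = mysql_real_escape_string($contents, $link_id);
    mysql_query("insert into testchi(codes) values('$escaped')", $link_id);
}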
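One way to do the conversion is PHP's iconv() function, called on $contents right before the insert. This is a minimal sketch, not a tested fix for this exact page: it assumes the iconv extension is enabled (it is in most PHP builds), and the helper name utf8_to_big5 is hypothetical. Big5 covers traditional Chinese only, so iconv() returns false if the text contains a character with no Big5 equivalent; the sketch turns that into an exception instead of storing a partial string.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to Big5 before it is stored.
// Assumes the iconv extension is enabled (it is by default in most builds).
function utf8_to_big5(string $utf8): string
{
    // iconv() returns false when the input contains a character that has
    // no Big5 equivalent (e.g. a simplified-only Chinese character).
    $big5 = iconv('UTF-8', 'BIG5', $utf8);
    if ($big5 === false) {
        throw new RuntimeException('String contains characters not representable in Big5');
    }
    return $big5;
}

// Usage: each Chinese character takes 2 bytes in Big5 instead of 3 in UTF-8.
$utf8 = "中文";
$big5 = utf8_to_big5($utf8);
printf("UTF-8: %d bytes, Big5: %d bytes\n", strlen($utf8), strlen($big5));
// prints "UTF-8: 6 bytes, Big5: 4 bytes"
```

In the question's code the call would go just before escaping, e.g. `$contents = utf8_to_big5($contents);`.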

I'd really appreciate some help....

Question by: emily168

Accepted Solution

Answer by: Sasho
ID: 12494668
Take a look at this page, with all the comments below the definition.


Author Comment

ID: 12494958
I have already tried the utf8_decode and utf8_encode functions, but then every Chinese character in the string inserted into the MySQL database turns into "?".
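The "?" is what utf8_decode() does with Chinese text: it converts UTF-8 to ISO-8859-1 (Latin-1), not to Big5, and Latin-1 has no Chinese characters, so each one is replaced by a single "?". A small comparison, assuming the iconv extension is available; note that utf8_decode() is deprecated as of PHP 8.2.

```php
<?php
// utf8_decode() converts UTF-8 to ISO-8859-1 (Latin-1), not to Big5.
// Latin-1 has no Chinese characters, so each one becomes a single '?'.
// (Deprecated as of PHP 8.2; newer versions may print a deprecation notice.)
$utf8   = "中文";
$latin1 = utf8_decode($utf8);

// iconv() with an explicit target charset keeps the characters (2 bytes each in Big5).
$big5 = iconv('UTF-8', 'BIG5', $utf8);

echo "utf8_decode: ", $latin1, "\n";              // prints "utf8_decode: ??"
echo "iconv Big5 : ", strlen($big5), " bytes\n";  // prints "iconv Big5 : 4 bytes"
```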

Assisted Solution

Answer by: fibo
ID: 12538042
How do you know that the inserted characters are "?"? This might be true... or not.
I presume you looked at the resulting characters through your browser and some program, either your PHP program or phpMyAdmin.
And there is the trick: the browser and/or the program might be using the wrong code table....
One easy check you should do:
1 - change the encoding of your browser (in IE, something like "View" / "Encoding") and then try all plausible options.
2 - also check what is really happening, i.e. which byte values are REALLY stored in MySQL.
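For step 2, one way to see the real bytes is to let MySQL return them in hexadecimal (`SELECT HEX(codes) FROM testchi`) or to dump them from PHP. The sketch below uses a hypothetical helper, describe_bytes, that prints a string's bytes in hex and says whether they form valid UTF-8. A run of `3f` bytes means literal "?" characters were stored; Chinese text in Big5 usually shows up as two-byte pairs that are not valid UTF-8.

```php
<?php
// Hypothetical helper: show which bytes a string really contains.
function describe_bytes(string $s): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    $isUtf8 = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($isUtf8 ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

echo describe_bytes("中"), "\n";        // prints "e4b8ad (valid UTF-8)"
echo describe_bytes("??"), "\n";        // prints "3f3f (valid UTF-8)"
// An arbitrary two-byte non-UTF-8 sequence, the kind of pair Big5 text contains:
echo describe_bytes("\xA4\xA4"), "\n";  // prints "a4a4 (not valid UTF-8)"
```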

All these code page tricks are really hard to fully grasp, and you need to experiment a little (e.g. displaying a string in 3 versions: unencoded, UTF-8 encoded, UTF-8 decoded), because if it fails, it is difficult to know at which step the failure occurred.
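The three-version experiment could look like this minimal sketch (the labels and variable names are illustrative). It prints the raw string, the utf8_encode() result and the utf8_decode() result next to their hex bytes. This shows that utf8_encode() on text that is already UTF-8 double-encodes it, while utf8_decode() turns a Chinese character into "?". Both functions are deprecated as of PHP 8.2.

```php
<?php
// Show one string in three versions with its bytes, so each conversion step
// can be checked separately. Assumes this source file is saved as UTF-8.
$raw = "中";

$versions = [
    'raw (uncoded)' => $raw,
    'utf8_encode'   => utf8_encode($raw), // treats each byte as Latin-1: double-encodes
    'utf8_decode'   => utf8_decode($raw), // UTF-8 -> Latin-1: Chinese becomes '?'
];

foreach ($versions as $label => $value) {
    printf("%-14s %s\n", $label, bin2hex($value));
}
```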
