Converting UTF-8 to ANSI before inserting Chinese characters into the database


My task is to crawl the HTML code from the Chinese web page "http://emart.ttyy.net/big5.phtml" and insert it into the column "codes" of the table "testchi" in the database "buyingsearch".

The problem I face is that this page is encoded in UTF-8, but my MySQL database can't accept Chinese in UTF-8: garbled characters ("ugly code") are shown instead. The database can only accept Chinese text encoded in ANSI or Big5.

Is there any way to change the encoding from UTF-8 to Big5 in the PHP code before inserting $contents (the variable that stores the HTML code of the web page) into the database?

Here is my code:

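//fake the browser to read jsp
ini_set('user_agent','MSIE 4.0b2;');

//connect to the server & choose the database
$link_id = mysql_connect("localhost","root","") or die("Could not connect: " . mysql_error());
mysql_select_db("buyingsearch", $link_id) or die("Could not select database: " . mysql_error());
print "Successfully connected.<br>";

//read the whole page into one string; stop if the page cannot be fetched
$website = "http://emart.ttyy.net/big5.phtml";
$lines = file($website);
if ($lines === false) {
      die("Could not read $website");
}
$contents = implode('', $lines);
echo htmlspecialchars($contents);

//escape quotes and backslashes so the HTML cannot break the SQL statement
$escaped = mysql_real_escape_string($contents, $link_id);
mysql_query("insert into testchi(codes) values(\"$escaped\");",$link_id);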

Really wanna get help....

Take a look at this page, with all the comments below the definition.


emily168 (Author) commented:
I have already tried the utf8_decode and utf8_encode functions, but the result is that all the Chinese characters in the string inserted into the MySQL database change to "?".
Bernard S. (CTO) commented:
How do you know that the inserted chars are "?"? This might be true... or not.
I presume you have looked at the resulting chars through your browser and some program, either your PHP program or phpMyAdmin.
And there is the trick: the browser and/or the program might use the wrong code table.
One easy check you should do would be:
1 - change the code page of your browser (with IE, it would be something like "Display" / "Code") and then try all plausible settings.
2 - you also need to check what is really happening, i.e. which byte values are REALLY stored in MySQL.
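
To make step 2 concrete, here is a minimal sketch of how the bytes could be inspected from PHP. The helper name describe_bytes is made up for this illustration, the sketch assumes the mbstring extension is loaded, and the query in the final comment assumes the "testchi" table from the question.

```php
<?php
// Hypothetical helper (not from the thread): report the raw bytes of a
// string and whether those bytes form valid UTF-8. Assumes mbstring.
function describe_bytes($s)
{
    return array(
        'hex'  => strtoupper(bin2hex($s)),         // same layout as MySQL HEX()
        'utf8' => mb_check_encoding($s, 'UTF-8'),  // true if valid UTF-8
    );
}

// The character U+4E2D written as explicit bytes, so the encoding of this
// source file does not influence the result.
$as_utf8 = "\xE4\xB8\xAD";  // UTF-8 bytes
$as_big5 = "\xA4\xA4";      // Big5 bytes for the same character

print_r(describe_bytes($as_utf8));
print_r(describe_bytes($as_big5));

// To compare with what MySQL really holds, the stored bytes can be
// fetched as hex with a query such as:
//   SELECT HEX(codes) FROM testchi;
```

If the hex returned by the query matches the hex of the string that was sent, the data was stored intact and the problem is more likely in how it is displayed.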

All these code page tricks are really hard to fully grasp, and you need to experiment a little (e.g., displaying a string in 3 versions: uncoded, UTF-8 encoded, UTF-8 decoded), because if it fails, it is difficult to know at which step the failure occurred.
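
The experiment described above could look like the following sketch. It assumes the mbstring extension is available; the variable names are only illustrative. Because utf8_decode converts to ISO-8859-1, which contains no Chinese characters, the sketch uses a comparable mb_convert_encoding call to show where the "?" characters come from, and adds a Big5 conversion as a third version.

```php
<?php
// Two Chinese characters (U+4E2D U+6587) as explicit UTF-8 bytes, so the
// encoding of this source file does not matter.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// Version 1: the untouched UTF-8 string ($utf8 itself).
// Version 2: converted to ISO-8859-1, the target of utf8_decode().
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
// Version 3: converted to Big5, the encoding the question asks for.
$big5 = mb_convert_encoding($utf8, 'BIG5', 'UTF-8');

$versions = array(
    'utf-8'      => $utf8,
    'iso-8859-1' => $latin1,
    'big5'       => $big5,
);

// Print the bytes of each version as hex instead of trusting a browser
// to pick the right code page.
foreach ($versions as $label => $bytes) {
    printf("%-11s %s\n", $label, bin2hex($bytes));
}
```

In this sketch the ISO-8859-1 version collapses to substitute characters, while the Big5 version keeps two bytes per character. Under these assumptions, the Big5 string (escaped as usual) would be the one to insert.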