UTF-8 to ANSI before inserting Chinese characters into the database

Posted on 2004-11-03
Last Modified: 2013-12-12

My task is to crawl the HTML from the Chinese webpage "" and insert it into the column "codes" of the table "testchi" in the database "buyingsearch".

The problem I face is that this page is encoded in UTF-8, but my MySQL database can't accept Chinese in UTF-8: garbled characters ("ugly code") are shown instead. The database only accepts Chinese text encoded in ANSI or Big5.

Is there any way to change the encoding from UTF-8 to Big5 in the PHP code before inserting $contents (the variable that stores the HTML of the webpage) into the database?

Here is my code:

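//fake the browser to read jsp
ini_set('user_agent', 'MSIE 4.0b2;');

//connect to the server & choose the database
$link_id = mysql_connect("localhost", "root", "") or die("Could not connect: " . mysql_error());
mysql_select_db("buyingsearch", $link_id);

print "Successfully connected.<br>";

//$website holds the URL of the page to crawl
$contents = file_get_contents($website);
if ($contents !== false) {
      echo htmlspecialchars($contents);
      //escape quotes in the html so they cannot break the query
      $escaped = mysql_real_escape_string($contents, $link_id);
      mysql_query("insert into testchi(codes) values('$escaped')", $link_id);
}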

Really wanna get help....

Question by:emily168

    Accepted Solution

    Take a look at this page, with all the comments below the definition.


    Author Comment

    I have already tried the utf8_decode and utf8_encode functions, but the result is that all the Chinese characters in the string inserted into the MySQL database change to "?".
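
A likely explanation for the "?" result: utf8_decode converts UTF-8 to ISO-8859-1, a charset with no Chinese characters, so each one is replaced by "?". A conversion to Big5 needs a function that knows that charset. The sketch below uses mb_convert_encoding and assumes the mbstring extension is loaded on a current PHP version; utf8ToBig5 is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// utf8ToBig5() is a hypothetical helper name, not part of PHP.
function utf8ToBig5(string $utf8): string
{
    // 'BIG-5' is mbstring's name for the Big5 charset.
    return mb_convert_encoding($utf8, 'BIG-5', 'UTF-8');
}

// "中文" written as explicit UTF-8 bytes, so the example does not
// depend on the encoding this source file is saved in.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";
$big5 = utf8ToBig5($utf8);

echo bin2hex($utf8), "\n"; // e4b8ade69687 (6 bytes, UTF-8)
echo bin2hex($big5), "\n"; // a4a4a4e5     (4 bytes, Big5)
```

In the question's script, the converted string, rather than the raw $contents, would be what gets passed to the insert query.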

    Assisted Solution

    How do you know that the inserted characters really are "?"? This might be true... or not.
    I presume you have looked at the resulting characters through your browser and some program, either your PHP program or phpMyAdmin.
    And there is the trick: the browser and/or the program might be using the wrong code table....
    One easy check you should do would be:
    1 - change the code page of your browser (with IE, it would be something like "Display" / "Code") and then try all plausible settings.
    2 - you also need to check what is really happening, i.e. which byte values are REALLY stored in MySQL.
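
One way to carry out check 2 is to print the bytes as hex, so that no browser or code page can reinterpret them. The sketch below is illustrative only: hexDump and looksLikeUtf8 are hypothetical helper names, and the three sample strings stand in for values read back from the table. On the database side, MySQL's HEX() function (for example, SELECT HEX(codes) FROM testchi) gives the same kind of view.

```php
<?php
// Hypothetical helper: show a string as space-separated hex bytes.
function hexDump(string $bytes): string
{
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// Hypothetical helper: the /u modifier makes PCRE reject any subject
// that is not valid UTF-8, so a match of 1 means "valid UTF-8".
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$asUtf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // "中文" in UTF-8
$asBig5 = "\xA4\xA4\xA4\xE5";         // "中文" in Big5
$lost   = "??";                        // characters already replaced

echo hexDump($asUtf8), "\n"; // e4 b8 ad e6 96 87
echo hexDump($asBig5), "\n"; // a4 a4 a4 e5
echo hexDump($lost), "\n";   // 3f 3f

var_dump(looksLikeUtf8($asUtf8)); // bool(true)
var_dump(looksLikeUtf8($asBig5)); // bool(false)
```

If the stored bytes are 3f, the characters were really lost before or during the insert; any other byte values mean the data is still there and only the display is wrong.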

    All these code page tricks are really hard to fully grasp, and you need to experiment a little (e.g., displaying a string in 3 versions: unencoded, UTF-8 encoded, UTF-8 decoded), because when something fails it is difficult to know at which step the failure occurred.
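
The three-version experiment could look roughly like this. It is a sketch under assumptions: the mbstring extension is loaded, threeVersions is a hypothetical helper name, and mb_convert_encoding with ISO-8859-1 stands in for utf8_encode and utf8_decode (which are deprecated as of PHP 8.2) because it performs the same conversions for valid input.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// threeVersions() is a hypothetical helper name.
function threeVersions(string $s): array
{
    return [
        'unchanged' => bin2hex($s),
        // Same direction as utf8_encode(): input is treated as ISO-8859-1.
        'encoded'   => bin2hex(mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')),
        // Same direction as utf8_decode(): UTF-8 squeezed into ISO-8859-1.
        'decoded'   => bin2hex(mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8')),
    ];
}

$zhong = "\xE4\xB8\xAD"; // "中" as UTF-8 bytes

foreach (threeVersions($zhong) as $label => $hex) {
    echo str_pad($label, 10), $hex, "\n";
}
```

With mbstring's default settings, the "decoded" row comes out as 3f, a literal "?", which matches what the question's author saw, while the "encoded" row grows to six bytes because each original byte is encoded a second time.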
