MySql DB Conversion to UTF8

Posted on 2007-08-04
Last Modified: 2013-11-05
I am an experienced Java / database developer, but I have never gone through a conversion of a database from latin1 to UTF8.   I am using MySQL 5.0.27.  I am especially concerned because I am working with a live US site that I need to convert to be able to handle also Chinese and other languages.   The US site can't break.

The main piece for this question is how do I feel comfortable about the database changes?

So far what I have done so far is changed my my.cnf file to have the following information only:

init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET CHARACTER_SET utf8'
init_connect='SET NAMES utf8'


Now when I do:  show variables like "%character%";show variables like "%collation%";

I am getting the results of:

| Variable_name            | Value                    |
| character_set_client     | utf8                     |
| character_set_connection | utf8                     |
| character_set_database   | latin1                   |
| character_set_filesystem | binary                   |
| character_set_results    | utf8                     |
| character_set_server     | utf8                     |
| character_set_system     | utf8                     |
| character_sets_dir       | C:\mysql\share\charsets\ |
8 rows in set (0.06 sec)

| Variable_name        | Value             |
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | utf8_general_ci   |
3 rows in set (0.00 sec)

Note that I am still seeing the two configuration items still being latin1 oriented.   Do I need to dump the database and recreated it?  Do I need to do a data conversion first?   If so what might that be?   Would ALTER database be enough?   Do I need to put UTF8 on all table creations in the future and database creations in the future?  Do I need to do something to make sure dumps work correctly in UTF8?

I realize from my Java Object Relational Bridge connection I need to also set the CONNECT=UTF8 or some such.   But is that all?

I am also using Lucene for search engine capabilities.  I assume Chinese with multi-byte might not work the same, but don't understand the full scope of that.  I think I might need to pull in a multi-byte parser.

If only one thing, I want to make sure I am using the database part right here.

I realize this is a difficult question, but thank you very much in advance for any help.  Until I feel comfortable, I don't make much useful progress and I don't want to destroy our production environment.

Question by:johnike
    LVL 9

    Accepted Solution

    You actually have to convert the database; the collation_database and character_set_database variables are determined at runtime, as to how the database was saved.  These should be able to sort you out:

    ALTER DATABASE `myDb` COLLATE utf8_unicode_ci;

    More info:

    Author Comment

    This has been very helpful.

    Since I am just latin1 at the moment the ALTER databases should be  enough to have data converted to UTF8?

    I was hoping to be able to change to UTF8 to support Asian character sets (esp Chinese).  Upon reading your third link, I am now not sure that will work.  I am thinking now that I will have to have a separate database that is the cp932 Character Set which is not what I was hoping to have to do to support the Asian set.  I was hoping to be able to have my site be both English and Asian and not have two databases.  I need to do more research.  I am hoping I am wrong and UTF8 will satisfy both.

    If this is the case, then my Lucene Java code might have to have two different parsers as well for the search engine piece.

    I sure wish there was one site that really gave more comprehensive information on what you need to internationalize (English and Asian) a database driven web site.  I have not found a good one yet.

    Since the site is in production, I am very uncomfortable about corrupting the site's data.

    LVL 3

    Assisted Solution

    hi johnike,

    try this website:

    Featured Post

    How to improve team productivity

    Quip adds documents, spreadsheets, and tasklists to your Slack experience
    - Elevate ideas to Quip docs
    - Share Quip docs in Slack
    - Get notified of changes to your docs
    - Available on iOS/Android/Desktop/Web
    - Online/Offline

    Join & Write a Comment

    This article describes some very basic things about SQL Server filegroups.
    APEX (Application Express) is used to develop a web application from Oracle. SQL Workshop is one of the tools that comes with Oracle APEX to query or modify the database objects or to make any changes to the structure.
    Video by: Steve
    Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
    Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

    746 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    20 Experts available now in Live!

    Get 1:1 Help Now