Link to home
Start Free TrialLog in
Avatar of johnike
johnikeFlag for United States of America

asked on

MySql data conversion from latin1 to utf8

I am trying to convert my MySql 5.0.* application data from latin1 to UTF8.  In doing so, my European words with special characters are getting truncated upon uploading.

My database settings were originally:
character_set_client    latin1
character_set_connection        latin1
character_set_database  latin1
character_set_filesystem        binary
character_set_results  latin1
character_set_server    latin1
character_set_system    utf8
character_sets_dir      /usr/share/mysql/charsets/
Variable_name  Value
collation_connection    latin1_swedish_ci
collation_database      latin1_swedish_ci
collation_server        latin1_swedish_ci


Based on database config file changes, they are now:
character_set_client    utf8
character_set_connection        utf8
character_set_database  utf8
character_set_filesystem        binary
character_set_results  utf8
character_set_server    utf8
character_set_system    utf8
character_sets_dir      C:\\apps\\mysql\\share\\charsets\\
Variable_name  Value
collation_connection    utf8_general_ci
collation_database      utf8_general_ci
collation_server        utf8_unicode_ci

This seems correct.

My code does the following:
mysqldump ${DATABASE} ${ID} ${PASSWORD} --default-character-set=latin1 |
            sed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" | \
            sed -e "s/SET NAMES latin1/SET NAMES utf8/g"  > datafile

I drop the old database and recreated it as UTF8.

Then I do:
cat datafile | mysql --default-character-set=utf8 ${ID} ${PASSWORD} ${DATABASE}

Once this is done the data after special characters in fields is truncated in the new database.  datafile holds data that looks like it still has the special characters.

Ideas on what I am doing wrong?

Thanks!


Avatar of Nellios
Nellios
Flag of Greece image

I am not very familiar with sed (damn I should pay more attention when reading Kernigham) but If I understand correctly you are replacing the Default Character Set from latin1 to utf8 in your dump file.

I think that this is not enough and I will tell you why:

mysql >= 5.0 also use collation when it comes to VARCHAR declarations. So I guess you also need to change the collation to utf8_bin or utf8_ci depending on your needs.
ASKER CERTIFIED SOLUTION
Avatar of Tomas Helgi Johannsson
Tomas Helgi Johannsson
Flag of Iceland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of johnike

ASKER

I'm trying things based on your suggestions.  Do you think this could happen due to transfering the data from UNIX to windows?
If you are moving data from *nix to Windows then that could be the case if you
generate the file in *nix and open it in wondows.
You could alwasys use SQLYog , MySQLAdmint tool or Toad for MySQL to extract/import or replciate the data
beteween databases. I have used it to transfer data from Win to Linux where I have Latin1 charset on Windows
and UTF8 on Linux with great success.

Regards,
   Tomas Helgi
MySQLAdmint should be MySQLAdministrator :)
Avatar of johnike

ASKER

I don't quite get it, but I think your answer worked  Tomas Helgi.   It does not really make sense to me, but I believe I am seeing a UTF8 result by not specifying the --default-character-set=latin1 on the dump .

In the database I now see a sequence that looks a bit odd (instead of it being a tilda for example), but when I have it come back out to my screens, it is showing up correctly as a tilda.

I need to do a bit of looking into whether or not what I am seeing is correct in the database, but thank you for such a simple fix.

I will follow up on this later.
Avatar of johnike

ASKER

Seems like it worked.  I have yet to put it in production, but I figured I would give you the points now.  Thank you very much.