[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3096
  • Last Modified:

MySql data conversion from latin1 to utf8

I am trying to convert my MySql 5.0.* application data from latin1 to UTF8.  In doing so, my European words with special characters are getting truncated upon uploading.

My database settings were originally:
character_set_client    latin1
character_set_connection        latin1
character_set_database  latin1
character_set_filesystem        binary
character_set_results  latin1
character_set_server    latin1
character_set_system    utf8
character_sets_dir      /usr/share/mysql/charsets/
Variable_name  Value
collation_connection    latin1_swedish_ci
collation_database      latin1_swedish_ci
collation_server        latin1_swedish_ci


Based on database config file changes, they are now:
character_set_client    utf8
character_set_connection        utf8
character_set_database  utf8
character_set_filesystem        binary
character_set_results  utf8
character_set_server    utf8
character_set_system    utf8
character_sets_dir      C:\\apps\\mysql\\share\\charsets\\
Variable_name  Value
collation_connection    utf8_general_ci
collation_database      utf8_general_ci
collation_server        utf8_unicode_ci

This seems correct.

My code does the following:
mysqldump ${DATABASE} ${ID} ${PASSWORD} --default-character-set=latin1 |
            sed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" | \
            sed -e "s/SET NAMES latin1/SET NAMES utf8/g"  > datafile

I drop the old database and recreated it as UTF8.

Then I do:
cat datafile | mysql --default-character-set=utf8 ${ID} ${PASSWORD} ${DATABASE}

Once this is done the data after special characters in fields is truncated in the new database.  datafile holds data that looks like it still has the special characters.

Ideas on what I am doing wrong?

Thanks!


0
johnike
Asked:
johnike
  • 3
  • 3
1 Solution
 
NelliosCommented:
I am not very familiar with sed (damn I should pay more attention when reading Kernigham) but If I understand correctly you are replacing the Default Character Set from latin1 to utf8 in your dump file.

I think that this is not enough and I will tell you why:

mysql >= 5.0 also use collation when it comes to VARCHAR declarations. So I guess you also need to change the collation to utf8_bin or utf8_ci depending on your needs.
0
 
Tomas Helgi JohannssonCommented:
  Hi!

When issueing the mysqldump command then it says in the manual
http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html
that " If no character set is specified, mysqldump uses utf8, and earlier versions use latin1."
So loose the "--default-character-set=latin1" option and just issue
mysqldump ${DATABASE} ${ID} ${PASSWORD} > datafile
Import that file and you should have the data in UTF8 format.

Regards,
   Tomas Helgi
0
 
johnikeAuthor Commented:
I'm trying things based on your suggestions.  Do you think this could happen due to transfering the data from UNIX to windows?
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
Tomas Helgi JohannssonCommented:
If you are moving data from *nix to Windows then that could be the case if you
generate the file in *nix and open it in wondows.
You could alwasys use SQLYog , MySQLAdmint tool or Toad for MySQL to extract/import or replciate the data
beteween databases. I have used it to transfer data from Win to Linux where I have Latin1 charset on Windows
and UTF8 on Linux with great success.

Regards,
   Tomas Helgi
0
 
Tomas Helgi JohannssonCommented:
MySQLAdmint should be MySQLAdministrator :)
0
 
johnikeAuthor Commented:
I don't quite get it, but I think your answer worked  Tomas Helgi.   It does not really make sense to me, but I believe I am seeing a UTF8 result by not specifying the --default-character-set=latin1 on the dump .

In the database I now see a sequence that looks a bit odd (instead of it being a tilda for example), but when I have it come back out to my screens, it is showing up correctly as a tilda.

I need to do a bit of looking into whether or not what I am seeing is correct in the database, but thank you for such a simple fix.

I will follow up on this later.
0
 
johnikeAuthor Commented:
Seems like it worked.  I have yet to put it in production, but I figured I would give you the points now.  Thank you very much.
0

Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

  • 3
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now