MySql data conversion from latin1 to utf8

I am trying to convert my MySql 5.0.* application data from latin1 to UTF8.  In doing so, my European words with special characters are getting truncated upon uploading.

My database settings were originally:
character_set_client    latin1
character_set_connection        latin1
character_set_database  latin1
character_set_filesystem        binary
character_set_results  latin1
character_set_server    latin1
character_set_system    utf8
character_sets_dir      /usr/share/mysql/charsets/
Variable_name  Value
collation_connection    latin1_swedish_ci
collation_database      latin1_swedish_ci
collation_server        latin1_swedish_ci


Based on database config file changes, they are now:
character_set_client    utf8
character_set_connection        utf8
character_set_database  utf8
character_set_filesystem        binary
character_set_results  utf8
character_set_server    utf8
character_set_system    utf8
character_sets_dir      C:\\apps\\mysql\\share\\charsets\\
Variable_name  Value
collation_connection    utf8_general_ci
collation_database      utf8_general_ci
collation_server        utf8_unicode_ci

This seems correct.

My code does the following:
mysqldump ${DATABASE} ${ID} ${PASSWORD} --default-character-set=latin1 |
            sed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" | \
            sed -e "s/SET NAMES latin1/SET NAMES utf8/g"  > datafile

I drop the old database and recreated it as UTF8.

Then I do:
cat datafile | mysql --default-character-set=utf8 ${ID} ${PASSWORD} ${DATABASE}

Once this is done the data after special characters in fields is truncated in the new database.  datafile holds data that looks like it still has the special characters.

Ideas on what I am doing wrong?

Thanks!


johnikeAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

NelliosCommented:
I am not very familiar with sed (damn I should pay more attention when reading Kernigham) but If I understand correctly you are replacing the Default Character Set from latin1 to utf8 in your dump file.

I think that this is not enough and I will tell you why:

mysql >= 5.0 also use collation when it comes to VARCHAR declarations. So I guess you also need to change the collation to utf8_bin or utf8_ci depending on your needs.
0
Tomas Helgi JohannssonCommented:
  Hi!

When issueing the mysqldump command then it says in the manual
http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html
that " If no character set is specified, mysqldump uses utf8, and earlier versions use latin1."
So loose the "--default-character-set=latin1" option and just issue
mysqldump ${DATABASE} ${ID} ${PASSWORD} > datafile
Import that file and you should have the data in UTF8 format.

Regards,
   Tomas Helgi
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
johnikeAuthor Commented:
I'm trying things based on your suggestions.  Do you think this could happen due to transfering the data from UNIX to windows?
0
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

Tomas Helgi JohannssonCommented:
If you are moving data from *nix to Windows then that could be the case if you
generate the file in *nix and open it in wondows.
You could alwasys use SQLYog , MySQLAdmint tool or Toad for MySQL to extract/import or replciate the data
beteween databases. I have used it to transfer data from Win to Linux where I have Latin1 charset on Windows
and UTF8 on Linux with great success.

Regards,
   Tomas Helgi
0
Tomas Helgi JohannssonCommented:
MySQLAdmint should be MySQLAdministrator :)
0
johnikeAuthor Commented:
I don't quite get it, but I think your answer worked  Tomas Helgi.   It does not really make sense to me, but I believe I am seeing a UTF8 result by not specifying the --default-character-set=latin1 on the dump .

In the database I now see a sequence that looks a bit odd (instead of it being a tilda for example), but when I have it come back out to my screens, it is showing up correctly as a tilda.

I need to do a bit of looking into whether or not what I am seeing is correct in the database, but thank you for such a simple fix.

I will follow up on this later.
0
johnikeAuthor Commented:
Seems like it worked.  I have yet to put it in production, but I figured I would give you the points now.  Thank you very much.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.