Solved

"Convert to UTF8" Without Converting

Posted on 2006-11-06
6
2,561 Views
Last Modified: 2011-08-18
I have a table which CLAIMS it's using latin1 character encoding, however what's stored in each row is actually UTF-8 data.  To clarify: SHOW CREATE TABLE insists that the encoding is latin1, but SELECT HEX(column_name) reveals a UTF-8 sequence.

I want to tell MySQL that this table is really stored as UTF-8.  Doing this, however, by any method I've seen described in the manual, causes MySQL to "convert" my data.  That is, it takes my UTF-8 string, interprets it as latin1, converts that "latin1" string to UTF-8, and stores some mangled data back in my table.

I need a way to just indicate "Listen, trust me, this is UTF-8; don't do anything about it, just record for posterity's sake that I'm using a different encoding than you think I am."


It took me a while to get my head around all this, so I'll be happy to clarify if it's confusing.  I expect, though, that there's a really simple solution I just haven't encountered.

Note that statements like ALTER TABLE t CHANGE COLUMN c c VARCHAR(50) CHARACTER SET utf8;  do this undesirable conversion.  ALTER TABLE t CHARACTER SET utf8;  does nothing but change the default (and leaves the columns marked as latin1).  ALTER TABLE t CONVERT TO CHARACTER SET utf8;  does the undesirable conversion again.

Experts?
0
Comment
Question by:VoteyDisciple
  • 4
  • 2
6 Comments
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17884183
(And let's just not even talk about how exactly this situation came to pass or why I've now spent all morning figuring out what was going on, mmmmk?)  (-:
0
 
LVL 30

Accepted Solution

by:
todd_farmer earned 100 total points
ID: 17884408
Are you able to export the data correctly using mysqldump?  If so, you could dump the contents, truncate the table, alter it to use UTF8, then reload the data.  Of course, this won't help you provide a reproducible test case to MySQL if you are wanting to open a bug report.

What version of MySQL are you using?
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17885504
Well, simply dumping and then importing the data shouldn't change anything inherently (it'll just create the table described as latin1 and dump in the utf8 data from the dump), so what would I need to change?

I tried playing with altering the table definition in the dumped .sql file to SAY it's utf8 but that didn't seem to have the desired effect.  I may just have screwed that up, since it seems promising, but unless  I made a mistake I don't think that did the trick.
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
LVL 30

Expert Comment

by:todd_farmer
ID: 17885568
That's what I had in mind.  You could also take the dump file and eliminate the CREATE TABLE statement (or use the appropriate mysqldump flags to prevent it in the first place).  Then truncate and alter the existing table so that it ius UTF8 and reload the data.
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17890687
I've played with this some more this morning, and here's what I've tried:

1.  Dump a table (with no special modifiers) to a file.
2.  Change, at the beginning of the file, SET NAMES latin1 to SET NAMES utf8
3.  Change the table definition to CHARACTER SET utf8
4.  Import the data again

In this case I'm feeding UTF-8 data over a connection that I've indicated uses UTF-8, into a database table that's designed for UTF-8.

The data get "converted."  I have absolutely  no idea why, but the bytes in my .sql file aren't the same bytes that end up in the database table.
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17901328
That did ultimately work; it just took picking the right combination of utf8 vs. latin1 throughout each step of this insanely bizarre process:

1.  mysqldump --default-character-set=latin1 > file.sql
2.  In file.sql, leave SET NAMES latin1 as-is
3.  In file.sql, change CREATE TABLE statements to use DEFAULT CHARACTER SET utf8
4.  Import using mysql < file.sql (no special indicator here)

This, it seems, preserves the correct byte sequences, but in tables that announce correctly that they use UTF-8.
0

Featured Post

Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
MySQL Grouping 2 53
Complex MySQL Query 2 38
MySQL InnodDB Import from mysqldump takes forever. 2 53
Group By Function Required in PDF Output 21 30
I have been using r1soft Continuous Data Protection (http://www.r1soft.com/linux-cdp/) for many years now with the mySQL Addon and wanted to share a trick I have used several times. For those of us that don't have the luxury of using all transact…
Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
This video shows how to use Hyena, from SystemTools Software, to bulk import 100 user accounts from an external text file. View in 1080p for best video quality.
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…

679 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question