Solved

"Convert to UTF8" Without Converting

Posted on 2006-11-06
6
2,553 Views
Last Modified: 2011-08-18
I have a table which CLAIMS it's using latin1 character encoding, however what's stored in each row is actually UTF-8 data.  To clarify: SHOW CREATE TABLE insists that the encoding is latin1, but SELECT HEX(column_name) reveals a UTF-8 sequence.

I want to tell MySQL that this table is really stored as UTF-8.  Doing this, however, by any method I've seen described in the manual, causes MySQL to "convert" my data.  That is, it takes my UTF-8 string, interprets it as latin1, converts that "latin1" string to UTF-8, and stores some mangled data back in my table.

I need a way to just indicate "Listen, trust me, this is UTF-8; don't do anything about it, just record for posterity's sake that I'm using a different encoding than you think I am."


It took me a while to get my head around all this, so I'll be happy to clarify if it's confusing.  I expect, though, that there's a really simple solution I just haven't encountered.

Note that statements like ALTER TABLE t CHANGE COLUMN c c VARCHAR(50) CHARACTER SET utf8;  do this undesirable conversion.  ALTER TABLE t CHARACTER SET utf8;  does nothing but change the default (and leaves the columns marked as latin1).  ALTER TABLE t CONVERT TO CHARACTER SET utf8;  does the undesirable conversion again.

Experts?
0
Comment
Question by:VoteyDisciple
  • 4
  • 2
6 Comments
 
LVL 19

Author Comment

by:VoteyDisciple
Comment Utility
(And let's just not even talk about how exactly this situation came to pass or why I've now spent all morning figuring out what was going on, mmmmk?)  (-:
0
 
LVL 30

Accepted Solution

by:
todd_farmer earned 100 total points
Comment Utility
Are you able to export the data correctly using mysqldump?  If so, you could dump the contents, truncate the table, alter it to use UTF8, then reload the data.  Of course, this won't help you provide a reproducible test case to MySQL if you are wanting to open a bug report.

What version of MySQL are you using?
0
 
LVL 19

Author Comment

by:VoteyDisciple
Comment Utility
Well, simply dumping and then importing the data shouldn't change anything inherently (it'll just create the table described as latin1 and dump in the utf8 data from the dump), so what would I need to change?

I tried playing with altering the table definition in the dumped .sql file to SAY it's utf8 but that didn't seem to have the desired effect.  I may just have screwed that up, since it seems promising, but unless  I made a mistake I don't think that did the trick.
0
Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

 
LVL 30

Expert Comment

by:todd_farmer
Comment Utility
That's what I had in mind.  You could also take the dump file and eliminate the CREATE TABLE statement (or use the appropriate mysqldump flags to prevent it in the first place).  Then truncate and alter the existing table so that it ius UTF8 and reload the data.
0
 
LVL 19

Author Comment

by:VoteyDisciple
Comment Utility
I've played with this some more this morning, and here's what I've tried:

1.  Dump a table (with no special modifiers) to a file.
2.  Change, at the beginning of the file, SET NAMES latin1 to SET NAMES utf8
3.  Change the table definition to CHARACTER SET utf8
4.  Import the data again

In this case I'm feeding UTF-8 data over a connection that I've indicated uses UTF-8, into a database table that's designed for UTF-8.

The data get "converted."  I have absolutely  no idea why, but the bytes in my .sql file aren't the same bytes that end up in the database table.
0
 
LVL 19

Author Comment

by:VoteyDisciple
Comment Utility
That did ultimately work; it just took picking the right combination of utf8 vs. latin1 throughout each step of this insanely bizarre process:

1.  mysqldump --default-character-set=latin1 > file.sql
2.  In file.sql, leave SET NAMES latin1 as-is
3.  In file.sql, change CREATE TABLE statements to use DEFAULT CHARACTER SET utf8
4.  Import using mysql < file.sql (no special indicator here)

This, it seems, preserves the correct byte sequences, but in tables that announce correctly that they use UTF-8.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now