• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2571
  • Last Modified:

"Convert to UTF8" Without Converting

I have a table which CLAIMS it's using latin1 character encoding, however what's stored in each row is actually UTF-8 data.  To clarify: SHOW CREATE TABLE insists that the encoding is latin1, but SELECT HEX(column_name) reveals a UTF-8 sequence.

I want to tell MySQL that this table is really stored as UTF-8.  Doing this, however, by any method I've seen described in the manual, causes MySQL to "convert" my data.  That is, it takes my UTF-8 string, interprets it as latin1, converts that "latin1" string to UTF-8, and stores some mangled data back in my table.

I need a way to just indicate "Listen, trust me, this is UTF-8; don't do anything about it, just record for posterity's sake that I'm using a different encoding than you think I am."


It took me a while to get my head around all this, so I'll be happy to clarify if it's confusing.  I expect, though, that there's a really simple solution I just haven't encountered.

Note that statements like ALTER TABLE t CHANGE COLUMN c c VARCHAR(50) CHARACTER SET utf8;  do this undesirable conversion.  ALTER TABLE t CHARACTER SET utf8;  does nothing but change the default (and leaves the columns marked as latin1).  ALTER TABLE t CONVERT TO CHARACTER SET utf8;  does the undesirable conversion again.

Experts?
0
VoteyDisciple
Asked:
VoteyDisciple
  • 4
  • 2
1 Solution
 
VoteyDiscipleAuthor Commented:
(And let's just not even talk about how exactly this situation came to pass or why I've now spent all morning figuring out what was going on, mmmmk?)  (-:
0
 
todd_farmerCommented:
Are you able to export the data correctly using mysqldump?  If so, you could dump the contents, truncate the table, alter it to use UTF8, then reload the data.  Of course, this won't help you provide a reproducible test case to MySQL if you are wanting to open a bug report.

What version of MySQL are you using?
0
 
VoteyDiscipleAuthor Commented:
Well, simply dumping and then importing the data shouldn't change anything inherently (it'll just create the table described as latin1 and dump in the utf8 data from the dump), so what would I need to change?

I tried playing with altering the table definition in the dumped .sql file to SAY it's utf8 but that didn't seem to have the desired effect.  I may just have screwed that up, since it seems promising, but unless  I made a mistake I don't think that did the trick.
0
Cloud Class® Course: CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

 
todd_farmerCommented:
That's what I had in mind.  You could also take the dump file and eliminate the CREATE TABLE statement (or use the appropriate mysqldump flags to prevent it in the first place).  Then truncate and alter the existing table so that it ius UTF8 and reload the data.
0
 
VoteyDiscipleAuthor Commented:
I've played with this some more this morning, and here's what I've tried:

1.  Dump a table (with no special modifiers) to a file.
2.  Change, at the beginning of the file, SET NAMES latin1 to SET NAMES utf8
3.  Change the table definition to CHARACTER SET utf8
4.  Import the data again

In this case I'm feeding UTF-8 data over a connection that I've indicated uses UTF-8, into a database table that's designed for UTF-8.

The data get "converted."  I have absolutely  no idea why, but the bytes in my .sql file aren't the same bytes that end up in the database table.
0
 
VoteyDiscipleAuthor Commented:
That did ultimately work; it just took picking the right combination of utf8 vs. latin1 throughout each step of this insanely bizarre process:

1.  mysqldump --default-character-set=latin1 > file.sql
2.  In file.sql, leave SET NAMES latin1 as-is
3.  In file.sql, change CREATE TABLE statements to use DEFAULT CHARACTER SET utf8
4.  Import using mysql < file.sql (no special indicator here)

This, it seems, preserves the correct byte sequences, but in tables that announce correctly that they use UTF-8.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 4
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now