Solved

"Convert to UTF8" Without Converting

Posted on 2006-11-06
6
2,556 Views
Last Modified: 2011-08-18
I have a table which CLAIMS it's using latin1 character encoding, however what's stored in each row is actually UTF-8 data.  To clarify: SHOW CREATE TABLE insists that the encoding is latin1, but SELECT HEX(column_name) reveals a UTF-8 sequence.

I want to tell MySQL that this table is really stored as UTF-8.  Doing this, however, by any method I've seen described in the manual, causes MySQL to "convert" my data.  That is, it takes my UTF-8 string, interprets it as latin1, converts that "latin1" string to UTF-8, and stores some mangled data back in my table.

I need a way to just indicate "Listen, trust me, this is UTF-8; don't do anything about it, just record for posterity's sake that I'm using a different encoding than you think I am."


It took me a while to get my head around all this, so I'll be happy to clarify if it's confusing.  I expect, though, that there's a really simple solution I just haven't encountered.

Note that statements like ALTER TABLE t CHANGE COLUMN c c VARCHAR(50) CHARACTER SET utf8;  do this undesirable conversion.  ALTER TABLE t CHARACTER SET utf8;  does nothing but change the default (and leaves the columns marked as latin1).  ALTER TABLE t CONVERT TO CHARACTER SET utf8;  does the undesirable conversion again.

Experts?
0
Comment
Question by:VoteyDisciple
  • 4
  • 2
6 Comments
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17884183
(And let's just not even talk about how exactly this situation came to pass or why I've now spent all morning figuring out what was going on, mmmmk?)  (-:
0
 
LVL 30

Accepted Solution

by:
todd_farmer earned 100 total points
ID: 17884408
Are you able to export the data correctly using mysqldump?  If so, you could dump the contents, truncate the table, alter it to use UTF8, then reload the data.  Of course, this won't help you provide a reproducible test case to MySQL if you are wanting to open a bug report.

What version of MySQL are you using?
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17885504
Well, simply dumping and then importing the data shouldn't change anything inherently (it'll just create the table described as latin1 and dump in the utf8 data from the dump), so what would I need to change?

I tried playing with altering the table definition in the dumped .sql file to SAY it's utf8 but that didn't seem to have the desired effect.  I may just have screwed that up, since it seems promising, but unless  I made a mistake I don't think that did the trick.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 30

Expert Comment

by:todd_farmer
ID: 17885568
That's what I had in mind.  You could also take the dump file and eliminate the CREATE TABLE statement (or use the appropriate mysqldump flags to prevent it in the first place).  Then truncate and alter the existing table so that it ius UTF8 and reload the data.
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17890687
I've played with this some more this morning, and here's what I've tried:

1.  Dump a table (with no special modifiers) to a file.
2.  Change, at the beginning of the file, SET NAMES latin1 to SET NAMES utf8
3.  Change the table definition to CHARACTER SET utf8
4.  Import the data again

In this case I'm feeding UTF-8 data over a connection that I've indicated uses UTF-8, into a database table that's designed for UTF-8.

The data get "converted."  I have absolutely  no idea why, but the bytes in my .sql file aren't the same bytes that end up in the database table.
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17901328
That did ultimately work; it just took picking the right combination of utf8 vs. latin1 throughout each step of this insanely bizarre process:

1.  mysqldump --default-character-set=latin1 > file.sql
2.  In file.sql, leave SET NAMES latin1 as-is
3.  In file.sql, change CREATE TABLE statements to use DEFAULT CHARACTER SET utf8
4.  Import using mysql < file.sql (no special indicator here)

This, it seems, preserves the correct byte sequences, but in tables that announce correctly that they use UTF-8.
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Creating and Managing Databases with phpMyAdmin in cPanel.
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
Both in life and business – not all partnerships are created equal. As the demand for cloud services increases, so do the number of self-proclaimed cloud partners. Asking the right questions up front in the partnership, will enable both parties …

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

27 Experts available now in Live!

Get 1:1 Help Now