Solved

imported database has special chars encoded as 'é' instead of 'é'

Posted on 2006-11-14
14
690 Views
Last Modified: 2008-02-01
Hi

I have an export from a mysql database from an other company.
This is a textfile with create and insert statements.

CREATE DATABASE `dbname` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

INSERT INTO `comments` VALUES (272, 'address@mail.com', 'How much does this cost and is it also available in België?');
(this is translated from dutch)
The last word here should be "België" not "België".

    1. Is there anything I can do to fix this at import.
    2. Or should the other company have done something else while exporting?
       2.1 And what do I need to tell them to get the data in the right format?

Thanks alot
0
Comment
Question by:Zorro032798
  • 6
  • 4
  • 3
14 Comments
 
LVL 9

Expert Comment

by:riaancornelius
ID: 17937077
1. Yes, you need to change the collation on the fields you are importing into. The character is probably a unicode character, which can't be represented in the selected collation. Try changing the collation on the import field to one of the utf8 collations (like utf8_spanish_ci)
2. Nope, you should be able to import fine.
0
 

Author Comment

by:Zorro032798
ID: 17937313
Thanks

But it doesn't work, the data looks still like '...als je met één of ander probleem zit.' in the database.

Using 'DEFAULT CHARACTER SET' gave problems so I did it without 'DEFAULT', both with CREATE DATABASE as with CREATE TABLE

What I've done:
1. in phpMyAdmin (2.4.0)
   CREATE DATABASE `dbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_spanish_ci;

   ERROR:
      SQL-query :
      CREATE DATABASE `dbname` DEFAULTCHARACTER SET utf8 COLLATE utf8_spanish_ci
      MySQL said:
      You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DEFAULTCHARACTER SET utf8 COLLATE utf8_spanish_ci' at line 1

2. CREATE DATABASE `dbname` CHARACTER SET utf8 COLLATE utf8_spanish_ci;

3. CREATE TABLE IF NOT EXISTS `comments` (
  `id` smallint(6) NOT NULL auto_increment,
  `text` text,
  PRIMARY KEY  (`id`)
) CHARSET=utf8;

4. INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');
0
 
LVL 9

Assisted Solution

by:riaancornelius
riaancornelius earned 200 total points
ID: 17937419
try it like this:
CREATE TABLE `test` (
`id` SMALLINT NOT NULL AUTO_INCREMENT ,
`text` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL ,
PRIMARY KEY ( `id` )
);
0
 

Author Comment

by:Zorro032798
ID: 17937650
info: I'm using MySQL 4.1.8

This doesn't work either
It doesn't give any errors, but é stays é

I don't understand how MySQL could repair this.

i've got a plain-text SQL-file, containing things like:
INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');

It's not 'é' in the file, it's 'é'.
Is this really correct? Can MySQL recognize those sequences and make 'é' of it?

I still think we should have gotten an export where the SQL looked like:
INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');

0
 
LVL 19

Expert Comment

by:VoteyDisciple
ID: 17938918
A character encoding is nothing more than a way of expressing what an é is supposed to look like in bytes.  Different encodings naturally have different ways of expressing how an é should be represented, so the problem is what happens when you encode a character in, say, utf8 (in which case é becomes c3a9 -- a fact I've just looked up at http://www.utf8-chartable.de/ )

The problem arises when you then try to INTERPRET that c3a9 value as belonging to a DIFFERENT character encoding.  In latin1, for example, c3a9 may very well mean é.  (Or likewise for any other combination of encodings.)

So the trick to repairing it is to answer the questions.
1.  What encoding does MySQL think the data are stored in?
2.  What encoding are the data actually stored in?

It may help to try something like
SELECT SUBSTRING(field_name, 24, 5), HEX(SUBSTRING(field_name, 24, 5))
FROM comments
WHERE id=7

(Where 24 is the position the é is supposed to be in that record.)  This will show you how the data are actually stored (and of course a SHOW CREATE TABLE comments will tell you how MySQL thinks they're stored).

From there coming up with the right combination to repair it is "easy" (in roughly the same way that figuring out daylight savings time is "easy" -- it's just adding or subtracting one, yet twice a year I end up with a headache Sunday morning).
0
 

Author Comment

by:Zorro032798
ID: 17939509
1.  I've looked with a hexeditor at my sqlfile before import and é looks like C3A9
2.  SELECT SUBSTRING( text, 274, 4  ) , HEX( SUBSTRING( text, 274, 4  )  )  FROM comments WHERE id = 118
    gives: éé    C383C2A9C383C2A9
3.  SHOW CREATE TABLE comments
    gives: CREATE TABLE `discuss` (\n `id` smallint(6) NOT NULL auto_increment,\n `text` text,\n PRIMARY KEY (`id`)\n) ENGINE=MyISAM DEFAULT CHARSET=utf8

So the individual chars à and © get interpreted as unicode,
but I don't know what to do to fix this
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 19

Accepted Solution

by:
VoteyDisciple earned 200 total points
ID: 17940736
Ahhh, so indeed you did have correct utf8 encoding before importing, yet have some other (I think latin1) encoding after importing.

Try this: at the top of the .sql file should be a line like "SET NAMES latin1" (or "SET NAMES" something else).  Change it to "SET NAMES utf8", and re-run the script (replacing all the data that are there now).  Also make sure, as you already have, all the table definitions use utf8 as well (in case those got changed somewhere along the line).
0
 

Author Comment

by:Zorro032798
ID: 17945386
Thanks a lot guys!

This works.

So now I have:
- CREATE DATABASE, CREATE TABLE with utf8
- import starting with SET NAMES utf8

and the resulting text in the DB contains readable special chars

See You
0
 
LVL 19

Expert Comment

by:VoteyDisciple
ID: 17946140
I have to ask: if it worked fine, why the B grade?
0
 

Author Comment

by:Zorro032798
ID: 17946178
Is that a B grade?
I'm sorry, I thought I said "Good".
I'm not familiar with the customs here.
I assumed "Excellent" was to be used on special occasions.
I will use "Excellent" for all good answers in the future.
0
 
LVL 9

Expert Comment

by:riaancornelius
ID: 17946202
This is a good rule of thumb.

Excellent is when your question was answered
Good is when your question was answered, but you had to research some aspect yourself. Usually this will only happen if you don't come back and ask for clarification.
C grade is for when you were really just pointed in the right direction, but you basically found the answer yourself.

Another point to note is that you can ask a page admin to reopen the question if you want to score it differently.
0
 

Author Comment

by:Zorro032798
ID: 17947297
Thanks for Excellent help :)
0
 
LVL 9

Expert Comment

by:riaancornelius
ID: 17953388
thanks :)
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

A lot of articles have been written on splitting mysqldump and grabbing the required tables. A long while back, when Shlomi (http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump) had suggested a “sed” way, I actually shell …
Creating and Managing Databases with phpMyAdmin in cPanel.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
This video discusses moving either the default database or any database to a new volume.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now