imported database has special chars encoded as 'é' instead of 'é'

Hi

I have an export from a mysql database from an other company.
This is a textfile with create and insert statements.

CREATE DATABASE `dbname` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

INSERT INTO `comments` VALUES (272, 'address@mail.com', 'How much does this cost and is it also available in België?');
(this is translated from dutch)
The last word here should be "België" not "België".

    1. Is there anything I can do to fix this at import.
    2. Or should the other company have done something else while exporting?
       2.1 And what do I need to tell them to get the data in the right format?

Thanks alot
Zorro032798Asked:
Who is Participating?
 
VoteyDiscipleCommented:
Ahhh, so indeed you did have correct utf8 encoding before importing, yet have some other (I think latin1) encoding after importing.

Try this: at the top of the .sql file should be a line like "SET NAMES latin1" (or "SET NAMES" something else).  Change it to "SET NAMES utf8", and re-run the script (replacing all the data that are there now).  Also make sure, as you already have, all the table definitions use utf8 as well (in case those got changed somewhere along the line).
0
 
riaancorneliusCommented:
1. Yes, you need to change the collation on the fields you are importing into. The character is probably a unicode character, which can't be represented in the selected collation. Try changing the collation on the import field to one of the utf8 collations (like utf8_spanish_ci)
2. Nope, you should be able to import fine.
0
 
Zorro032798Author Commented:
Thanks

But it doesn't work, the data looks still like '...als je met één of ander probleem zit.' in the database.

Using 'DEFAULT CHARACTER SET' gave problems so I did it without 'DEFAULT', both with CREATE DATABASE as with CREATE TABLE

What I've done:
1. in phpMyAdmin (2.4.0)
   CREATE DATABASE `dbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_spanish_ci;

   ERROR:
      SQL-query :
      CREATE DATABASE `dbname` DEFAULTCHARACTER SET utf8 COLLATE utf8_spanish_ci
      MySQL said:
      You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DEFAULTCHARACTER SET utf8 COLLATE utf8_spanish_ci' at line 1

2. CREATE DATABASE `dbname` CHARACTER SET utf8 COLLATE utf8_spanish_ci;

3. CREATE TABLE IF NOT EXISTS `comments` (
  `id` smallint(6) NOT NULL auto_increment,
  `text` text,
  PRIMARY KEY  (`id`)
) CHARSET=utf8;

4. INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');
0
Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
riaancorneliusCommented:
try it like this:
CREATE TABLE `test` (
`id` SMALLINT NOT NULL AUTO_INCREMENT ,
`text` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL ,
PRIMARY KEY ( `id` )
);
0
 
Zorro032798Author Commented:
info: I'm using MySQL 4.1.8

This doesn't work either
It doesn't give any errors, but é stays é

I don't understand how MySQL could repair this.

i've got a plain-text SQL-file, containing things like:
INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');

It's not 'é' in the file, it's 'é'.
Is this really correct? Can MySQL recognize those sequences and make 'é' of it?

I still think we should have gotten an export where the SQL looked like:
INSERT INTO `comments` VALUES (1, '... als je met één of ander probleem zit.');

0
 
VoteyDiscipleCommented:
A character encoding is nothing more than a way of expressing what an é is supposed to look like in bytes.  Different encodings naturally have different ways of expressing how an é should be represented, so the problem is what happens when you encode a character in, say, utf8 (in which case é becomes c3a9 -- a fact I've just looked up at http://www.utf8-chartable.de/ )

The problem arises when you then try to INTERPRET that c3a9 value as belonging to a DIFFERENT character encoding.  In latin1, for example, c3a9 may very well mean é.  (Or likewise for any other combination of encodings.)

So the trick to repairing it is to answer the questions.
1.  What encoding does MySQL think the data are stored in?
2.  What encoding are the data actually stored in?

It may help to try something like
SELECT SUBSTRING(field_name, 24, 5), HEX(SUBSTRING(field_name, 24, 5))
FROM comments
WHERE id=7

(Where 24 is the position the é is supposed to be in that record.)  This will show you how the data are actually stored (and of course a SHOW CREATE TABLE comments will tell you how MySQL thinks they're stored).

From there coming up with the right combination to repair it is "easy" (in roughly the same way that figuring out daylight savings time is "easy" -- it's just adding or subtracting one, yet twice a year I end up with a headache Sunday morning).
0
 
Zorro032798Author Commented:
1.  I've looked with a hexeditor at my sqlfile before import and é looks like C3A9
2.  SELECT SUBSTRING( text, 274, 4  ) , HEX( SUBSTRING( text, 274, 4  )  )  FROM comments WHERE id = 118
    gives: éé    C383C2A9C383C2A9
3.  SHOW CREATE TABLE comments
    gives: CREATE TABLE `discuss` (\n `id` smallint(6) NOT NULL auto_increment,\n `text` text,\n PRIMARY KEY (`id`)\n) ENGINE=MyISAM DEFAULT CHARSET=utf8

So the individual chars à and © get interpreted as unicode,
but I don't know what to do to fix this
0
 
Zorro032798Author Commented:
Thanks a lot guys!

This works.

So now I have:
- CREATE DATABASE, CREATE TABLE with utf8
- import starting with SET NAMES utf8

and the resulting text in the DB contains readable special chars

See You
0
 
VoteyDiscipleCommented:
I have to ask: if it worked fine, why the B grade?
0
 
Zorro032798Author Commented:
Is that a B grade?
I'm sorry, I thought I said "Good".
I'm not familiar with the customs here.
I assumed "Excellent" was to be used on special occasions.
I will use "Excellent" for all good answers in the future.
0
 
riaancorneliusCommented:
This is a good rule of thumb.

Excellent is when your question was answered
Good is when your question was answered, but you had to research some aspect yourself. Usually this will only happen if you don't come back and ask for clarification.
C grade is for when you were really just pointed in the right direction, but you basically found the answer yourself.

Another point to note is that you can ask a page admin to reopen the question if you want to score it differently.
0
 
Zorro032798Author Commented:
Thanks for Excellent help :)
0
 
riaancorneliusCommented:
thanks :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.