Solved

Confused about changing mysqsl charsets

Posted on 2014-04-03
5
224 Views
Last Modified: 2014-04-20
Hello

I dont fully understand mysql charsets. Lets say i have the database set to be LATIN1 by default and then some program at runtime changes the charset to UTF8 what happens.

Is the data stored as UTF8 or converted by the database from UTF8 to the database defaut of latin1. If it was stored as UTF8 wouldnt subsequent client requests be 'confused' as they are expecting to receive latin1 whereas they actually get utf8?

Is it correct that these settings in my.cnf will force client and server to use UTF8 so that even if a program tries to change the character set at runtime then the changes are ignored

skip-character-set-client-handshake
collation-server=utf8_unicode_ci
character-set-server=utf8

I was advised to force all servers and clients to use UTF8 for safety.

Many thanks in advance for any help you experts can give
0
Comment
Question by:andieje
  • 3
5 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39977031
Changing the charset collation does not change the characters so changing the character set of a database or table with data can cause problems.  The character codes for 'latin1' and 'utf-8' don't mean the same thing for some codes plus 'utf-8' has 'multi-byte' codes that aren't in 'latin1' at all.  This site http://www.alanwood.net/ is my best reference for characters sets and the codes used in different character sets.

While UTF-8 is intended to be the 'universal character set', there is no "safety" in using the wrong character set.  If you really have data in a 'latin1' character set and not just the ASCII character in the lower 128 code values, changing to UTF-8 will cause a problem.  Changing the 'character set' does not change the actual data.
0
 

Author Comment

by:andieje
ID: 39977751
Thanks Dave but I think my question was slightly different than that, probably because i dont understand charsets properly. I've never really needed to understand them before. I just use the default !

If my database is charset latin 1 and someone writes some code to change the charset at runtime to utf8 and data is written to the database what happens? Is it written as utf8 or does the database convert it to latin 1? If it is written as UTF8 and then a different client makes a request for the data won't they get it back as UTF8 when they are expecting latin1.

I'm sorry if my question is more basic that you are expecting.
0
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
ID: 39978634
A "character set declaration" is just a tag that tells you how to view the data.  For character set declarations to work properly, the correct character set must be used from start to finish.  The data itself Does Not get translated if you change the character set declaration.  

Character set declarations are like language tags.  If you put an English tag on someone that speaks only Russian, it does not cause an automatic translation to English.  It does however confuse the person that thought they were going to hear English.  Changing the character set declaration Does Not change the actual data but it can confuse the person retrieving it when they get characters that don't make any sense.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39979353
0
 
LVL 109

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 39982815
I was advised to force all servers and clients to use UTF8 for safety.
That's probably good advice, going forward, because that's the way the wind is blowing.  Even PHP which has been notorious for assuming that a byte == a character is acknowledging at PHP 5.4 that multi-byte characters exist.  See the part about "encoding" here:
http://php.net/manual/en/function.htmlentities.php

The article will probably help you understand what you must do to make the conversion from Latin1 to UTF-8.  If it were up to me, I would make a complete copy of my data base rather than change any tables on a piecemeal basis.  Consistency is important; your PHP code, your MySQL data and encoding settings, and your HTML meta charset should all reflect the same encoding.  And, of course, the underlying data must be correct.
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

A lot of articles have been written on splitting mysqldump and grabbing the required tables. A long while back, when Shlomi (http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump) had suggested a “sed” way, I actually shell …
I use MySQL for many of my development projects in a Windows environment. To manage my databases (and perform queries) for years I used a tool called MySQL administrator.  This tool has since been replaced by MySQL Workbench. So I decided to m…
This Micro Tutorial will teach you how to censor certain areas of your screen. The example in this video will show a little boy's face being blurred. This will be demonstrated using Adobe Premiere Pro CS6.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…

816 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now