Solved

Confused about changing mysqsl charsets

Posted on 2014-04-03
5
248 Views
Last Modified: 2014-04-20
Hello

I dont fully understand mysql charsets. Lets say i have the database set to be LATIN1 by default and then some program at runtime changes the charset to UTF8 what happens.

Is the data stored as UTF8 or converted by the database from UTF8 to the database defaut of latin1. If it was stored as UTF8 wouldnt subsequent client requests be 'confused' as they are expecting to receive latin1 whereas they actually get utf8?

Is it correct that these settings in my.cnf will force client and server to use UTF8 so that even if a program tries to change the character set at runtime then the changes are ignored

skip-character-set-client-handshake
collation-server=utf8_unicode_ci
character-set-server=utf8

I was advised to force all servers and clients to use UTF8 for safety.

Many thanks in advance for any help you experts can give
0
Comment
Question by:andieje
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
5 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39977031
Changing the charset collation does not change the characters so changing the character set of a database or table with data can cause problems.  The character codes for 'latin1' and 'utf-8' don't mean the same thing for some codes plus 'utf-8' has 'multi-byte' codes that aren't in 'latin1' at all.  This site http://www.alanwood.net/ is my best reference for characters sets and the codes used in different character sets.

While UTF-8 is intended to be the 'universal character set', there is no "safety" in using the wrong character set.  If you really have data in a 'latin1' character set and not just the ASCII character in the lower 128 code values, changing to UTF-8 will cause a problem.  Changing the 'character set' does not change the actual data.
0
 

Author Comment

by:andieje
ID: 39977751
Thanks Dave but I think my question was slightly different than that, probably because i dont understand charsets properly. I've never really needed to understand them before. I just use the default !

If my database is charset latin 1 and someone writes some code to change the charset at runtime to utf8 and data is written to the database what happens? Is it written as utf8 or does the database convert it to latin 1? If it is written as UTF8 and then a different client makes a request for the data won't they get it back as UTF8 when they are expecting latin1.

I'm sorry if my question is more basic that you are expecting.
0
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
ID: 39978634
A "character set declaration" is just a tag that tells you how to view the data.  For character set declarations to work properly, the correct character set must be used from start to finish.  The data itself Does Not get translated if you change the character set declaration.  

Character set declarations are like language tags.  If you put an English tag on someone that speaks only Russian, it does not cause an automatic translation to English.  It does however confuse the person that thought they were going to hear English.  Changing the character set declaration Does Not change the actual data but it can confuse the person retrieving it when they get characters that don't make any sense.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39979353
0
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 39982815
I was advised to force all servers and clients to use UTF8 for safety.
That's probably good advice, going forward, because that's the way the wind is blowing.  Even PHP which has been notorious for assuming that a byte == a character is acknowledging at PHP 5.4 that multi-byte characters exist.  See the part about "encoding" here:
http://php.net/manual/en/function.htmlentities.php

The article will probably help you understand what you must do to make the conversion from Latin1 to UTF-8.  If it were up to me, I would make a complete copy of my data base rather than change any tables on a piecemeal basis.  Consistency is important; your PHP code, your MySQL data and encoding settings, and your HTML meta charset should all reflect the same encoding.  And, of course, the underlying data must be correct.
0

Featured Post

[Webinar] How Hackers Steal Your Credentials

Do You Know How Hackers Steal Your Credentials? Join us and Skyport Systems to learn how hackers steal your credentials and why Active Directory must be secure to stop them. Thursday, July 13, 2017 10:00 A.M. PDT

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
Foreword This article was written many years ago, in the days when PHP supported the MySQL extension (http://php.net/manual/en/function.mysql-connect.php).  Today (http://php.net/manual/en/migration70.removed-exts-sapis.php) you would not use MySQL…
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
Michael from AdRem Software outlines event notifications and Automatic Corrective Actions in network monitoring. Automatic Corrective Actions are scripts, which can automatically run upon discovery of a certain undesirable condition in your network.…

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question