Solved

Confused about changing mysqsl charsets

Posted on 2014-04-03
5
235 Views
Last Modified: 2014-04-20
Hello

I dont fully understand mysql charsets. Lets say i have the database set to be LATIN1 by default and then some program at runtime changes the charset to UTF8 what happens.

Is the data stored as UTF8 or converted by the database from UTF8 to the database defaut of latin1. If it was stored as UTF8 wouldnt subsequent client requests be 'confused' as they are expecting to receive latin1 whereas they actually get utf8?

Is it correct that these settings in my.cnf will force client and server to use UTF8 so that even if a program tries to change the character set at runtime then the changes are ignored

skip-character-set-client-handshake
collation-server=utf8_unicode_ci
character-set-server=utf8

I was advised to force all servers and clients to use UTF8 for safety.

Many thanks in advance for any help you experts can give
0
Comment
Question by:andieje
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
5 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39977031
Changing the charset collation does not change the characters so changing the character set of a database or table with data can cause problems.  The character codes for 'latin1' and 'utf-8' don't mean the same thing for some codes plus 'utf-8' has 'multi-byte' codes that aren't in 'latin1' at all.  This site http://www.alanwood.net/ is my best reference for characters sets and the codes used in different character sets.

While UTF-8 is intended to be the 'universal character set', there is no "safety" in using the wrong character set.  If you really have data in a 'latin1' character set and not just the ASCII character in the lower 128 code values, changing to UTF-8 will cause a problem.  Changing the 'character set' does not change the actual data.
0
 

Author Comment

by:andieje
ID: 39977751
Thanks Dave but I think my question was slightly different than that, probably because i dont understand charsets properly. I've never really needed to understand them before. I just use the default !

If my database is charset latin 1 and someone writes some code to change the charset at runtime to utf8 and data is written to the database what happens? Is it written as utf8 or does the database convert it to latin 1? If it is written as UTF8 and then a different client makes a request for the data won't they get it back as UTF8 when they are expecting latin1.

I'm sorry if my question is more basic that you are expecting.
0
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
ID: 39978634
A "character set declaration" is just a tag that tells you how to view the data.  For character set declarations to work properly, the correct character set must be used from start to finish.  The data itself Does Not get translated if you change the character set declaration.  

Character set declarations are like language tags.  If you put an English tag on someone that speaks only Russian, it does not cause an automatic translation to English.  It does however confuse the person that thought they were going to hear English.  Changing the character set declaration Does Not change the actual data but it can confuse the person retrieving it when they get characters that don't make any sense.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39979353
0
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 39982815
I was advised to force all servers and clients to use UTF8 for safety.
That's probably good advice, going forward, because that's the way the wind is blowing.  Even PHP which has been notorious for assuming that a byte == a character is acknowledging at PHP 5.4 that multi-byte characters exist.  See the part about "encoding" here:
http://php.net/manual/en/function.htmlentities.php

The article will probably help you understand what you must do to make the conversion from Latin1 to UTF-8.  If it were up to me, I would make a complete copy of my data base rather than change any tables on a piecemeal basis.  Consistency is important; your PHP code, your MySQL data and encoding settings, and your HTML meta charset should all reflect the same encoding.  And, of course, the underlying data must be correct.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

I use MySQL for many of my development projects in a Windows environment. To manage my databases (and perform queries) for years I used a tool called MySQL administrator.  This tool has since been replaced by MySQL Workbench. So I decided to m…
Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question