Solved

Confused about changing mysqsl charsets

Posted on 2014-04-03
5
220 Views
Last Modified: 2014-04-20
Hello

I dont fully understand mysql charsets. Lets say i have the database set to be LATIN1 by default and then some program at runtime changes the charset to UTF8 what happens.

Is the data stored as UTF8 or converted by the database from UTF8 to the database defaut of latin1. If it was stored as UTF8 wouldnt subsequent client requests be 'confused' as they are expecting to receive latin1 whereas they actually get utf8?

Is it correct that these settings in my.cnf will force client and server to use UTF8 so that even if a program tries to change the character set at runtime then the changes are ignored

skip-character-set-client-handshake
collation-server=utf8_unicode_ci
character-set-server=utf8

I was advised to force all servers and clients to use UTF8 for safety.

Many thanks in advance for any help you experts can give
0
Comment
Question by:andieje
  • 3
5 Comments
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
Changing the charset collation does not change the characters so changing the character set of a database or table with data can cause problems.  The character codes for 'latin1' and 'utf-8' don't mean the same thing for some codes plus 'utf-8' has 'multi-byte' codes that aren't in 'latin1' at all.  This site http://www.alanwood.net/ is my best reference for characters sets and the codes used in different character sets.

While UTF-8 is intended to be the 'universal character set', there is no "safety" in using the wrong character set.  If you really have data in a 'latin1' character set and not just the ASCII character in the lower 128 code values, changing to UTF-8 will cause a problem.  Changing the 'character set' does not change the actual data.
0
 

Author Comment

by:andieje
Comment Utility
Thanks Dave but I think my question was slightly different than that, probably because i dont understand charsets properly. I've never really needed to understand them before. I just use the default !

If my database is charset latin 1 and someone writes some code to change the charset at runtime to utf8 and data is written to the database what happens? Is it written as utf8 or does the database convert it to latin 1? If it is written as UTF8 and then a different client makes a request for the data won't they get it back as UTF8 when they are expecting latin1.

I'm sorry if my question is more basic that you are expecting.
0
 
LVL 82

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
Comment Utility
A "character set declaration" is just a tag that tells you how to view the data.  For character set declarations to work properly, the correct character set must be used from start to finish.  The data itself Does Not get translated if you change the character set declaration.  

Character set declarations are like language tags.  If you put an English tag on someone that speaks only Russian, it does not cause an automatic translation to English.  It does however confuse the person that thought they were going to hear English.  Changing the character set declaration Does Not change the actual data but it can confuse the person retrieving it when they get characters that don't make any sense.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 250 total points
Comment Utility
I was advised to force all servers and clients to use UTF8 for safety.
That's probably good advice, going forward, because that's the way the wind is blowing.  Even PHP which has been notorious for assuming that a byte == a character is acknowledging at PHP 5.4 that multi-byte characters exist.  See the part about "encoding" here:
http://php.net/manual/en/function.htmlentities.php

The article will probably help you understand what you must do to make the conversion from Latin1 to UTF-8.  If it were up to me, I would make a complete copy of my data base rather than change any tables on a piecemeal basis.  Consistency is important; your PHP code, your MySQL data and encoding settings, and your HTML meta charset should all reflect the same encoding.  And, of course, the underlying data must be correct.
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Introduction In this article, I will by showing a nice little trick for MySQL similar to that of my previous EE Article for SQLite (http://www.sqlite.org/), A SQLite Tidbit: Quick Numbers Table Generation (http://www.experts-exchange.com/A_3570.htm…
Creating and Managing Databases with phpMyAdmin in cPanel.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now