killer455

asked on

Charsets and all languages

I need a list of all languages with the respective charset to use in the HTML/PHP web pages and in the MySQL database. Also, what should the collation be for MySQL? What is the difference between charset and collation in MySQL when dealing with languages? I have seen sites describing which charsets to use for each language, but they all differ. I need a solid set so I can hardcode these options into a script.

Specifically I think this list...

Catalan
Portuguese
Czech
German
Danish
English
Spanish
Finnish
Faroese
French
Hungarian
Japanese
Italian
Dutch
Norwegian
Polish
Romanian
Russian
Swedish
Turkish
Chinese
Eternal_Student

This may help with the php and mysql side of things:

http://dev.mysql.com/doc/refman/5.0/en/charset.html
killer455

ASKER

Yes I have seen this but I need a detailed answer here specific to my question.

If you use UTF-8 (or UTF-16) you can support all the above languages. If you want to use a platform encoding, you can use:

ISO-9959-1, latin1 or Windows 1252 for: Catalan, Portuguese, German, Danish, English, Spanish, Finnish, Faroese, French, Italian, Dutch, Norwegian, Swedish
ISO-8859-2, latin2 or Windows 1250 for: Czech, Hungarian, Polish, Romanian
CP932, sjis or Shift-JIS for: Japanese
ISO-8859-5, KOI8-R or Windows 1251 for: Russian (Cyrillic)
ISO-8859-9 for: Turkish
Big5 for: Traditional Chinese (Taiwanese)
GB 18030 for: Simplified Chinese (PRC). Note that this is required by the Chinese government, so the old GB 2312 is no longer acceptable.
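As a rough illustration of the list above, here is a minimal Python sketch (the same table would translate directly to a PHP array). The dictionary names, the helper `can_encode` and the sample words are all made up for this example. Czech is placed under ISO-8859-2 here because letters such as ř do not exist in Latin-1. The sketch only checks which encodings can represent which text; it is not a complete language table.

```python
# Legacy encodings from the list above, written as Python codec names.
LEGACY_CODEC = {
    "French": "iso8859_1",
    "German": "iso8859_1",
    "Czech": "iso8859_2",
    "Polish": "iso8859_2",
    "Japanese": "cp932",
    "Russian": "koi8_r",
    "Turkish": "iso8859_9",
    "Chinese (Traditional)": "big5",
    "Chinese (Simplified)": "gb18030",
}

# Hypothetical sample words, one per language, for illustration only.
SAMPLES = {
    "French": "français",
    "German": "Straße",
    "Czech": "Příliš",
    "Polish": "Łódź",
    "Japanese": "日本語",
    "Russian": "Привет",
    "Turkish": "İstanbul",
    "Chinese (Traditional)": "中文",
    "Chinese (Simplified)": "中文",
}


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Print, per language: legacy codec, whether it fits, whether UTF-8 fits.
for language, codec in LEGACY_CODEC.items():
    word = SAMPLES[language]
    print(language, codec, can_encode(word, codec), can_encode(word, "utf_8"))
```

Each sample fits its own legacy codec, but mixing languages breaks quickly: `can_encode("Łódź", "iso8859_1")` is false, while UTF-8 accepts every sample.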

I very strongly recommend that you use UTF-8, since it is a universal solution for all the above and for all the other languages you don't mention. MySQL refers to this as "utf8". BTW, be careful: ucs2, another flavor of Unicode, will not fully support the Chinese and Japanese requirements since it doesn't support surrogates.

Be aware that character encodings and collations are not the same thing. They are related only in that both are associated with a particular country, but you can have multiple collations in one country, just as you can have multiple character encodings. MySQL associates a *default* collation with each encoding. This is usually OK, but it isn't necessarily correct. I recommend that you use utf8 and then choose a collation for each of these languages, so you end up with a list that maps each language to its required collation. You can identify the available collations by running SHOW COLLATION LIKE 'utf%';
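The language-to-collation list suggested above could be sketched like this in Python. The function names and the fallback choice are assumptions made for this example. The collation names are ones I believe ship with MySQL 5.0's utf8 charset, but they should be checked against the SHOW COLLATION output on the actual server before being hardcoded.

```python
# Assumed mapping from language to a MySQL utf8 collation.
# Verify every name against SHOW COLLATION LIKE 'utf%' on your server.
UTF8_COLLATION = {
    "Czech": "utf8_czech_ci",
    "Danish": "utf8_danish_ci",
    "Hungarian": "utf8_hungarian_ci",
    "Polish": "utf8_polish_ci",
    "Romanian": "utf8_romanian_ci",
    "Spanish": "utf8_spanish_ci",
    "Swedish": "utf8_swedish_ci",
    "Turkish": "utf8_turkish_ci",
}

# Assumption: languages without a dedicated entry fall back to the
# general Unicode collation.
DEFAULT_COLLATION = "utf8_unicode_ci"


def collation_for(language):
    """Return the collation name to use for a language."""
    return UTF8_COLLATION.get(language, DEFAULT_COLLATION)


def order_by_clause(column, language):
    """Build an ORDER BY clause that sorts with the language's collation."""
    # The column name is spliced into SQL, so accept plain identifiers only.
    if not column.isidentifier():
        raise ValueError("unexpected column name: %r" % column)
    return "ORDER BY %s COLLATE %s" % (column, collation_for(language))


print(order_by_clause("name", "Polish"))
print(order_by_clause("name", "Japanese"))
```

Because the data stays in utf8 and only the COLLATE clause changes, the sort order can be switched per query without converting any stored text.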
I just realized there's a typo in the above: the Latin charsets are all ISO-8859-x, not "9959". These are usually referred to as latinx, where x is the part number within the 8859 family of codepages.
When coding an application that users of many different languages will use, is there an easy way, such as a PHP function changeLanguage(), to set everything necessary for the new language in the HTML/PHP pages and the database? Can this be changed on the fly?

ASKER CERTIFIED SOLUTION
bpmurray
