Solved

Charsets and all languages

Posted on 2006-06-12
Last Modified: 2008-03-17
I need a list of all languages with the charset to use for each in HTML/PHP web pages and in the MySQL database. Also, what should the collation be for MySQL? What is the difference between charset and collation in MySQL when dealing with languages? I have seen sites describing which charsets to use for which languages, but they all differ. I need a solid set so I can hardcode these options into a script.

Specifically, I am thinking of this list:

Catalan
Portuguese
Czech
German
Danish
English
Spanish
Finnish
Faroese
French
Hungarian
Japanese
Italian
Dutch
Norwegian
Polish
Romanian
Russian
Swedish
Turkish
Chinese
Question by:killer455
8 Comments
 
LVL 18

Expert Comment

by:Eternal_Student
ID: 16892859
This may help with the php and mysql side of things:

http://dev.mysql.com/doc/refman/5.0/en/charset.html
 

Author Comment

by:killer455
ID: 16895093
Yes I have seen this but I need a detailed answer here specific to my question.

 
LVL 15

Expert Comment

by:bpmurray
ID: 16905434
If you use UTF-8 (or UTF-16) you can support all the above languages. If you want to use a legacy platform encoding, you can use:

ISO-8859-1, latin1 or Windows-1252 for: Catalan, Portuguese, German, Danish, English, Spanish, Finnish, Faroese, French, Italian, Dutch, Norwegian, Swedish
ISO-8859-2, latin2 or Windows-1250 for: Czech, Hungarian, Polish, Romanian
CP932, sjis or Shift-JIS for: Japanese
ISO-8859-5, KOI8-R or Windows-1251 for: Russian (Cyrillic)
ISO-8859-9, latin5 or Windows-1254 for: Turkish
Big5 for: Traditional Chinese (Taiwan)
GB 18030 for: Simplified Chinese (PRC). Note that this is required by the Chinese government, so the old GB 2312 is no longer acceptable.
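
One way to sanity-check a language-to-charset table like this is to try encoding sample text and see which encodings accept it. The sketch below is in Python only because its codecs are easy to probe; the sample words and the `codecs_that_fit` helper are illustrative assumptions, not part of the thread.

```python
# Sample words containing language-specific letters (illustrative choices).
SAMPLES = {
    "German": "Straße",
    "Czech": "příliš žluťoučký",
    "Polish": "zażółć gęślą jaźń",
    "Hungarian": "árvíztűrő",
    "Russian": "привет",
    "Turkish": "ığdır şişli",
    "Japanese": "日本語",
}

# Python codec names for some of the encodings discussed in this thread.
CODECS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-9",
          "shift_jis", "utf-8"]


def codecs_that_fit(text):
    """Return the codecs from CODECS that can encode every character of text."""
    fits = []
    for name in CODECS:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        fits.append(name)
    return fits


for language, sample in SAMPLES.items():
    print(language, codecs_that_fit(sample))
```

UTF-8 shows up for every sample, while each legacy encoding only covers some of them.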

I very strongly recommend UTF-8, since it is a universal solution for all of the above and for all the languages you don't mention. MySQL refers to it as "utf8". By the way, be careful: ucs2, another flavor of Unicode, will not fully support the Chinese and Japanese requirements, since it doesn't support surrogates and so cannot represent characters outside the Basic Multilingual Plane (BMP).
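
To make the surrogate point concrete, here is a small Python sketch (the `storage_cost` helper and the two sample characters are assumptions chosen for illustration). A character outside the BMP needs four bytes in UTF-8 and two 16-bit code units (a surrogate pair) in UTF-16, which is exactly what a fixed two-byte ucs2 column cannot hold. Note also that MySQL's "utf8" charset stores at most three bytes per character, so it has the same limit for such characters; MySQL 5.5.3 and later offer utf8mb4 for full coverage.

```python
def storage_cost(ch):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2  # 2 bytes per code unit
    return utf8_bytes, utf16_units


BMP_CHAR = "\u8a9e"           # a common CJK ideograph inside the BMP
SUPPLEMENTARY = "\U00020bb7"  # a CJK ideograph outside the BMP

for ch in (BMP_CHAR, SUPPLEMENTARY):
    utf8_bytes, utf16_units = storage_cost(ch)
    print(f"U+{ord(ch):04X}: {utf8_bytes} bytes in UTF-8, "
          f"{utf16_units} UTF-16 code unit(s)")
```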

Be aware that character encodings and collations are not the same thing. An encoding defines how characters are stored as bytes; a collation defines how strings are compared and sorted. They are related only in that both tend to be associated with a particular language or country, and one country can have multiple collations just as it can have multiple character encodings. MySQL associates a *default* collation with each encoding. This is usually OK, but isn't necessarily correct for a given language. I recommend that you use utf8 and then pick a collation for each of these languages, so you end up with a list that maps each language to the required collation. You can list the available collations with SHOW COLLATION LIKE 'utf%';
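
A language-to-collation list of that kind could be sketched as below. The pairings are assumptions for illustration, not an authoritative table: the names follow MySQL's utf8_*_ci naming, but each one should be checked against the SHOW COLLATION output of the server actually in use. The `collation_for` and `order_by_clause` helpers are hypothetical.

```python
# Fallback for languages without a dedicated entry (an assumption).
DEFAULT_COLLATION = "utf8_unicode_ci"

# Illustrative mapping only; verify each name with SHOW COLLATION LIKE 'utf%'.
COLLATIONS = {
    "Czech": "utf8_czech_ci",
    "Danish": "utf8_danish_ci",
    "Hungarian": "utf8_hungarian_ci",
    "Polish": "utf8_polish_ci",
    "Romanian": "utf8_romanian_ci",
    "Spanish": "utf8_spanish_ci",
    "Swedish": "utf8_swedish_ci",
    "Turkish": "utf8_turkish_ci",
}


def collation_for(language):
    """Return the collation chosen for a language, or the default."""
    return COLLATIONS.get(language, DEFAULT_COLLATION)


def order_by_clause(column, language):
    """Build an ORDER BY clause that sorts one column with that collation."""
    if not column.replace("_", "").isalnum():
        raise ValueError("unexpected column name: %r" % column)
    return "ORDER BY `%s` COLLATE %s" % (column, collation_for(language))


print(order_by_clause("name", "Polish"))
print(order_by_clause("name", "Japanese"))
```

Because the stored encoding stays utf8 throughout, only the COLLATE part changes from one language to the next.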
 
LVL 15

Expert Comment

by:bpmurray
ID: 16910937
A note on naming: the Latin charsets are all parts of ISO 8859, written ISO-8859-x (8859, not "9959"). They are usually referred to as latinN, but N matches the part number only for parts 1 to 4; latin5, for example, is ISO-8859-9.
 

Author Comment

by:killer455
ID: 16917618
When coding an application that users of many different languages will use, is there an easy way, like a PHP function changeLanguage(), to set everything necessary for the new language in the HTML/PHP pages and the database? Can this be changed on the fly?
 
LVL 15

Accepted Solution

by:
bpmurray earned 200 total points
ID: 16918416
There are two main areas you have to watch out for when making your app internationally enabled: the encoding and the locale settings. The encoding can be simplified by always using Unicode. UTF-8 is the most popular form, although UTF-16 is probably the easiest to manipulate. The locale info is more complex. It contains the information that varies from locale to locale (see CLDR on unicode.org) and includes things like date formats (the US uses M/D/Y, the UK uses D/M/Y; the West uses the Gregorian calendar, Japan uses the Year of the Emperor, Arab countries use the Hijri calendar, etc.), number formats (1,000,000 is displayed as 10,00,000 in Hindi), collation sequences (j, k, l, ll, m, n, o, p, q, r, rr, s ... in traditional Spanish), casing (Turkish uppercase "i" has a dot on it, and lowercase "I" has no dot), and so on.
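
Two of these locale rules are easy to show in a few lines. The Python sketch below hand-rolls them purely for illustration; `group_indian` and `upper_turkish` are hypothetical helpers, and a real application would normally take these rules from a locale library such as ICU rather than hardcoding them.

```python
def group_indian(n):
    """Format a non-negative integer with Indian grouping: 3 digits, then 2s."""
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    parts = []
    while len(head) > 2:
        parts.insert(0, head[-2:])  # peel off two digits at a time
        head = head[:-2]
    parts.insert(0, head)
    return ",".join(parts + [tail])


def upper_turkish(text):
    """Uppercase text using the Turkish dotted/dotless i rule."""
    # Map the two special letters first, then uppercase everything else.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


print(group_indian(1000000))      # Indian grouping of one million
print("istanbul".upper())         # default, locale-independent casing
print(upper_turkish("istanbul"))  # Turkish casing keeps the dot on the I
```

The default `upper()` call shows why casing cannot be treated as locale-independent: it silently drops the dot that Turkish requires.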

While the basic functionality is available in Java and C/C++, ICU4C & ICU4J provide extended functionality (see http://icu.sourceforge.net). Until I saw your question, I wasn't aware that there was any ICU support for PHP, although it seemed logical that there should be. I did a quick check of the PHP site, and it looks like it's on its way: see http://ie2.php.net/manual/en/ref.unicode.php. This is great news, since it should make this standard across many facets of the web.
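
As for the changeLanguage() idea raised earlier in the thread, a rough sketch of the split between encoding and locale might look like this. It assumes UTF-8 is fixed everywhere, so switching language only swaps locale data. The function name, the language tags, and the two date patterns are hypothetical, and Python stands in for PHP here only to keep the sketch short and testable.

```python
from datetime import date

# Fixed for every language: the page and the connection are always UTF-8.
HTTP_CONTENT_TYPE = "Content-Type: text/html; charset=utf-8"
MYSQL_CONNECTION_SETUP = "SET NAMES utf8"

# Hypothetical per-language locale data; only two illustrative entries.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month/day/year
    "en-GB": "%d/%m/%Y",  # day/month/year
}


def change_language(tag):
    """Return the settings for a language tag; only locale data varies."""
    if tag not in DATE_PATTERNS:
        raise KeyError("no locale data for %r" % tag)
    return {
        "http_header": HTTP_CONTENT_TYPE,
        "mysql_setup": MYSQL_CONNECTION_SETUP,
        "date_pattern": DATE_PATTERNS[tag],
    }


def format_date(day, settings):
    """Format a date using the pattern of the current language settings."""
    return day.strftime(settings["date_pattern"])


posted = date(2006, 6, 12)
for tag in ("en-US", "en-GB"):
    print(tag, format_date(posted, change_language(tag)))
```

Since nothing about the encoding changes between languages, this kind of switch can happen per request without touching the stored data.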




Question has a verified solution.