Solved

Collation for a new MS SQL Server

Posted on 2013-01-09
7
1,212 Views
Last Modified: 2013-01-12
Hi,
We're about to install a new MS SQL Server which would hold all our new DW and OLAP databases for the company.
We intend to insert into it data from legacy DBs, which, among other things, hold URLs with unicode characters.
In general we would like to be able to support many kinds of non-English languages, but we're not sure which collation to choose for the server setup and for the instances.
Our main candidates are:
1) utf8_general_ci - which would have been the best option maybe, but it seems that you cannot assign this collation at server-level! Problem is, if we would define it at instance level, what do we assign to the server-level, and what is the affect of having them not the same?
2) Latin1_General_CI_AS - the fallback of the above maybe, since it can be defined at both the server and instance levels, but what do we do with our unicode requirements?
3) Finnish_Swedish_CI_AS - since we have some swedish and finnish sites, some people refer to this collation is maybe better against the "latin" one above.

So the end question is: what collation should we choose at server-level and on instance-level?
Thank you in advance!!!
0
Comment
Question by:webpals
7 Comments
 
LVL 21

Expert Comment

by:Alpesh Patel
ID: 38761791
Pls use SQL_General_Latin1_CI_AS  it's best option for all languages
0
 
LVL 20

Assisted Solution

by:Marten Rune
Marten Rune earned 100 total points
ID: 38761831
Youre in trouble now.

If a legacy product demands finnish_swedish_CI_AS, and another SQL_General_Latin_CI_AS chances are things wont work, and you'll be running on unsupported configuration for one of the legacy products.

This is because bad coding matches var, nvar or similar columns with a operation in TempDB. Now TempDB will have the same collation as the server collation, not the database collation.

SQL Server itself is designed to allow any and all collations, and you should be able to support both legacy products if coders could only remember one small phrace when they create a tempory table, and that is "collate database_default" if they would only use this magic phrace the SQL could support it all on one instance.

Look at:
http://blog.sqlauthority.com/2007/06/11/sql-server-cannot-resolve-collation-conflict-for-equal-to-operation/
and
http://msdn.microsoft.com/en-us/library/aa258237(v=sql.80).aspx
or
http://www.sqlservercentral.com/Forums/Topic813941-338-1.aspx

It's been around since SQL2000. So you'd expect the programmers of legacy product to pick up on it, for like a decade ago. But this looks like it will sadly never happen.

There is always a new rookie programmer in every legacy products coding teem it seems.

Your solution will likely be the same as many of us out there, yet another instance of SQL, on the same host (for licensing reason), and you'll miss out on resource govenor (wont work between instances) and have a hard time figuring out wich instance is smuthering the others when it comes to performance tuning.

Regards Marten
0
 

Author Comment

by:webpals
ID: 38761879
Dear Marten,
Thank you for the long answer and details.

What I'm not yet clear about is: since the server would be running all its TempDB tables using the server collation SQL_General_Latin_CI_AS, do you recommend to:
1) Use the same collation for the DB instances too so to have no collisions. Thus the question is: how do we save and use our needed "unicode" data?
or
2) Use a different collation for the DB instance, like utf8_general_ci

Second, why is the recommendation to use SQL_General_Latin_CI_AS, which is stated at MS as an "old" SQL collation, rather than the "new" General_Latin_CI_AS?

Thanks
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 51

Accepted Solution

by:
Mark Wills earned 400 total points
ID: 38761986
Well, your unicode data is readily handled with NVARCHAR() and NCHAR() instead of VARCHAR

Given the diversity of languages, and the disposition of SQL to install according to the code page (windows collation), I would suggest you chose the easiest to live with in general terms.

If you really do have very specific language dependancies, then maybe you need to consider some databases being swedish (or finnish or whatever).

But it is often considered that collation must be the same as language, and that isnt entirely the case. The collation must be able to support the choice of language and typically the unicode data types will certainly do that...

Also, you need to (maybe) consider the newer collations to support windows collations - they have a 100 in there...

Generally speaking, I wouldnt use SQL collation unless there was some kind of heritage or prior dependancy. I would default to Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive. And for language dependant columns which require unicode then use nvarchar().

Have a read of :
http://msdn.microsoft.com/en-US/library/ms143726(v=SQL.110).aspx

Probably the more important and definitely language dependant aspect are things like Month names and Day names but that is another story altogether...

One last thing to consider is you can really impact (big time) any server performance if you need to use the COLLATE clause in your selects. If the collation needs to convert a sort sequence then there can be a fairly significant penalty.

There was a quizz on Beyond Relational which shows some of the possible impact : http://beyondrelational.com/quiz/sqlserver/general/2010/questions/1/sqlserver-quiz-general-2010-erland-sommarskog-a-collation-puzzle.aspx and a general discussion (with examples)  http://beyondrelational.com/modules/24/syndicated/842/Posts/17777/revised-difference-between-collation-sqllatin1generalcp1cias-and-latin1generalcias.aspx
0
 

Author Comment

by:webpals
ID: 38762844
Mark, thank you!
So I'm now finally convinced that using the non-unicode Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive option, plus using NVARCHAR() and NCHAR() for unicode fields is the best option for us.

Just to confirm what you mean by "Generally speaking, I wouldnt use SQL collation unless there was some kind of heritage or prior dependancy"

I read: http://beyondrelational.com/modules/24/syndicated/842/Posts/17777/revised-difference-between-collation-sqllatin1generalcp1cias-and-latin1generalcias.aspx

And from it's conclusion, it looks like Latin1_General_CI_AS is a better option than the SQL counterparts like: SQL_General_Latin1_CI_AS or SQL_Latin1_General_CP1_CI_AS suggested on this thread.

Is that true?
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 400 total points
ID: 38763645
Yes, it is true.

And if using the newer collations (the _100 to correspond with Win Server 2008) , there is a whitepaper worth reading first : http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/Pre-SQLServer2008_NativeDataProviders_and_NewCollations.docx
0
 

Author Comment

by:webpals
ID: 38771280
Thank you, Mark and thank you Marten. It's clear to me now, we'll go for the new collations (non SQL_ ones) either Latin1_General_CI_AS or the newer ones suggested (the _100) based on the server + client we use.
0

Featured Post

Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

Join & Write a Comment

In this article—a derivative of my DaytaBase.org blog post (http://daytabase.org/2011/06/18/what-week-is-it/)—I will explore a few different perspectives on which week today's date falls within using Microsoft SQL Server. First, to frame this stu…
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now