Link to home
Start Free TrialLog in
Avatar of Stefan Lennerbrant
Stefan LennerbrantFlag for Sweden

asked on

Determining details of sort order in a collation

I'm using collation Finnish_Swedish_CI_AS in a database.

I note that (for instance) V and W are considered equal in this sorting, which surprises me (and possibly all other swedish people, at least the computer-firendly ones:-)
Also, I note that the sorting of danish Ö (o-umlaut) is not sorted together with swedish Ö -- etc etc. Also, there is (of course) a specific sort order of all (as an example) accented "A" characters, and I'd like to know exactly which order it is.

I've tested a little by creating a table with all possible character values, and then checked the sorted output.
However, is there any way to exactly determine the specific details of a collation sort order?
Something like "help_collation 'collation_name'" or "SELECT * FROM ::fn_helpcollations_details()" or something similar?
What I'm looking for is a way to explain to people exactly how each character is sorted.

Also, SQL Server is using Windows collation, right? Perhaps one could get this kind of list somewhere from Windows instead?

Thanks,
/Stefan Lennerbrant
Avatar of Reg Bes
Reg Bes
Flag of South Africa image

Hi stefanlennerbrant,

It appears that SQL gets the order from Windows

http://msdn2.microsoft.com/en-us/library/ms144260.aspx

HTH

R.
Avatar of Stefan Lennerbrant

ASKER

Thanks.
This page seems to describe "SQL Collations" in general (compared to "Windows collations"). The SQL Collation is depreciated, aren't they, and kept for backwards compatibility reasons?

The sort order used by me is (I think?) a Windows collation. It's named "Finnish_Swedish_CI_AS" and described is by sp_helpsort as:
Finnish-Swedish, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 60 on Code Page 850 for non-Unicode Data

However, the question still remains: How to get a detail description of the actual sort order used?
Something like a list where it is specified that A is listed before B, I before Í, V and W treated as equal, hyphen "-" is ignored, etc etc

Possibly the hint "SQL Server Sort Order 60 on CP850" above is something to investigate? I'll check into it.
Hasn't anybody ever been curious regarding the actial detail sorting order in a collation setup? :-)

/Stefan Lennerbrant
stefanlennerbrant,


this should di it then

http://msdn2.microsoft.com/en-US/library/ms143515.aspx
Well, that just says things like:
  Distinguishes between accented and unaccented characters. For example, 'a' is not equal to 'ấ'.

I'm interested in _exactly_ which order a, á, à, â are sorted, and whether v and w are sorted in different order or are considered equal, etc etc

Nothing is said about that, or did I miss something?

/Stefan Lennerbrant
ASKER CERTIFIED SOLUTION
Avatar of Reg Bes
Reg Bes
Flag of South Africa image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I don't find any theoretical explanation (or specification) on why the sort order is as it is, but your example does show it empirically. It's the same type of test that I already did, but built in a "nicer way".
However, I'd rather define the fields as char(2) instead and then insert two copies of each:
  insert into #ChrOrd select @i, char(@i)+'a',char(@i)+'a'
  insert into #ChrOrd select @i, char(@i)+'z',char(@i)+'z'

and then display them as (which I suppose you meant from the beginning)
  select * from #ChrOrd order by 2
  select * from #ChrOrd order by 3

Then you clearly see that v+w are considered equal, etc etc.

Well - now if I don't like the defined sort order? Is there anything I can do about it? (apart from trying to find a collation that sorts things in a "better way" according to my needs)
I'll create a new question for this:-)
stefanlennerbrant,

I think you are lucky MS even admits that accents are diffrent to each other and gives them diffrent sort order, however the cahnce of them getting it right was slim :)