
Solved

Chinese PRC Collation

Posted on 2008-11-04
Medium Priority
2,222 Views
Last Modified: 2008-11-16
I need my SQL Server 2005 to support Chinese characters. Under the setup, I can see several collations:

Chinese_PRC_Stroke
Chinese_PRC_90
Chinese_PRC

What is the difference between these collations? What is the difference between varchar and nvarchar? Should I use nvarchar to store double-byte characters?

Many thanks
Question by:AXISHK
2 Comments
 
LVL 12

Accepted Solution

by:
Dimitris earned 1400 total points
ID: 22875409
For the collation (I have run into a similar situation with Chinese encoding), I suggest you consult your customers so they can tell you the correct collation to use.
If you choose the wrong collation, don't worry: you can change the collation on all tables in your database with a script (a sketch is shown below).
The collation plays a significant role in sorting. If you choose the wrong collation you may see ORDER BY appearing not to work correctly; it actually works correctly, but depending on the collation it will sort based on one byte or two bytes per character.
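A minimal sketch of what I mean (the table and column names here are only examples, not from your database):

-- List the current collation of every character column in the database
SELECT t.name AS table_name, c.name AS column_name, c.collation_name
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL;

-- Change the collation of a single column (hypothetical table and column)
ALTER TABLE dbo.Customers
    ALTER COLUMN CustomerName nvarchar(100) COLLATE Chinese_PRC_CI_AS;

-- Or override the collation for one query's sort order only
SELECT CustomerName
FROM dbo.Customers
ORDER BY CustomerName COLLATE Chinese_PRC_Stroke_CI_AS;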

I think nvarchar (generally, all character types that start with "n") is Unicode. So, to be sure you can store all kinds of data, use the "N" types. Keep in mind that a VARCHAR can hold up to 8,000 characters, but an NVARCHAR supports only up to 4,000, because it uses two bytes per character.
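For example (a minimal sketch; the table and data are made up for illustration):

-- nvarchar keeps Chinese characters intact; varchar depends on the column's code page
CREATE TABLE dbo.Demo
(
    NameVarchar  varchar(50),      -- single byte per character, code-page dependent
    NameNvarchar nvarchar(50)      -- Unicode, two bytes per character
);

-- The N prefix marks the constant as Unicode
INSERT INTO dbo.Demo (NameVarchar, NameNvarchar)
VALUES ('北京', N'北京');

SELECT NameVarchar, NameNvarchar FROM dbo.Demo;
-- NameVarchar may come back as '??' if the column's code page cannot represent the characters;
-- NameNvarchar always returns the original characters.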
Below is the relevant help from Microsoft:

(Microsoft Help)
The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.

One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

Each Microsoft® SQL Server™ collation has a code page that defines what patterns of bits represent each character in char, varchar, and text values. Individual columns and character constants can be assigned a different code page. Client computers use the code page associated with the operating system locale to interpret character bit patterns. There are many different code pages. Some characters appear on some code pages, but not on others. Some characters are defined with one bit pattern on some code pages, and with a different bit pattern on other code pages. When you build international systems that must handle different languages, it becomes difficult to pick code pages for all the computers that meet the language requirements of multiple countries. It is also difficult to ensure that every computer performs the correct translations when interfacing with a system using a different code page.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.

In Microsoft SQL Server, these data types support Unicode data:

nchar
nvarchar
ntext

Note: The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types.

Use of nchar, nvarchar, and ntext is the same as char, varchar, and text, respectively, except that:

Unicode supports a wider range of characters.
More space is needed to store Unicode characters.
The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
Unicode constants are specified with a leading N: N'A Unicode string'.

All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity.
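To illustrate that last point (a quick sketch of mine, not part of the Microsoft text): the collation changes comparison behaviour such as case sensitivity, not how the Unicode data itself is stored.

-- Same Unicode data, different comparison rules depending on the collation
SELECT
    CASE WHEN N'a' = N'A' COLLATE Chinese_PRC_CI_AS THEN 'equal' ELSE 'different' END AS case_insensitive,
    CASE WHEN N'a' = N'A' COLLATE Chinese_PRC_CS_AS THEN 'equal' ELSE 'different' END AS case_sensitive;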
 
LVL 31

Assisted Solution

by:James Murrell
James Murrell earned 600 total points
ID: 22958264
