Solved

Chinese characters - Internationalization

Posted on 2004-09-21
8
1,166 Views
Last Modified: 2009-01-07
Currently we have Norwegian and English(UK) customers. We are going international and need support for all languages. What is the best way to do this? I am currently doing testing with Chinese.

I have tried using UTF-8 in asp and html. Chinese characters are displayed, but when I try to update the database the characters are converted to ???. If i save directly in enterprise manager, it works.

According to: http://seba.studentenweb.org/thesis/db-sql.php
Unicode values must have N in front of the string. I.e UPDATE Table SET Field = N'value';
That means we have to change ALL sql queries in our application. That's a lot of changes! There must be some other way.
Using N works though!

But according to: http://www.sql-server-performance.com/datatypes.asp
Using N resuls in poor performance.

What are my alternatives, what is the best thing to do?

1. Use N.

2. Set up new server for each new language. Do I still have to use N for chinese characters?

I know that when using Norwegian/Danish collation on SQL server (as we do) then the index will not be correct if a user inserts chinese characters. That, with the point of using N results in poor performance, suggest one server for each language.

I really can't believe it is this hard to support different languages. To have one server for each language and use N in SQL queries seems ridicolous.

According to: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_dataencoding.asp
When web-application and database both use Unicode, then conversion should be handled automatically. This is the case for us, but why isn't it working? Could it be because I have a norwegian keyboard and all characters have been copied/pasted from chinese sites.


Database: Microsoft SQL Server 2000. Set up with Norwegian/Danish collation.
Fields: All text fields are using unicode - nchar/nvarchar/ntext

Application: ASP/VbScript/COM with 65001 codepage and UTF-8 charset
HTML: UTF-8 charset

Using:
      Response.AddHeader "Content-Type","text/html;charset=UTF-8"
      Response.CodePage = 65001
      session.CodePage = 65001
      Response.CharSet = "utf-8"

And:
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
0
Comment
Question by:Sleepyhead_NO
  • 2
  • 2
8 Comments
 
LVL 12

Expert Comment

by:patrikt
ID: 12112930
It is always complicated mixup of response encoding, request encoding and SQL codepages. Also installed MDAC can play a role.

Check this for details and known issues:
http://msdn.microsoft.com/msdnmag/issues/01/05/global/default.aspx

Unfortunately you'll have to rewrite parts of system.
0
 
LVL 3

Author Comment

by:Sleepyhead_NO
ID: 12113812
I have read that article earlier. We are using Windows Server 2003, so MDAC version is not a problem.

If we set up a new database with chinese collation & codepage and use BIG-5 as charset in HTML, will we still need to use the N prefix? If not, then I think we will go for that option instead.
0
 
LVL 4

Accepted Solution

by:
davehilditch earned 200 total points
ID: 12114323
The N prefix is only used to specify a string is nvarchar inside a query.  If you declared a variable type then you will not need this as presumably you will have the nvarchar variable as a parameter in a stored procedure.  The use of N is for:

select N'this is unicode text'
select 'this is ascii text'

But if you have your variables as parameters and they are nvarchar then you will not to perform the above.  Yes, unicode is going to be slower than regular varchars because 2 bytes are used per character instead of 1, so you have double the amount of data to read and write for your text columns.

When you say ??? is being displayed, is this only in Query Analyzer?  I have seen QA multiple times displaying ??? because the internationalisation options haven't been set for QA but the data is still stored correctly and when it finally reaches the ASP page should display correctly.

You don't need to have a seperate database for each language - at a company i worked for before we had greek, norweigan, english, french, german, russian, and (tested but not used) japanese in the same database.  Worked perfectly fine.  Cant' remember how to get QA to display the characters properly, but there is a way.  Looks pretty cool returning greek from a select statement.  (ok, i'm easily pleased)

Dave Hilditch.
0
 
LVL 3

Author Comment

by:Sleepyhead_NO
ID: 12115559
Thanks for your comments.

we already use unicode fields (nchar/nvarchar/ntext), i know it takes up more space and thus is slower.

for SELECT this does not seem to be a problem, except I haven't got SELECT WHERE to work yet (tried with and without N prefix), thought not testet much. The problem seems to be mainly with INSERT/UPDATE's where I have to specify N'value' to store the value correctly. Without N'value' it stores ???.

Like I said, display is no problem. Was you application a web-application and were you storing data from the web-application as well? Not just displaying it? And what about collation? It will be different with for example Norwegian and English.

In stored procedures where we cast variable as nvarchar, then I assume this will not be a problem. But in allmost 100% of the queries we don't use stored procedures, just generate the sql in ASP and execute it with ADO.

QA and Enterprise Manager displays the chinese characters correctly now, not sure what happend earlier...
0
 
LVL 4

Expert Comment

by:davehilditch
ID: 12121691
My application was a survey system, so data was stored from the web page in any of the languages, and displayed in any of the languages.  If you are using dynamic SQL in your insert/updates then you will definitely have to use N to ensure they are treated as unicode.

If on the other hand, you have something like:

create procedure insertSomething(@param1 nvarchar(100))
as
insert into mytable
select @param1

Then there is no need for N anywhere, unless you are inserting from Query Analzer.

Dave Hilditch.
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Introduced in Microsoft SQL Server 2005, the Copy Database Wizard (http://msdn.microsoft.com/en-us/library/ms188664.aspx) is useful in copying databases and associated objects between SQL instances; therefore, it is a good migration and upgrade tool…
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now