Chinese characters - Internationalization

Currently we have Norwegian and English(UK) customers. We are going international and need support for all languages. What is the best way to do this? I am currently doing testing with Chinese.

I have tried using UTF-8 in asp and html. Chinese characters are displayed, but when I try to update the database the characters are converted to ???. If i save directly in enterprise manager, it works.

According to:
Unicode values must have N in front of the string. I.e UPDATE Table SET Field = N'value';
That means we have to change ALL sql queries in our application. That's a lot of changes! There must be some other way.
Using N works though!

But according to:
Using N resuls in poor performance.

What are my alternatives, what is the best thing to do?

1. Use N.

2. Set up new server for each new language. Do I still have to use N for chinese characters?

I know that when using Norwegian/Danish collation on SQL server (as we do) then the index will not be correct if a user inserts chinese characters. That, with the point of using N results in poor performance, suggest one server for each language.

I really can't believe it is this hard to support different languages. To have one server for each language and use N in SQL queries seems ridicolous.

According to:
When web-application and database both use Unicode, then conversion should be handled automatically. This is the case for us, but why isn't it working? Could it be because I have a norwegian keyboard and all characters have been copied/pasted from chinese sites.

Database: Microsoft SQL Server 2000. Set up with Norwegian/Danish collation.
Fields: All text fields are using unicode - nchar/nvarchar/ntext

Application: ASP/VbScript/COM with 65001 codepage and UTF-8 charset
HTML: UTF-8 charset

      Response.AddHeader "Content-Type","text/html;charset=UTF-8"
      Response.CodePage = 65001
      session.CodePage = 65001
      Response.CharSet = "utf-8"

      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Who is Participating?
davehilditchConnect With a Mentor Commented:
The N prefix is only used to specify a string is nvarchar inside a query.  If you declared a variable type then you will not need this as presumably you will have the nvarchar variable as a parameter in a stored procedure.  The use of N is for:

select N'this is unicode text'
select 'this is ascii text'

But if you have your variables as parameters and they are nvarchar then you will not to perform the above.  Yes, unicode is going to be slower than regular varchars because 2 bytes are used per character instead of 1, so you have double the amount of data to read and write for your text columns.

When you say ??? is being displayed, is this only in Query Analyzer?  I have seen QA multiple times displaying ??? because the internationalisation options haven't been set for QA but the data is still stored correctly and when it finally reaches the ASP page should display correctly.

You don't need to have a seperate database for each language - at a company i worked for before we had greek, norweigan, english, french, german, russian, and (tested but not used) japanese in the same database.  Worked perfectly fine.  Cant' remember how to get QA to display the characters properly, but there is a way.  Looks pretty cool returning greek from a select statement.  (ok, i'm easily pleased)

Dave Hilditch.
It is always complicated mixup of response encoding, request encoding and SQL codepages. Also installed MDAC can play a role.

Check this for details and known issues:

Unfortunately you'll have to rewrite parts of system.
Sleepyhead_NOAuthor Commented:
I have read that article earlier. We are using Windows Server 2003, so MDAC version is not a problem.

If we set up a new database with chinese collation & codepage and use BIG-5 as charset in HTML, will we still need to use the N prefix? If not, then I think we will go for that option instead.
Sleepyhead_NOAuthor Commented:
Thanks for your comments.

we already use unicode fields (nchar/nvarchar/ntext), i know it takes up more space and thus is slower.

for SELECT this does not seem to be a problem, except I haven't got SELECT WHERE to work yet (tried with and without N prefix), thought not testet much. The problem seems to be mainly with INSERT/UPDATE's where I have to specify N'value' to store the value correctly. Without N'value' it stores ???.

Like I said, display is no problem. Was you application a web-application and were you storing data from the web-application as well? Not just displaying it? And what about collation? It will be different with for example Norwegian and English.

In stored procedures where we cast variable as nvarchar, then I assume this will not be a problem. But in allmost 100% of the queries we don't use stored procedures, just generate the sql in ASP and execute it with ADO.

QA and Enterprise Manager displays the chinese characters correctly now, not sure what happend earlier...
My application was a survey system, so data was stored from the web page in any of the languages, and displayed in any of the languages.  If you are using dynamic SQL in your insert/updates then you will definitely have to use N to ensure they are treated as unicode.

If on the other hand, you have something like:

create procedure insertSomething(@param1 nvarchar(100))
insert into mytable
select @param1

Then there is no need for N anywhere, unless you are inserting from Query Analzer.

Dave Hilditch.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.