asked on

Sybase ASE 15 - charset conversion from latin-1 to UTF-8

Hi

We are running ASE 15.0 (with character set latin-1 on Solaris 10). We want to convert the charset of database server from latin-1 to UTF-8 to support Chinese characters.
can anybody help us to list the steps involved in this charset conversion? We understand we may have to bcp out/in all the tables unless the data stored in tables are only ASCII-7. Is there any easy method to convert to UTF-8 without bcp in/out?

We have setup Warm standby using Sybase Replication Server. Do we need to change the charset of Replication server too to support UTF-8 along with dataserver? Kindly advise.

thanks
Chan

Joe Woodhouse

This is actually a fairly major migration. Because you already have data in the databases I'm afraid that you do have to bcp out and back in again. You'll need to use the "-Jutf8" option to specify the data should be extracted & loaded into UTF8 format.

You might run into issues with data not fitting into their columns any more, because UTF takes more space.

I actually think your best plan here is to create a new ASE with your desired character set, create new databases, create new objects (this time defining all char columns as unichar and varchar as univarchar). Only then bcp out of the original and into the new. Don't forget dbcc fix_text for any tables with text columns.

This way you guarantee your original data is safe, and that all data will fit. Although you do need hardware and storage for a second ASE.

The System Administration Guide volume 1 (chapter 9 in the manual for ASE 15.0) documents the procedure you need exactly.

I'm afraid I have more bad news for you. Your Rep Server character set and sort order must match that of the ASE. And since you're using Warm Standby you will obviously have to change the character set in the standby ASE also, although at that point you'll be able to just dump & load from the active to the standby (don't forget to disable rep agents again).

It really is quite a major migration I'm sorry to say.

chandra n

ASKER

thanks Joe. I really appreciate your explanation.
Luckily we have only 5 tables where a new column to be added to store Chinese characters ( datatype should be univarchar()? ) .
Since only the new columns to be added to the existing tables would the following steps work?

1. Change the charset of existing database server to UTF-8 using charset -Usa -P binary.srt utf8
2. Change default charset to UTF-8
3. Shutdown and restart server
4. Add a new columns with univarchar() into those 5 tables

Do I still need to bcp out/bcp in data from these tables with -Jutf-8 option? Do you still recommend to setup new server with UTF-8 and dump from old server and load into UTF-8 server?

For replication server :

1. Quiesce the queue
2. configure the replication server with UTF-8 charset
3. restart the replication server

Kindly the review the steps and advise.

thanks
CS

ASKER CERTIFIED SOLUTION

Joe Woodhouse

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

chandra n

ASKER

thanks for the clarification Joe.

Only the new columns added to the tables are used to store Chinese text. I understand from your reply that we still need to convert all the existing char & varchar columns (of all the tables?) to unichar and univarchar. I hope it can be done using alter table command and data conversion to UTF8 format will be automatically done. It might increase the length of the row so we should also verify the row size.

Can we store the data on only those 5 tables in UTF8 format and keep the data of other tables in existing charset format?

Kindly confirm.

Regards
CS

Joe Woodhouse

Yes, you don't have to convert existing data.

Simply add the new character set to ASE but don't change the default character set. Leave all other data as is, and only add/convert those new columns.

RepServer should be able to deal with this because generally the data is implicitly convertible but I strongly suggest you test this scenario before committing to it in Production: two ASEs and a RepServer all using your current character set, but replicating some univarchar data containing Chinese characters. I think it will work as the documentation says we only want the default character sets to match and that RS is smart enough to convert if needed... my only question is whether the Chinese characters survive the double conversion.

chandra n

ASKER

thanks for your help help Joe.

Regards
Chandra