Sybase ASE 15 - charset conversion from latin-1 to UTF-8

chandra n
chandra n used Ask the Experts™
on
Hi

We are running ASE 15.0 (with character set latin-1 on Solaris 10).  We want to convert the charset of database server from latin-1 to UTF-8 to support  Chinese characters.
can anybody help us to list the steps involved in this charset conversion?  We understand we may have to bcp out/in all the tables unless the data stored in tables are only ASCII-7.  Is there any easy method to convert to UTF-8 without bcp in/out?

We have setup Warm standby using Sybase Replication Server. Do we need to change the charset of Replication server too to support UTF-8 along with dataserver? Kindly advise.

thanks
Chan
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Joe WoodhousePrincipal Consultant
Most Valuable Expert 2012

Commented:
This is actually a fairly major migration. Because you already have data in the databases I'm afraid that you do have to bcp out and back in again. You'll need to use the "-Jutf8" option to specify the data should be extracted & loaded into UTF8 format.

You might run into issues with data not fitting into their columns any more, because UTF takes more space.

I actually think your best plan here is to create a new ASE with your desired character set, create new databases, create new objects (this time defining all char columns as unichar and varchar as univarchar). Only then bcp out of the original and into the new. Don't forget dbcc fix_text for any tables with text columns.

This way you guarantee your original data is safe, and that all data will fit. Although you do need hardware and storage for a second ASE.

The System Administration Guide volume 1 (chapter 9 in the manual for ASE 15.0) documents the procedure you need exactly.

I'm afraid I have more bad news for you. Your Rep Server character set and sort order must match that of the ASE. And since you're using Warm Standby you will obviously have to change the character set in the standby ASE also, although at that point you'll be able to just dump & load from the active to the standby (don't forget to disable rep agents again).

It really is quite a major migration I'm sorry to say.

Author

Commented:
thanks Joe. I really appreciate your explanation.
Luckily we have only 5 tables where a new column to be added to store Chinese characters ( datatype should be univarchar()? ) .
Since only the new columns to be added to the existing tables would the following steps work?

1. Change the charset of existing database server to UTF-8 using charset -Usa -P binary.srt utf8
2. Change default charset to UTF-8
3. Shutdown and restart server
4. Add a new columns with univarchar() into those 5 tables

Do I still need to bcp out/bcp in data from these tables with -Jutf-8 option?  Do you still recommend to setup new server with UTF-8 and dump from old server and load into UTF-8 server?  

 For replication server :

1. Quiesce the queue
2. configure the replication server with UTF-8 charset
3. restart the replication server

Kindly the review the steps and advise.

thanks
CS
Principal Consultant
Most Valuable Expert 2012
Commented:
Let's take a step backwards. If all you want is to add columns to five tables then I am not sure that you need to change your default character set.

You can install a new character set without making it the default. ASE can handle data in multiple character sets. I think for what you're describing that maybe all you need is to install utf8 character set and then add your columns.

Yes the new columns should be univarchar.

This also gives you an "in-place" migration method:

- Install utf8 character set. Do not make it the default.
- Add your new columns
- Convert all existing columns from char to unichar and from varchar to univarchar. You can do this on multiple tables concurrently for performance.
- Once everything is converted, change the default character set and reboot ASE.

No bcp out/in required if you do it this way. Unfortunately I think you'll lose replication and Warm Standby for the exercise, but once complete just change standby ASE's default character set and then treat it like rebuilding Warm Standby.

Re. the RS tasks it's a bit more involved than that. You will effectively be rebuilding your Warm Standby. The Replication Server Design Guide has a section devoted to this under "International Replication Design Considerations" where it walks you through every step.
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
thanks for the clarification Joe.  

Only the new columns added to the tables are used to store Chinese text. I understand from your reply that we still need to convert all the existing char & varchar columns (of all the tables?) to unichar and univarchar. I hope it can be done using alter table command and data conversion to UTF8 format will be automatically done. It might  increase the length of the row so we should also verify the row size.

Can we store the data on only those 5 tables in UTF8 format and keep the data of other tables in existing charset format?

Kindly confirm.

Regards
CS
Joe WoodhousePrincipal Consultant
Most Valuable Expert 2012

Commented:
Yes, you don't have to convert existing data.

Simply add the new character set to ASE but don't change the default character set. Leave all other data as is, and only add/convert those new columns.

RepServer should be able to deal with this because generally the data is implicitly convertible but I strongly suggest you test this scenario before committing to it in Production: two ASEs and a RepServer all using your current character set, but replicating some univarchar data containing Chinese characters. I think it will work as the documentation says we only want the default character sets to match and that RS is smart enough to convert if needed... my only question is whether the Chinese characters survive the double conversion.

Author

Commented:
thanks for your help help Joe.

Regards
Chandra

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial