Sybase ASE 15 - charset conversion from latin-1 to UTF-8

Hi

We are running ASE 15.0 (with character set latin-1 on Solaris 10).  We want to convert the charset of database server from latin-1 to UTF-8 to support  Chinese characters.
can anybody help us to list the steps involved in this charset conversion?  We understand we may have to bcp out/in all the tables unless the data stored in tables are only ASCII-7.  Is there any easy method to convert to UTF-8 without bcp in/out?

We have setup Warm standby using Sybase Replication Server. Do we need to change the charset of Replication server too to support UTF-8 along with dataserver? Kindly advise.

thanks
Chan
chandra nAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Joe WoodhousePrincipal ConsultantCommented:
This is actually a fairly major migration. Because you already have data in the databases I'm afraid that you do have to bcp out and back in again. You'll need to use the "-Jutf8" option to specify the data should be extracted & loaded into UTF8 format.

You might run into issues with data not fitting into their columns any more, because UTF takes more space.

I actually think your best plan here is to create a new ASE with your desired character set, create new databases, create new objects (this time defining all char columns as unichar and varchar as univarchar). Only then bcp out of the original and into the new. Don't forget dbcc fix_text for any tables with text columns.

This way you guarantee your original data is safe, and that all data will fit. Although you do need hardware and storage for a second ASE.

The System Administration Guide volume 1 (chapter 9 in the manual for ASE 15.0) documents the procedure you need exactly.

I'm afraid I have more bad news for you. Your Rep Server character set and sort order must match that of the ASE. And since you're using Warm Standby you will obviously have to change the character set in the standby ASE also, although at that point you'll be able to just dump & load from the active to the standby (don't forget to disable rep agents again).

It really is quite a major migration I'm sorry to say.
chandra nAuthor Commented:
thanks Joe. I really appreciate your explanation.
Luckily we have only 5 tables where a new column to be added to store Chinese characters ( datatype should be univarchar()? ) .
Since only the new columns to be added to the existing tables would the following steps work?

1. Change the charset of existing database server to UTF-8 using charset -Usa -P binary.srt utf8
2. Change default charset to UTF-8
3. Shutdown and restart server
4. Add a new columns with univarchar() into those 5 tables

Do I still need to bcp out/bcp in data from these tables with -Jutf-8 option?  Do you still recommend to setup new server with UTF-8 and dump from old server and load into UTF-8 server?  

 For replication server :

1. Quiesce the queue
2. configure the replication server with UTF-8 charset
3. restart the replication server

Kindly the review the steps and advise.

thanks
CS
Joe WoodhousePrincipal ConsultantCommented:
Let's take a step backwards. If all you want is to add columns to five tables then I am not sure that you need to change your default character set.

You can install a new character set without making it the default. ASE can handle data in multiple character sets. I think for what you're describing that maybe all you need is to install utf8 character set and then add your columns.

Yes the new columns should be univarchar.

This also gives you an "in-place" migration method:

- Install utf8 character set. Do not make it the default.
- Add your new columns
- Convert all existing columns from char to unichar and from varchar to univarchar. You can do this on multiple tables concurrently for performance.
- Once everything is converted, change the default character set and reboot ASE.

No bcp out/in required if you do it this way. Unfortunately I think you'll lose replication and Warm Standby for the exercise, but once complete just change standby ASE's default character set and then treat it like rebuilding Warm Standby.

Re. the RS tasks it's a bit more involved than that. You will effectively be rebuilding your Warm Standby. The Replication Server Design Guide has a section devoted to this under "International Replication Design Considerations" where it walks you through every step.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Put Your Flow Data to Work

SolarWinds® Flow Tool Bundle combines three easy-to-download, easy-to-use flow analysis tools that can help you quickly distribute, test, and configure your flow traffic.

chandra nAuthor Commented:
thanks for the clarification Joe.  

Only the new columns added to the tables are used to store Chinese text. I understand from your reply that we still need to convert all the existing char & varchar columns (of all the tables?) to unichar and univarchar. I hope it can be done using alter table command and data conversion to UTF8 format will be automatically done. It might  increase the length of the row so we should also verify the row size.

Can we store the data on only those 5 tables in UTF8 format and keep the data of other tables in existing charset format?

Kindly confirm.

Regards
CS
Joe WoodhousePrincipal ConsultantCommented:
Yes, you don't have to convert existing data.

Simply add the new character set to ASE but don't change the default character set. Leave all other data as is, and only add/convert those new columns.

RepServer should be able to deal with this because generally the data is implicitly convertible but I strongly suggest you test this scenario before committing to it in Production: two ASEs and a RepServer all using your current character set, but replicating some univarchar data containing Chinese characters. I think it will work as the documentation says we only want the default character sets to match and that RS is smart enough to convert if needed... my only question is whether the Chinese characters survive the double conversion.
chandra nAuthor Commented:
thanks for your help help Joe.

Regards
Chandra
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.