Indraneel

Character set Problem

We develop client/server applications using VB and Oracle. At our office we use Oracle 8i 8.1.5.0 for Novell, which we bought with Novell 5, and one of our clients has bought Oracle 8.3.0.6 with Novell. We use ISM GIST from CDAC to develop software in regional languages, where some extended ASCII characters are used. The problem at my client's site is that some characters change when they are saved to the database and retrieved.
Please suggest a solution.
schwertner

I think you need to set the proper NLS character set in your DB. The problem is that this character set is set only once, when the DB is created. Altering it afterwards is risky. I have collected the answers to similar questions from the Oracle thread and hope they will help you get oriented in the problem.


The following is from the documentation.  You have to
set the proper character set when creating the database.


Documentation:

Creating the Database with a Character Set:

The database character encoding scheme is specified by the CHARACTER SET clause of the CREATE DATABASE
statement. The client character encoding scheme is specified by the NLS_LANG parameter. Each user session
can, if required, specify a different encoding scheme. When a client's character set is different from
the database character set, data conversion is done automatically, and it is transparent to the client
application.

The specification of CHARACTER SET in CREATE DATABASE is an important consideration when creating a
database, since a database character set cannot be changed without re-creating the database. If a number
of clients are to access a database using different character sets, the database character set chosen
should be equivalent to or a superset of all possible client character sets.

Character Conversion

Character set conversion is often necessary in a client/server computing environment where a client
application may be running in a different character set than that of the server. In such cases, client/server
character conversion occurs automatically and transparently to the application.

Conversion is possible between any two character sets and can also be invoked explicitly. In Oracle8,
conversion between different character sets is implemented using Unicode codepoints as an intermediate
form. The source character is first converted to a Unicode codepoint. The Unicode codepoint is converted
to the character in the destination character set. In cases where a target character set does not contain
all characters in the source data, replacement characters are used.
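
The two-step conversion described above can be imitated outside the database. The following is only a minimal sketch in Python, not Oracle's implementation: the helper name `convert` and the sample bytes are hypothetical, and Python's codec names `latin-1`, `ascii` and `utf-8` merely stand in for Oracle character sets. It shows how a character missing from the target set turns into a replacement character and cannot be recovered afterwards, which is the kind of symptom the question describes.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between character sets using Unicode as the intermediate form."""
    text = data.decode(source)                    # source bytes -> Unicode code points
    return text.encode(target, errors="replace")  # Unicode -> target, '?' where unmappable


# Hypothetical client data: 'cafe' with an accented e, encoded in ISO 8859-1.
client_bytes = b"caf\xe9"

# A 7-bit database character set has no slot for the accented character.
stored_ascii = convert(client_bytes, "latin-1", "ascii")

# A Unicode database character set can hold it, so the round trip is lossless.
stored_utf8 = convert(client_bytes, "latin-1", "utf-8")

print(stored_ascii)
print(convert(stored_utf8, "utf-8", "latin-1") == client_bytes)
```

Once the replacement character has been stored, converting back to the client character set cannot restore the original byte.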

Unicode a Universal Character Set

A character set may support a specific language, a group of related languages, or attempt to encompass
all known languages. The Unicode Standard defines a character repertoire that encompasses all major
scripts of the world, as well as technical symbols in common use. Unicode Version 2.0 contains 38,885
characters from the world's scripts.

The Unicode character repertoire can be represented in a number of different encoding formats. UCS-2
is a two-byte fixed-width format, UTF-8 is a multi-byte format with variable width. Oracle8 supports
the UTF-8 format only. UTF-8 is an ASCII-compatible encoding scheme. The Oracle character set name for
Unicode 2.0 is "UTF8". Unicode 1.1 has been supported with the Oracle character set name of "AL24UTFFSS"
since Oracle7.
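
To make "variable width" and "ASCII-compatible" concrete, here is a small Python sketch. The sample characters are my own choice for illustration (a Devanagari letter is included because the question concerns regional-language text); nothing here is Oracle-specific.

```python
# Sample characters, written as escapes so the snippet itself stays plain ASCII.
samples = {
    "A": "A",                   # U+0041, plain ASCII
    "e-acute": "\u00e9",        # U+00E9, Western European
    "euro": "\u20ac",           # U+20AC, currency symbol
    "devanagari ka": "\u0915",  # U+0915, an Indian-script letter
}

# Number of bytes each character occupies in UTF-8.
widths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# ASCII compatibility: pure ASCII text has identical bytes in both encodings.
ascii_text = "NLS_LANG"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

print(widths)
print(same_bytes)
```

A practical consequence is that column sizes declared in bytes may hold fewer characters once the data is stored as UTF-8.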

A typical scenario in which the Unicode character set would be useful is a multinational company with
local databases throughout the world that may want to consolidate data into one central database. In
this case, UTF8 would be an ideal character set to use as a server character set since it is a superset
of all other character sets.

By default the character set is not UTF8 but Western European ISO ...

There are 2 different sets of languages: characters of plain English, French and similar languages are single-byte.
Characters of other languages (Chinese etc.) are multibyte.

If you want to store those multibyte characters in your database, you have to define the character set as
UTF8 or UTF16 while creating the database. Otherwise it cannot convert/store existing single-byte data
as multibyte.

There is another parameter (system- and session-specific), NLS_LANG, which controls how data is displayed and which you need
to maintain in your Java code. You can set it on the browser and display data in a different format no matter how
it is stored in the database. If you store data in UTF8 or UTF16, you can view it in any language.

Character set conversion is often necessary in a client/server computing environment where a client
application may reside on a different computer platform from that of the server, and both platforms
may not use the same character encoding schemes. Character data passed between client and server must
be converted between the two encoding schemes. Character conversion occurs automatically and transparently
via Net8.
In order to change an inappropriate database character set, if it is not 7ASCII, you have to recreate the database
as follows (or similar):
CREATE DATABASE ques
    LOGFILE 'D:\ora8v2\oradata\ques\redo01.log' SIZE 1024K,
            'D:\ora8v2\oradata\ques\redo02.log' SIZE 1024K,
            'D:\ora8v2\oradata\ques\redo03.log' SIZE 1024K
    MAXLOGFILES 32
    MAXLOGMEMBERS 2
    MAXLOGHISTORY 1
    DATAFILE 'D:\ora8v2\oradata\ques\system01.dbf' SIZE 98M REUSE AUTOEXTEND ON NEXT 640K
    MAXDATAFILES 254
    MAXINSTANCES 1
    CHARACTER SET UTF8
    NATIONAL CHARACTER SET CL8MSWIN1251;
If the initial character set is 7ASCII (Oracle name US7ASCII),
you can use
ALTER DATABASE CHARACTER SET UTF8 ...
because UTF8 is a superset of 7ASCII.
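
As far as I understand it, "superset" here is meant strictly: every character of the old set must keep the same byte value in the new set, because the stored data is not converted. The Python sketch below illustrates that idea only; it is not how Oracle performs the check. The function name `is_binary_superset` is hypothetical, Python codec names stand in for Oracle character set names, and the test is meaningful only when the old set is single-byte.

```python
def is_binary_superset(old: str, new: str) -> bool:
    """Return True if every single-byte character of `old` has the same byte in `new`."""
    for value in range(256):
        raw = bytes([value])
        try:
            char = raw.decode(old)
        except UnicodeDecodeError:
            continue  # this byte is not defined in the old character set
        try:
            if char.encode(new) != raw:
                return False  # same character, different stored bytes
        except UnicodeEncodeError:
            return False  # character does not exist in the new set at all
    return True


print(is_binary_superset("ascii", "utf-8"))    # 7-bit ASCII -> UTF-8
print(is_binary_superset("latin-1", "utf-8"))  # ISO 8859-1 -> UTF-8
```

Under that reading, an 8-bit Western European database would not qualify for a direct switch to UTF8, even though UTF-8 contains all of its characters, because the bytes above 127 are encoded differently.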

There is a way to change the character set of a database, but it's a somewhat high-risk operation.
Shut the database down.  Then do a startup mount exclusive restrict.  From there, you can issue an alter
database character set internal_use UTF8;  This should work.  But (and I cannot emphasize this enough),
you *must* back up your database prior to doing this -- and a cold backup at that.

Oracle support can probably give you more information about this particular to your situation.  

If you have any possibility of recreating the DB, then I would recreate it with the UTF-8 character set.
I think that JDBC handles the characters in UTF-8 format anyway, so you don't need any further conversion.
madappattu

Hello Indraneel,
I have tried the second method you mentioned (the high-risk option) and got the following errors.

---
SVRMGR> alter database character SET internal_use UTF8;
alter database character SET internal_use UTF8
                                          *
ORA-12710: new character set must be a superset of old character set
---

Does this mean I did not select ASCII? If so, how can I find out which character set I used at the time of installation?

Sorry, the reply was for "schwertner".
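
On the question of finding the character set chosen at creation time: the data dictionary view NLS_DATABASE_PARAMETERS reports it in the NLS_CHARACTERSET row. The Python sketch below assumes you already have a DB-API style cursor connected to the database; how that connection is made is outside this sketch, and the helper name `database_charsets` is hypothetical.

```python
# Query against the NLS_DATABASE_PARAMETERS data dictionary view.
CHARSET_QUERY = (
    "SELECT parameter, value FROM nls_database_parameters "
    "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')"
)


def database_charsets(cursor):
    """Return the database and national character set names as a dict.

    `cursor` is assumed to be any DB-API cursor connected to the database.
    """
    cursor.execute(CHARSET_QUERY)
    return {name: value for name, value in cursor.fetchall()}
```

The same SELECT can also be typed directly into SQL*Plus or Server Manager. If NLS_CHARACTERSET turns out to be something other than US7ASCII, that would be consistent with the ORA-12710 error above.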

No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:
 - PAQ'd and pts removed
Please leave any comments here within the
next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER !

Nic;o)
ASKER CERTIFIED SOLUTION
Jgould
