aby_1983

asked on

Multi-byte character data (Japanese) issue - SQL*PLUS spool file on Linux environment

I'm spooling Oracle data to a Linux environment with the database NLS_LANG parameter set to AMERICAN_AMERICA.AL32UTF8. The character set of the Linux file is UTF-8. The Japanese characters in the file do not display correctly, and they do not convert back to valid data when loaded back into Oracle or viewed in any other tool that can display those characters.
Is it essential for the Oracle NLS_LANG to be set to UTF-8 as well, in sync with the Linux environment variable? Are AL32UTF8 and UTF-8 incompatible with each other when it comes to handling Japanese characters?
Should my target column, which has an NVARCHAR datatype, be sized at double the size of the source column, which has a VARCHAR2 datatype?
yuzh

You need to set NLS_LANG to Japanese_Japan.UTF8.

also have a look at the following page:
http://itcareershift.com/blog1/2011/02/04/oracle-character-set-everything-a-new-oracle-dba-needs-to-know/
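As a rough sketch, setting both the Oracle client variable and the OS locale before spooling might look like this (the connect string and script name below are placeholders, not from this thread):

```shell
# Client character set: tells SQL*Plus how to encode the spool file.
# Value taken from the suggestion above; adjust language/territory as needed.
export NLS_LANG=Japanese_Japan.UTF8

# OS locale should agree, so terminals and tools read the file as UTF-8.
export LANG=ja_JP.UTF-8

# Hypothetical spool run; only attempted if sqlplus exists on this machine.
if command -v sqlplus >/dev/null 2>&1; then
    sqlplus -s scott/tiger @export_data.sql
fi
```

The key point is that NLS_LANG describes what the *client* side expects; Oracle converts from the database character set to it during the spool, so it must match what the file will actually contain.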

ASKER CERTIFIED SOLUTION
Greg Clough (United Kingdom)
Sorry, I forgot to answer questions 2 and 3:

2. Is Japanese the same in UTF8 and AL32UTF8?

I used to think so, but the content in the link provided by the previous poster has me doubting that:

   UTF8 was the UTF-8 encoded character set in Oracle 8 and 8i. To maintain compatibility with existing installations,
   this character set will remain at Unicode version 3.0 in future Oracle releases.
   Specific supplementary characters were not assigned to Unicode until Unicode version 3.1. Hence the supplementary
   characters (Chinese, Japanese, Korean, Gothic, Old Italic, mathematical, musical) could come out as an inverted question mark (?).

3. Does my NVARCHAR need to be double?

UTF-8 is a variable-width character set, so each character can take 1 to 4 bytes. US-ASCII characters below 128 are 1 byte, most accented European characters are 2 bytes, and Japanese characters are 3 bytes each, I believe:

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12160&number=1024&unicodeinhtml=hex
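As a quick sanity check of those widths (assuming the shell script itself is saved as UTF-8), you can count the bytes each character occupies:

```shell
# Count the UTF-8 bytes for one character from each range.
ascii=$(printf '%s' 'A' | wc -c)    # US-ASCII below 128: 1 byte
latin=$(printf '%s' 'é' | wc -c)    # accented Latin: 2 bytes
kana=$(printf '%s' 'あ' | wc -c)    # Japanese hiragana: 3 bytes
echo "A=$ascii, é=$latin, あ=$kana"
```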

So you will need 3 bytes for every Japanese character... but you can always use "CHAR" length semantics instead of the default "BYTE":

http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch2charset.htm#autoId13
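The same byte counting shows why the default BYTE semantics can overflow: five hiragana characters need fifteen bytes, so a hypothetical VARCHAR2(5 BYTE) column would be too small, while VARCHAR2(5 CHAR) would hold them (the column definitions here are illustrative, not from the thread):

```shell
# Five Japanese characters, but three times as many bytes in UTF-8.
word='あいうえお'
bytes=$(printf '%s' "$word" | wc -c)
# 5 characters -> 15 bytes: VARCHAR2(5 BYTE) overflows, VARCHAR2(5 CHAR) fits.
echo "bytes=$bytes"
```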
aby_1983

ASKER

AL32UTF8 (the Oracle character set) and UTF-8 (the Linux file character set) are compatible. The Linux secure shell simply falls short of displaying Japanese characters properly. Toad 10.6 helped, as it displays the Japanese characters correctly instead of question marks. The Linux file, once loaded into the target database, returns valid data.

Thanks for your inputs; they helped keep me headed in the right direction.