?
Solved

Multi-byte character data (Japanese) issue - SQL*PLUS spool file on Linux environment

Posted on 2011-10-31
4
Medium Priority
?
2,693 Views
Last Modified: 2012-05-12
I'm spooling oracle data with NLS_LANG parameter for database as AMERICAN_AMERICA.AL32UTF8 onto Linux environment. The CHARSET for Linux file is UTF-8. The Japanese characters in the file do not display data correctly, and it does not convert back to proper data when pulled back in Oracle or in any other tool that can display those characters.
Is it essential for ORACLE NLS_LANG to be set as UTF-8 as well in sync with LINUX environment variable? Are AL32UTF8 and UTF-8 not compatible with each other when it comes to dealing with Japanese characters?
Does my target column with NVARCHAR datatype should have size as double the size of source column with VARCHAR2 datatype?
0
Comment
Question by:aby_1983
  • 2
4 Comments
 
LVL 38

Expert Comment

by:yuzh
ID: 37062005
You need to set NLS_LANG  to Japanese_Japan.UTF8

also have a look at the following page:
http://itcareershift.com/blog1/2011/02/04/oracle-character-set-everything-a-new-oracle-dba-needs-to-know/

0
 
LVL 6

Accepted Solution

by:
Greg Clough earned 750 total points
ID: 37067321
UTF-8 should be compatible across the board.  Depending on your version of Oracle, you will either use the AL32UTF8 characterset, or possibly the old UTF8.  Check here for a matrix:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch6unicode.htm#g1008165

I will presume that your database is running in AL32UTF8 character set, or is it just your "NATIONAL CHARACTER" set that is AL32UTF8?  The general push these days is to stop using NVARCHAR, etc. and just make the entire database AL32UTF8, as you'll have less headaches in the long run... and storage is cheap.

Here is a quick rule that you should always follow (unless you know the very odd and specific times when you shouldn't):

Always set the last part of NLS_LANG to match the character set of the CLIENT machine, not the database

First, let's confirm that the data is actually stored in the database correctly in UTF-8 format.  You can use the dump() function, and this UTF-8 lookup table.

One common problem is setting the client and database charactersets to match, even though the client may be running something different.  This bypasses ALL of the Oracle character translation code, and means you can store invalid characters in your database as Oracle simply passes the bytes back and forth without modification.  The problem is compounded by the fact that everything "Looks OK" so long as you only ever access the database from those same mis-configured clients.  It's only when you try to introduce a different client that things get messy.

For example.

Database: AL32UTF8
Client: JA16SJIS... but configured with an NLS_LANG=JAPANESE_JAPAN.AL32UTF8

Then you try to dump your database to a 3rd client (in this case Linux), and it thinks it's dumping AL32UTF8, but in fact the file (and database) will include JA16SJIS bytes.  If you determine that you are in this situation, then log an SR with Oracle as this is very difficult to cleanup.

0
 
LVL 6

Expert Comment

by:Greg Clough
ID: 37067374
Sorry, I forgot to answer question 2 and 3:

2. Is Japanese the same in UTF8 and AL32UTF8?

I used to think so, but the content in the link provided by the previous poster has be doubting that...

  UTF8 was the UTF-8 encoded character set in Oracle 8 and 8i. To maintain compatibility with existing installations

   this character set will remain at unicode version 3.0 in future oracle releases.

   Specific supplementary characters were not assigned to unicode until Unicode version 3.1. Hence the supplementary

   characters(chinese,japanese,korean,gothic,old italic,mathematical,musical) could come with a inverted question mark(?).

3. Does my NVARCHAR need to be double?

UTF-8 is a variable-byte character set, so it can be 1, 2, or 3 bytes, depending on the character.  USASCII < 128 will be 1 byte, most European lanaguanges are 2 bytes.  Japanese will be 3 for each character I believe:

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12160&number=1024&unicodeinhtml=hex

So you will need 3 bytes for every Japanese character... but you can always use the "CHAR" definition instead of the default "BYTES"

http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch2charset.htm#autoId13
0
 

Author Closing Comment

by:aby_1983
ID: 37067971
AL32UTF8 (oracle charset) and UTF-8 (linux file charset) are compatible. Linux (secure shell) falls short of displaying Japanese characters properly. Toad 10.6 helped as it displays Japanese characters properly instead of question marks. Linux file loaded to target database eventually returns valid data.

Thanks for your inputs to help me stay put in right direction.
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
The purpose of this article is to demonstrate how we can upgrade Python from version 2.7.6 to Python 2.7.10 on the Linux Mint operating system. I am using an Oracle Virtual Box where I have installed Linux Mint operating system version 17.2. Once yo…
This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.
Suggested Courses

569 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question