Multi-byte character data (Japanese) issue - SQL*PLUS spool file on Linux environment

Posted on 2011-10-31
Last Modified: 2012-05-12
I'm spooling oracle data with NLS_LANG parameter for database as AMERICAN_AMERICA.AL32UTF8 onto Linux environment. The CHARSET for Linux file is UTF-8. The Japanese characters in the file do not display data correctly, and it does not convert back to proper data when pulled back in Oracle or in any other tool that can display those characters.
Is it essential for ORACLE NLS_LANG to be set as UTF-8 as well in sync with LINUX environment variable? Are AL32UTF8 and UTF-8 not compatible with each other when it comes to dealing with Japanese characters?
Does my target column with NVARCHAR datatype should have size as double the size of source column with VARCHAR2 datatype?
Question by:aby_1983
    LVL 38

    Expert Comment

    You need to set NLS_LANG  to Japanese_Japan.UTF8

    also have a look at the following page:

    LVL 6

    Accepted Solution

    UTF-8 should be compatible across the board.  Depending on your version of Oracle, you will either use the AL32UTF8 characterset, or possibly the old UTF8.  Check here for a matrix:

    I will presume that your database is running in AL32UTF8 character set, or is it just your "NATIONAL CHARACTER" set that is AL32UTF8?  The general push these days is to stop using NVARCHAR, etc. and just make the entire database AL32UTF8, as you'll have less headaches in the long run... and storage is cheap.

    Here is a quick rule that you should always follow (unless you know the very odd and specific times when you shouldn't):

    Always set the last part of NLS_LANG to match the character set of the CLIENT machine, not the database

    First, let's confirm that the data is actually stored in the database correctly in UTF-8 format.  You can use the dump() function, and this UTF-8 lookup table.

    One common problem is setting the client and database charactersets to match, even though the client may be running something different.  This bypasses ALL of the Oracle character translation code, and means you can store invalid characters in your database as Oracle simply passes the bytes back and forth without modification.  The problem is compounded by the fact that everything "Looks OK" so long as you only ever access the database from those same mis-configured clients.  It's only when you try to introduce a different client that things get messy.

    For example.

    Database: AL32UTF8
    Client: JA16SJIS... but configured with an NLS_LANG=JAPANESE_JAPAN.AL32UTF8

    Then you try to dump your database to a 3rd client (in this case Linux), and it thinks it's dumping AL32UTF8, but in fact the file (and database) will include JA16SJIS bytes.  If you determine that you are in this situation, then log an SR with Oracle as this is very difficult to cleanup.

    LVL 6

    Expert Comment

    by:Greg Clough
    Sorry, I forgot to answer question 2 and 3:

    2. Is Japanese the same in UTF8 and AL32UTF8?

    I used to think so, but the content in the link provided by the previous poster has be doubting that...

      UTF8 was the UTF-8 encoded character set in Oracle 8 and 8i. To maintain compatibility with existing installations

       this character set will remain at unicode version 3.0 in future oracle releases.

       Specific supplementary characters were not assigned to unicode until Unicode version 3.1. Hence the supplementary

       characters(chinese,japanese,korean,gothic,old italic,mathematical,musical) could come with a inverted question mark(?).

    3. Does my NVARCHAR need to be double?

    UTF-8 is a variable-byte character set, so it can be 1, 2, or 3 bytes, depending on the character.  USASCII < 128 will be 1 byte, most European lanaguanges are 2 bytes.  Japanese will be 3 for each character I believe:

    So you will need 3 bytes for every Japanese character... but you can always use the "CHAR" definition instead of the default "BYTES"

    Author Closing Comment

    AL32UTF8 (oracle charset) and UTF-8 (linux file charset) are compatible. Linux (secure shell) falls short of displaying Japanese characters properly. Toad 10.6 helped as it displays Japanese characters properly instead of question marks. Linux file loaded to target database eventually returns valid data.

    Thanks for your inputs to help me stay put in right direction.

    Featured Post

    Highfive Gives IT Their Time Back

    Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

    Join & Write a Comment

    Suggested Solutions

    Title # Comments Views Activity
    SAP and ODBC links 2 68
    Stay Alert! 13 47
    error I/O in copying in linux redhat 12 53
    querying by the sum of a column in decimal 7 29
    Working with Network Access Control Lists in Oracle 11g (part 2) Part 1: Previously, I introduced the basics of network ACL's including how to create, delete and modify entries to allow and deny access.  For many…
    Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
    This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.
    This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.

    728 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    18 Experts available now in Live!

    Get 1:1 Help Now