Solved

unicode_columnTypes

Posted on 2010-11-16
340 Views
Last Modified: 2012-05-10
I have a 9i database in WE8ISO8859P1.
A new requirement came in to store some data, like book titles, in Russian, Arabic, and other foreign languages.
What would be the best thing to do?
a) Change the whole database to UTF-8. That would mean creating a new DB, migrating the data, and testing everything; a lot of work, I expect.
b) Create new columns with the NVARCHAR2(4000) or NCLOB type. This would require many changes in the client code.
c) Change existing columns that need to store foreign data from VARCHAR2 to NVARCHAR2.

Would c) work? Isn't it the best and easiest solution? Would there be any consequences for the data stored in the DB, or for the PL/SQL web client (SQL) or PowerBuilder client accessing the data?
Question by:sam15
15 Comments
 
LVL 7

Expert Comment

by:jocave
Personally, I would tend to vote for option a.  Yes, it's going to require a full regression test of the application, but you do it once and then you're good to go.

Option b can work.  You'll need to move the data, which is going to be similar to the work required for option a if you're trying to convert any fraction of the columns.  You can't just do an ALTER that changes the data type of a column from VARCHAR2 to NVARCHAR2.  You would need to create a new NVARCHAR2 column, copy the data, and then remove the old VARCHAR2 column and rename the new column to the old column name.  Plus, NVARCHAR2 columns introduce a reasonable amount of additional complexity in the client code.  I don't know off the top of my head whether PowerBuilder even supports NVARCHAR2 columns (it may depend on the version).
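
A minimal sketch of that copy-and-rename approach, assuming a hypothetical BOOK table with a TITLE column (RENAME COLUMN requires 9i Release 2 or later):

-- add the new national character set column
ALTER TABLE book ADD (title_n NVARCHAR2(400));

-- copy the data; Oracle converts to the national character set implicitly
UPDATE book SET title_n = title;

-- drop the old column and take over its name
ALTER TABLE book DROP COLUMN title;
ALTER TABLE book RENAME COLUMN title_n TO title;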
 

Author Comment

by:sam15
Option b was creating new columns, so basically I would add a new NVARCHAR2 column to store foreign titles while the regular VARCHAR2 column stores the English version. That makes things complex when coding.

I think you meant option c. If Oracle has no ALTER for the column type, I can still create a new NVARCHAR2 column and copy the data. Would that cause any issues for the client when you have a SQL statement selecting several columns, some VARCHAR2 and some NVARCHAR2?

The client can be a browser or PowerBuilder. PB clients, I think, run in Unicode.
 
LVL 7

Expert Comment

by:jocave
So just to be clear, option B would be to have both a VARCHAR2 and an NVARCHAR2 column in the table storing the same data, with the client application figuring out which column to use depending on the data element?  That would seem like an awful lot of work to keep straight.

It is certainly possible to build an application along these lines.  But all your applications would need to be able to handle NVARCHAR2 columns.  That depends on the client language (and I'm no PowerBuilder guru).  I know that it's quite possible to handle NVARCHAR2 columns in Java, for example, but it is something that trips up a lot of developers.  And third party tools will often have difficulty handling NVARCHAR2 data.
 

Author Comment

by:sam15
What would be the difference between setting the whole DB to UTF-8 versus keeping the column as NVARCHAR2 with a Unicode national character set?

It is a similar thing; either way the data will be stored in multibyte form and rendered to the client.
 
LVL 7

Expert Comment

by:jocave
If the database character set is UTF-8 (presumably AL32UTF8), all the data would still be stored in a VARCHAR2 column.  So client applications don't need to know how to handle NVARCHAR2 columns, you don't have to deal with string constants that may not be representable in the database character set, etc.  
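
For reference, both settings are visible in the data dictionary; NLS_DATABASE_PARAMETERS holds the database character set (used by VARCHAR2) and the national character set (used by NVARCHAR2):

SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');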
 

Author Comment

by:sam15
Yes it will, but it is multibyte storage instead of single-byte storage.

I am not sure what you mean by how clients handle the data.

The client usually converts the data to whatever the client's character set is, I thought.

Are you saying the client will get confused when you send single-byte field data and multibyte field data?
 
LVL 7

Expert Comment

by:jocave
I'm saying that the client applications have to be aware that certain columns are NVARCHAR2 and certain columns are VARCHAR2 independent of whether the VARCHAR2 columns happen to contain single or multi-byte characters.  They have to bind the data a bit differently, they have to handle string constants a bit differently, etc.  

A Java application, for example (barring some real corner cases), will always request Unicode data from the database regardless of the database character set.  So Java applications inherently handle multi-byte data.  But handling NVARCHAR2 columns creates a number of problems for a Java developer.  The calls that the Java app needs to make to tell Oracle to treat one bind variable as an NVARCHAR2 and another as a VARCHAR2 can be a bit verbose and many libraries that a Java developer may want to use won't be aware of how to do that.  So while it is perfectly possible to write a Java application that correctly handles databases where some columns are VARCHAR2 and others are NVARCHAR2, you're introducing a number of extra hurdles and potentially eliminating some libraries.

The same goes for third-party tools.  Even tools that happily support multi-byte data may not have been coded to handle NVARCHAR2 columns correctly.  Some will, of course, but you're setting yourself up down the line to not be able to use a tool you would like to use because it supports Unicode data but not Oracle's national character set implementation (tools designed to support many different databases are often the slowest to adopt this sort of Oracle-specific functionality).

 

Author Comment

by:sam15
Oh, I think I see what you're saying.

In PL/SQL, for example, I have to change the variable type from VARCHAR2(n) to NVARCHAR2(n) unless I use anchored data types like

my_var table.column%TYPE;

I am wondering, if I have a SQL statement like this, would it fetch the data as multibyte or single-byte?

select varchar2_column, nvarchar2_column from table;
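
A minimal PL/SQL sketch of that anchored-type approach, assuming a hypothetical BOOK table with one column of each type; %TYPE picks up whichever character set the column uses:

DECLARE
  v_title    book.title%TYPE;     -- resolves to VARCHAR2
  v_title_n  book.title_n%TYPE;   -- resolves to NVARCHAR2
BEGIN
  SELECT title, title_n
    INTO v_title, v_title_n
    FROM book
   WHERE book_id = 1;
END;
/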
 
LVL 7

Expert Comment

by:jocave
The character set that is returned to the client depends on the client's NLS settings (which introduces another set of wrinkles since those settings don't work exactly the same way for NVARCHAR2 and VARCHAR2 columns).  Either the VARCHAR2 or the NVARCHAR2 column could be returned as a single-byte or as a multi-byte string.

There is relatively little that needs to be done for PL/SQL to support NVARCHAR2 and VARCHAR2 columns because data never gets converted from one character set to another and Oracle internally handles all the issues with binding the right data type for a bind variable.  If you want to hard code values that cannot be represented in the database character set (say, you want your code to return an NVARCHAR2 error message), that introduces some complexity.
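As a small illustration of that hard-coding complexity: PL/SQL marks national character set literals with an N prefix, but in many configurations the literal still travels through the database character set on its way to the server, which is where non-representable characters can get lost.  A hypothetical sketch:

DECLARE
  v_msg NVARCHAR2(100);
BEGIN
  -- N'...' is a national character set literal; if its characters are not
  -- representable in the database character set, they may be corrupted
  -- before they ever reach the NVARCHAR2 variable.
  v_msg := N'some non-Latin error text';
END;
/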

If you're writing a Java application, however, every time you bind a variable that is not the default (which you can set to VARCHAR2 or NVARCHAR2), you have to issue an additional call to setFormOfUse to tell the JDBC driver to treat the column as whichever option is not the default.  So every time you pass an NVARCHAR2 bind variable from the client application, the client code has to make an extra call to tell Oracle that the column is NVARCHAR2 that doesn't have to be made if everything is stored in a VARCHAR2 column.  If your library/ framework/ etc. doesn't know how to do that, it won't work with NVARCHAR2 columns.  And if you happen to have applications where developers are generating SQL statements by concatenating strings together rather than using bind variables, things get even more complicated because SQL statements have to be representable in the database character set.
 

Author Comment

by:sam15
I misunderstood their requirement.

I found out they need to keep the English book titles, narrators, and authors, but want to add a second set of foreign data for those titles so foreign users can read it and search by it.

It sounds like the easiest thing is adding a FOREIGN_DATA_TABLE with a set of columns like

TITLE  NVARCHAR2(4000) or NCLOB
NARRATOR_FNAME NVARCHAR2(100)
NARRATOR_LNAME NVARCHAR2(100)

The other English data stays in WE8ISO8859P1.
I can use the Oracle CONVERT function here if I want to convert it to UTF8 before it goes to the client.

What do you think of this solution?
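
A minimal DDL sketch of that idea, with hypothetical names and a BOOK_ID key back to the English table; note that with an AL16UTF16 national character set an NVARCHAR2 column maxes out at 2000 characters (4000 bytes), so NCLOB may be needed for very long titles:

CREATE TABLE foreign_data_table (
  book_id         NUMBER NOT NULL REFERENCES book (book_id),
  title           NVARCHAR2(2000),
  narrator_fname  NVARCHAR2(100),
  narrator_lname  NVARCHAR2(100)
);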
 
LVL 7

Expert Comment

by:jocave
You wouldn't want to use the CONVERT function.  The Oracle networking layer will automatically convert data to the character set the client requests (a Java application, for example, would want UTF-16 data rather than UTF-8).  But your application code would need to know how to bind NCHAR/ NVARCHAR2 parameters and fetch NCHAR/ NVARCHAR2 data which is going to require some client configuration changes.  For PowerBuilder, the code changes to support NCHAR/ NVARCHAR2 columns are documented on the Sybase site.  Other libraries/ languages/ clients will have similar requirements (assuming they support NCHAR/ NVARCHAR2).
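
For example (a sketch; the language/territory values are placeholders), a client that wants UTF-8 data would set its NLS_LANG environment variable rather than call CONVERT in SQL:

set NLS_LANG=AMERICAN_AMERICA.UTF8     (Windows; use export on Unix)

With that in place, the networking layer converts VARCHAR2 and NVARCHAR2 data to UTF-8 automatically as it is fetched.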

 

Author Comment

by:sam15
So you're saying that creating a table like the one above using the national character set, and writing the client (regardless of language) to handle it, will enable us to store, manipulate, and retrieve foreign data.

There is no need to convert the whole database to a Unicode format.

Correct?
 
LVL 7

Expert Comment

by:jocave
There is no need to convert the entire database to Unicode.

In my experience, though, you'll save yourself time and frustration over the long run by converting the entire database to Unicode.  It is all but inevitable that over time the amount of non-English data is going to grow and the number of columns that will store non-English data will grow.  You'll find that some library or reporting tool that you want to use either doesn't support columns using the national character set or that configuring it to do so is highly esoteric.  Some developers will struggle with character set issues because they didn't bind to an NVARCHAR2 column correctly, etc.  None of these may be fatal.  But over the long run, my bias would be that converting the entire database to Unicode, while more work immediately, will pay dividends over the life of your data.
 

Author Comment

by:sam15
So you agree with creating one Unicode table and handling it in the client?

Converting the whole database, I think, would need a lot of testing. I am thinking a two-month project for a 300-table database with 600,000 lines of code. What do you think?
 
LVL 7

Accepted Solution

by:jocave (earned 500 total points)
No matter what, the client is going to potentially need to be modified either to handle Unicode data in general or to handle Unicode data for specific columns.

I would not object to adding some NVARCHAR2 columns to the existing BOOK table and modifying just the applications and screens that need to fetch the non-English data from those columns, if that's the way you want to go.  That's a perfectly valid decision.

My preference, though, would still be to just convert the entire database.  In my experience, once an application gets one requirement to handle data in additional languages, it is all but inevitable that more such requests will be forthcoming.  Users are going to want to see more data in foreign languages, they're going to want application screens to display text in other languages, more screens are going to need to be modified to display the foreign language data, etc.  Frequently, it's just as easy to modify the application to support Unicode (particularly something like PowerBuilder which has supported Unicode since, it appears, version 10) rather than to identify individual screens that need to change to support NCHAR/ NVARCHAR2 columns.  And once you modify the application, you don't have to think about it again while if you go down the NCHAR/ NVARCHAR2 route, you're regularly going to have situations where you need to modify another screen because someone wants to add the foreign language title to another screen.

As for the time line, it really depends on how long it takes to regression test your application.  If you have an automated regression test suite, you may find that you can just export the data from the existing database, create a new Unicode database, do the import, potentially change a couple of configuration parameters, and your existing PowerBuilder application may just work.  On the other hand, two months may be a low estimate if it takes weeks to perform a full regression test by hand.
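
A rough outline of that migration path, assuming the 9i exp/imp utilities; the account, password, and file names below are placeholders:

1) Full export of the existing WE8ISO8859P1 database:
   exp system/manager FULL=Y FILE=full.dmp LOG=exp.log
2) Create a new database with NLS_CHARACTERSET set to AL32UTF8.
3) Full import into the new database:
   imp system/manager FULL=Y FILE=full.dmp LOG=imp.log

Watch the import log for truncation warnings: single-byte WE8ISO8859P1 characters can expand to several bytes in AL32UTF8, so columns defined with byte-length semantics may need to be widened first.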