Why does Unicode data corrupt when inserted using the Sybase open client with MicroFocus COBOL?

When I insert Unicode data using the Sybase BCP utility there is no problem. When I insert Unicode data from a COBOL program using embedded SQL it becomes corrupted. Same result in Windows and Solaris (UNIX). I have set the "LANG" environment variable to "us_english.utf8" and issued "set char_convert off", to no avail.

The COBOL isn't a problem; I can manipulate the data, I just can't store it properly in Sybase "univarchar" columns. Using bcp with "-Y -Jutf8" works fine.
apra-amcosAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Gary PattersonVP Technology / Senior Consultant Commented:
Can you please show us the code that is failing?
0
apra-amcosAuthor Commented:
The code doesn't actually fail. the data is stored as rubbish.

I've attached the program source, and the column is question (col6) is a univarchar type. This program is as close as I can get to copying what I imagine that BCP does.
0
Gary PattersonVP Technology / Senior Consultant Commented:
"Unexpected output" is certainly in the "fail" category when I'm testing software.

There is no attachment.
0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

apra-amcosAuthor Commented:
Sorry about the attachment. It should now be OK.
ucload2.txt
0
Gary PattersonVP Technology / Senior Consultant Commented:
Disclaimer:  I've used Sybase ASE, but I'm not an expert.  I'm not a MFC expert.  I am primarily a DB2 DBA/programmer, and I've dealt with similar issues there.   I'm giving this a shot because nobody else is responding, so take it for what it is worth.

Also, Unicode support in Sybase is version-dependent.  Please provide your version in future posts.

BCP specifically supports client-side Unicode conversion.  It is a specific feature of the BCP utility.  

In your command line, you're explicitly telling it to turn off host-side conversion (-Y), and explicitly telling it to perform client-side conversion from utf-8 (-Jutf-8 - which means to convert UTF-8 to UTF-16 in the case of UNIVARCHAR columns).  

That is working as documented:

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc30191.1573/html/utilityguide/X14951.htm

I think that you are assuming that setting the "LANG" environment variable to "us_english.utf8" and issuing the "set char_convert off" is the Open Client equivalent of "-Y -Jutf-8" in BCP.  

I think that assumption may be incorrect.  Here's why:

From P2 of http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32840.1550/pdf/ctref.pdf

The new Unicode data types UNICHAR and UNIVARCHAR operate in parallel and are completely independent of the traditional character set model. There is no negotiation of character set conversion between servers and clients for Unicode data types.  Servers and clients exchange Unicode data using different tokens from those used for character data,  and thus an independent data path exists, bypassing all the character conversion mechanisms. Clients send and receive Unicode data independently of whatever other character data they send and receive.

So:

1) UNIVARCHAR columns are stored in UTF-16 format.

2) Sybase doesn't do automatic casting from the client code page to UTF-16 for UNIVARCHAR columns like it does for other character data types - it expects you to provide data in the correct format or explicitly cast it.

So if your input file is in UTF-8, I think the Sybase client is just slamming it into UTF-16 columns unconverted.

I suggest one of 2 alternatives:

1) Encode or convert your input file to UTF-16.
2) Use convert() or cast() in your INSERT to explicitly convert each column to univarchar (UTF-16)
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
apra-amcosAuthor Commented:
Thanks, Gary, your insights steered us in the right direction. After setting the server default character set to UTF8 (it was ISO_1) and specifying "set char_convert utf8" in the client-side application, it all worked exactly as per the BCP. There was no need to cast or convert the text into univarchar, since this is now the server default.

Thanks again, much appreciated!
0
Gary PattersonVP Technology / Senior Consultant Commented:
Glad you got it worked out.  

Based on the documentation, I wouldn't have necessarily expected that to work.  

Based on your results, apparently it can do UTF-8 to UTF-16 conversion automatically - it just can't do automatic conversion to UTF-16 when the client is using a non-Unicode encoding.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Programming Languages-Other

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.