[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 934
  • Last Modified:

Why does Unicode data corrupt when inserted using the Sybase open client with MicroFocus COBOL?

When I insert Unicode data using the Sybase BCP utility there is no problem. When I insert Unicode data from a COBOL program using embedded SQL it becomes corrupted. Same result in Windows and Solaris (UNIX). I have set the "LANG" environment variable to "us_english.utf8" and issued "set char_convert off", to no avail.

The COBOL isn't a problem; I can manipulate the data, I just can't store it properly in Sybase "univarchar" columns. Using bcp with "-Y -Jutf8" works fine.
0
apra-amcos
Asked:
apra-amcos
  • 4
  • 3
1 Solution
 
Gary PattersonVP Technology / Senior Consultant Commented:
Can you please show us the code that is failing?
0
 
apra-amcosAuthor Commented:
The code doesn't actually fail. the data is stored as rubbish.

I've attached the program source, and the column is question (col6) is a univarchar type. This program is as close as I can get to copying what I imagine that BCP does.
0
 
Gary PattersonVP Technology / Senior Consultant Commented:
"Unexpected output" is certainly in the "fail" category when I'm testing software.

There is no attachment.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
apra-amcosAuthor Commented:
Sorry about the attachment. It should now be OK.
ucload2.txt
0
 
Gary PattersonVP Technology / Senior Consultant Commented:
Disclaimer:  I've used Sybase ASE, but I'm not an expert.  I'm not a MFC expert.  I am primarily a DB2 DBA/programmer, and I've dealt with similar issues there.   I'm giving this a shot because nobody else is responding, so take it for what it is worth.

Also, Unicode support in Sybase is version-dependent.  Please provide your version in future posts.

BCP specifically supports client-side Unicode conversion.  It is a specific feature of the BCP utility.  

In your command line, you're explicitly telling it to turn off host-side conversion (-Y), and explicitly telling it to perform client-side conversion from utf-8 (-Jutf-8 - which means to convert UTF-8 to UTF-16 in the case of UNIVARCHAR columns).  

That is working as documented:

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc30191.1573/html/utilityguide/X14951.htm

I think that you are assuming that setting the "LANG" environment variable to "us_english.utf8" and issuing the "set char_convert off" is the Open Client equivalent of "-Y -Jutf-8" in BCP.  

I think that assumption may be incorrect.  Here's why:

From P2 of http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32840.1550/pdf/ctref.pdf

The new Unicode data types UNICHAR and UNIVARCHAR operate in parallel and are completely independent of the traditional character set model. There is no negotiation of character set conversion between servers and clients for Unicode data types.  Servers and clients exchange Unicode data using different tokens from those used for character data,  and thus an independent data path exists, bypassing all the character conversion mechanisms. Clients send and receive Unicode data independently of whatever other character data they send and receive.

So:

1) UNIVARCHAR columns are stored in UTF-16 format.

2) Sybase doesn't do automatic casting from the client code page to UTF-16 for UNIVARCHAR columns like it does for other character data types - it expects you to provide data in the correct format or explicitly cast it.

So if your input file is in UTF-8, I think the Sybase client is just slamming it into UTF-16 columns unconverted.

I suggest one of 2 alternatives:

1) Encode or convert your input file to UTF-16.
2) Use convert() or cast() in your INSERT to explicitly convert each column to univarchar (UTF-16)
0
 
apra-amcosAuthor Commented:
Thanks, Gary, your insights steered us in the right direction. After setting the server default character set to UTF8 (it was ISO_1) and specifying "set char_convert utf8" in the client-side application, it all worked exactly as per the BCP. There was no need to cast or convert the text into univarchar, since this is now the server default.

Thanks again, much appreciated!
0
 
Gary PattersonVP Technology / Senior Consultant Commented:
Glad you got it worked out.  

Based on the documentation, I wouldn't have necessarily expected that to work.  

Based on your results, apparently it can do UTF-8 to UTF-16 conversion automatically - it just can't do automatic conversion to UTF-16 when the client is using a non-Unicode encoding.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now