Why does Unicode data corrupt when inserted using the Sybase open client with MicroFocus COBOL?

Posted on 2014-08-21
Last Modified: 2014-08-27
When I insert Unicode data using the Sybase BCP utility there is no problem. When I insert Unicode data from a COBOL program using embedded SQL it becomes corrupted. Same result in Windows and Solaris (UNIX). I have set the "LANG" environment variable to "us_english.utf8" and issued "set char_convert off", to no avail.

The COBOL isn't a problem; I can manipulate the data, I just can't store it properly in Sybase "univarchar" columns. Using bcp with "-Y -Jutf8" works fine.
Question by:apra-amcos
    LVL 34

    Expert Comment

    by:Gary Patterson
    Can you please show us the code that is failing?

    Author Comment

    The code doesn't actually fail. the data is stored as rubbish.

    I've attached the program source, and the column is question (col6) is a univarchar type. This program is as close as I can get to copying what I imagine that BCP does.
    LVL 34

    Expert Comment

    by:Gary Patterson
    "Unexpected output" is certainly in the "fail" category when I'm testing software.

    There is no attachment.

    Author Comment

    Sorry about the attachment. It should now be OK.
    LVL 34

    Accepted Solution

    Disclaimer:  I've used Sybase ASE, but I'm not an expert.  I'm not a MFC expert.  I am primarily a DB2 DBA/programmer, and I've dealt with similar issues there.   I'm giving this a shot because nobody else is responding, so take it for what it is worth.

    Also, Unicode support in Sybase is version-dependent.  Please provide your version in future posts.

    BCP specifically supports client-side Unicode conversion.  It is a specific feature of the BCP utility.  

    In your command line, you're explicitly telling it to turn off host-side conversion (-Y), and explicitly telling it to perform client-side conversion from utf-8 (-Jutf-8 - which means to convert UTF-8 to UTF-16 in the case of UNIVARCHAR columns).  

    That is working as documented:

    I think that you are assuming that setting the "LANG" environment variable to "us_english.utf8" and issuing the "set char_convert off" is the Open Client equivalent of "-Y -Jutf-8" in BCP.  

    I think that assumption may be incorrect.  Here's why:

    From P2 of

    The new Unicode data types UNICHAR and UNIVARCHAR operate in parallel and are completely independent of the traditional character set model. There is no negotiation of character set conversion between servers and clients for Unicode data types.  Servers and clients exchange Unicode data using different tokens from those used for character data,  and thus an independent data path exists, bypassing all the character conversion mechanisms. Clients send and receive Unicode data independently of whatever other character data they send and receive.


    1) UNIVARCHAR columns are stored in UTF-16 format.

    2) Sybase doesn't do automatic casting from the client code page to UTF-16 for UNIVARCHAR columns like it does for other character data types - it expects you to provide data in the correct format or explicitly cast it.

    So if your input file is in UTF-8, I think the Sybase client is just slamming it into UTF-16 columns unconverted.

    I suggest one of 2 alternatives:

    1) Encode or convert your input file to UTF-16.
    2) Use convert() or cast() in your INSERT to explicitly convert each column to univarchar (UTF-16)

    Author Comment

    Thanks, Gary, your insights steered us in the right direction. After setting the server default character set to UTF8 (it was ISO_1) and specifying "set char_convert utf8" in the client-side application, it all worked exactly as per the BCP. There was no need to cast or convert the text into univarchar, since this is now the server default.

    Thanks again, much appreciated!
    LVL 34

    Expert Comment

    by:Gary Patterson
    Glad you got it worked out.  

    Based on the documentation, I wouldn't have necessarily expected that to work.  

    Based on your results, apparently it can do UTF-8 to UTF-16 conversion automatically - it just can't do automatic conversion to UTF-16 when the client is using a non-Unicode encoding.

    Featured Post

    What Should I Do With This Threat Intelligence?

    Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

    Join & Write a Comment

    Suggested Solutions

    Title # Comments Views Activity
    Sysbase Central connection fails 3 27
    method argument as final 1 54
    bigHeights  challenge 13 43
    wordlen challenge 3 29
    How to remove superseded packages in windows w60 or w61 installation media (.wim) or online system to prevent unnecessary space. w60 means Windows Vista or Windows Server 2008. w61 means Windows 7 or Windows Server 2008 R2. There are various …
    When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
    Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …
    The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

    729 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    20 Experts available now in Live!

    Get 1:1 Help Now