Character set encoding issues, Inserting special characters from Word into the Sybase db via hibernate

Posted on 2010-01-08
Last Modified: 2014-09-10
I am uploading a XML file which has some special quotes  in one of the element like (quotes from MS Word, some other characters cut and paste from Word ---- like this text ABCs purchase of 11, As a result of this payment, the facilitys ownership. )
Please look at the apostrophe after ABC and facility.

create table .ABC"(
        "Id" numeric(9) identity not null,
       "OrgId" numeric(9) null,
       "Notes" text null) Of course it has FK constraints etc.
Error in text is below also attached is the screenshot.
select @@identity]; SQL state [ZZZZZ]; error code [2402]; Error converting characters into server's character set. Some character(s) could not be converted. ; nested exception is com.sybase.jdbc3.jdbc.SybSQLException: Error converting characters into server's character set. Some character(s) could not be converted. at

I even tried forcing the hibernate to use the character set conversion via the hibernate config file (by specifying UTF-8 and ISO8859_1) but it did not work. I am using jconn3.jar from sybase
 jConnect (TM) for JDBC(TM)/6.05(Build 25828)/P/EBF13044/JDK14

<property name="dialect">org.hibernate.dialect.SybaseDialect</property>
      <property name="hibernate.show_sql">true</property>
      <property name="hibernate.format_sql">true</property>
      <property name="current_session_context_class">jta</property>
      <property name="cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
      <property name="hibernate.query.substitutions">true 1, false 0, yes Y, no N</property>
      <property name="hibernate.bytecode.provider">cglib</property>
      <property name="hibernate.generate_statistics">false</property>
      <property name="hibernate.jdbc.batch_size">10</property>
        <!-- Fix for Special Character errors in Sybase-->
        <property name="hibernate.connection.useUnicode">true</property>
      <property name="hibernate.connection.characterEncoding">UTF-8</property>
      <property name="hibernate.connection.charSet">UTF-8</property>

Sybase online manual says that in Chapter 10,  When Sybbase Aaptive Server encounters a conversion error while sending data to the client, it replaces the bytes of the suspect characters with ASCII question marks (?).

 I want to avoid make a change on sybase server  side to make it accept unicode characters, as it might effect other applications. Is there a way we can control this through hibernate or only through sybase. Can we control this behaviour via jconn3 sybase JDBC jar.

I am at loss of ideas here to make this insert successfully in to the database.  Please let me know of the solution
Question by:vishyn
    LVL 24

    Expert Comment

    by:Joe Woodhouse
    Well, what character sets are you using, and what do you want Sybase ASE to be able to support? It's pretty easy adding support for more character sets.

    The main thing is to pick what ASE's default character set is, because you have conversion issues anything you use something other than that.

    What platform are you running ASE on? You use an external tool to add character sets. On a UNIX/Linux platform it's $SYBASE/$SYBASE_ASE/bin/sqlloc. (Doing that from memory, may not be exactly correct, but the tool is definitely "sqlloc".) That's a GUI tool and needs an X environment, if you don't have one you'll have to use a resource file and "sqllocres" (reads inputs from the resource file; you have a sample one in $SYBASE/$SYBASE_ASE/init/sample_resource_files).

    Author Comment

    Hi Joe, Please see my server configuration below. Based on the information below. I want to be able to support frequently used quotes and double quotes etc that come from MS Word documents. I am saving the information that comes from these MS word documents into XML which is then saved to the backend. Please let me know of your thoughts based on the server information below and what I am trying to do.

    Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type      
    --------------                 -------     ----------- ------------ ---------   ----                 ----      
    default character set id                 4           0           3            3 id                   static    
    default language id                      0           0           0            0 id                   dynamic    
    default sortorder id                    50           0          50           50 id                   static    
    disable character set conversi           0           0           0            0 switch               static    
    enable unicode conversions               0           0           0            0 switch               dynamic    

    Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type      
    --------------                 -------     ----------- ------------ ---------   ----                 ----      
    default unicode sortorder           binary           0      binary       binary name                 static    
    enable surrogate processing              1           0           1            1 switch               dynamic    
    enable unicode conversions               0           0           0            0 switch               dynamic    
    enable unicode normalization             1           0           1            1 switch               dynamic    

    Author Comment

    sqlloc and sqllocrecs might not be helpful in my case as this is a J2EE application which is being inserted via web logic server through hibernate as persistence layer.

    Using one version of  jconn3.jar Connect (TM) for JDBC(TM)/6.0(Build 25570) /P/EBF12436/JDK14/ is working fine and inserting ? as per the sybase manual. When Adaptive Server encounters a conversion error while sending data to the client, it replaces the bytes of the suspect characters with ASCII question marks (?).

    In test and production, the jconn3.jar is upgraded to  jConnect (TM) for JDBC(TM)/6.05(Build 25828)/P/EBF13044/JDK14/. As you see there is a difference in the jar files and this jar is causing the problems.

    Hope this helps in understanding the problem
    LVL 24

    Expert Comment

    by:Joe Woodhouse
    sqlloc and sqllocres are server-side tools used to setup character sets within ASE - they're not relevant to your client or middle tier, so don't worry about J2EE and Web Logic.

    I don't recall the character set codes off the top of my head, can you also run "sp_helpsort" in your ASE and post the results here?

    Basically I'm just wanting to check which character set you have as your server default.

    I am surprised that the .jar file is making a difference to you, but that suggests there's something wrong at the jConnect level rather than the ASE. Do you have a more recent jConnect EBF you can test, maybe that reverts to the older behaviour again? Failing that, maybe it's time for a Sybase Tech Support case against the jConnect...

    Accepted Solution

    Actually after spedning a considerable amount of time tweaking sybase server properties, It is not possible according to sybase documentation itself. As there are so many other databases on that server, I finally wrote a piece of java code to replace smart quotes with regular quotes. Here is the info below r'ing the jconnect jar. Thanks for your time.

     * For jConnect versions starting with 6.05, if jConnect cannot successfully convert the user data to the negotiated charset, it will send unconverted unicode characters to the server, if the server supports the unicode characters (ASE 12.5 or higher), else it will throw an exception.
    LVL 24

    Expert Comment

    by:Joe Woodhouse
    Aha, so it was a jConnect change.

    And yes, your solution sounds much cleaner than messing around with switching everything to unicode, UTF8, forcing unicode normalisations, etc!

    Expert Comment

    by:vishal chadha
    To answer this :-

    Use Sybase Server wide Setting or Session Level Setting :-
                            “disable character set conversi” == True”
                            Or set char_convert off
                By using the parameter we have able to convert the special ASCII characters back to the correct values.

                In the database , data is still stored as a garbled value, but by turning this conversion off, we are able view the correct value of this garbled data in the application

    Featured Post

    What Should I Do With This Threat Intelligence?

    Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

    Join & Write a Comment

    Suggested Solutions

    Introduction This article is the second of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers the basic installation and configuration of the test automation tools used by…
    International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
    The viewer will learn how to implement Singleton Design Pattern in Java.
    This tutorial covers a practical example of lazy loading technique and early loading technique in a Singleton Design Pattern.

    729 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    18 Experts available now in Live!

    Get 1:1 Help Now