Link to home
Start Free TrialLog in
Avatar of vishyn
vishynFlag for United States of America

asked on

Character set encoding issues, Inserting special characters from Word into the Sybase db via hibernate

I am uploading a XML file which has some special quotes  in one of the element like (quotes from MS Word, some other characters cut and paste from Word ---- like this text ABCs purchase of 11, As a result of this payment, the facilitys ownership. )
Please look at the apostrophe after ABC and facility.

create table .ABC"(
        "Id" numeric(9) identity not null,
       "OrgId" numeric(9) null,
       "Notes" text null) Of course it has FK constraints etc.
Error in text is below also attached is the screenshot.
select @@identity]; SQL state [ZZZZZ]; error code [2402]; Error converting characters into server's character set. Some character(s) could not be converted. ; nested exception is com.sybase.jdbc3.jdbc.SybSQLException: Error converting characters into server's character set. Some character(s) could not be converted. at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate

I even tried forcing the hibernate to use the character set conversion via the hibernate config file (by specifying UTF-8 and ISO8859_1) but it did not work. I am using jconn3.jar from sybase
 jConnect (TM) for JDBC(TM)/6.05(Build 25828)/P/EBF13044/JDK14

<property name="dialect">org.hibernate.dialect.SybaseDialect</property>
      <property name="hibernate.show_sql">true</property>
      <property name="hibernate.format_sql">true</property>
      <property name="current_session_context_class">jta</property>
      <property name="cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
      <property name="hibernate.query.substitutions">true 1, false 0, yes Y, no N</property>
      <property name="hibernate.bytecode.provider">cglib</property>
      <property name="hibernate.generate_statistics">false</property>
      <property name="hibernate.jdbc.batch_size">10</property>
        <!-- Fix for Special Character errors in Sybase-->
        <property name="hibernate.connection.useUnicode">true</property>
      <property name="hibernate.connection.characterEncoding">UTF-8</property>
      <property name="hibernate.connection.charSet">UTF-8</property>

Sybase online manual says that in Chapter 10,  When Sybbase Aaptive Server encounters a conversion error while sending data to the client, it replaces the bytes of the suspect characters with ASCII question marks (?).
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.sag1/html/sag1/sag1476.htm

 I want to avoid make a change on sybase server  side to make it accept unicode characters, as it might effect other applications. Is there a way we can control this through hibernate or only through sybase. Can we control this behaviour via jconn3 sybase JDBC jar.

I am at loss of ideas here to make this insert successfully in to the database.  Please let me know of the solution
hibernate-error.JPG
MS-Word-apostrophe-in-the-text.txt
Avatar of Joe Woodhouse
Joe Woodhouse

Well, what character sets are you using, and what do you want Sybase ASE to be able to support? It's pretty easy adding support for more character sets.

The main thing is to pick what ASE's default character set is, because you have conversion issues anything you use something other than that.

What platform are you running ASE on? You use an external tool to add character sets. On a UNIX/Linux platform it's $SYBASE/$SYBASE_ASE/bin/sqlloc. (Doing that from memory, may not be exactly correct, but the tool is definitely "sqlloc".) That's a GUI tool and needs an X environment, if you don't have one you'll have to use a resource file and "sqllocres" (reads inputs from the resource file; you have a sample one in $SYBASE/$SYBASE_ASE/init/sample_resource_files).
Avatar of vishyn

ASKER

Hi Joe, Please see my server configuration below. Based on the information below. I want to be able to support frequently used quotes and double quotes etc that come from MS Word documents. I am saving the information that comes from these MS word documents into XML which is then saved to the backend. Please let me know of your thoughts based on the server information below and what I am trying to do.

Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type      
--------------                 -------     ----------- ------------ ---------   ----                 ----      
default character set id                 4           0           3            3 id                   static    
default language id                      0           0           0            0 id                   dynamic    
default sortorder id                    50           0          50           50 id                   static    
disable character set conversi           0           0           0            0 switch               static    
enable unicode conversions               0           0           0            0 switch               dynamic    

Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type      
--------------                 -------     ----------- ------------ ---------   ----                 ----      
default unicode sortorder           binary           0      binary       binary name                 static    
enable surrogate processing              1           0           1            1 switch               dynamic    
enable unicode conversions               0           0           0            0 switch               dynamic    
enable unicode normalization             1           0           1            1 switch               dynamic    
Avatar of vishyn

ASKER

sqlloc and sqllocrecs might not be helpful in my case as this is a J2EE application which is being inserted via web logic server through hibernate as persistence layer.

Using one version of  jconn3.jar Connect (TM) for JDBC(TM)/6.0(Build 25570) /P/EBF12436/JDK14/ is working fine and inserting ? as per the sybase manual. When Adaptive Server encounters a conversion error while sending data to the client, it replaces the bytes of the suspect characters with ASCII question marks (?).

In test and production, the jconn3.jar is upgraded to  jConnect (TM) for JDBC(TM)/6.05(Build 25828)/P/EBF13044/JDK14/. As you see there is a difference in the jar files and this jar is causing the problems.

Hope this helps in understanding the problem
sqlloc and sqllocres are server-side tools used to setup character sets within ASE - they're not relevant to your client or middle tier, so don't worry about J2EE and Web Logic.

I don't recall the character set codes off the top of my head, can you also run "sp_helpsort" in your ASE and post the results here?

Basically I'm just wanting to check which character set you have as your server default.

I am surprised that the .jar file is making a difference to you, but that suggests there's something wrong at the jConnect level rather than the ASE. Do you have a more recent jConnect EBF you can test, maybe that reverts to the older behaviour again? Failing that, maybe it's time for a Sybase Tech Support case against the jConnect...
ASKER CERTIFIED SOLUTION
Avatar of vishyn
vishyn
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Aha, so it was a jConnect change.

And yes, your solution sounds much cleaner than messing around with switching everything to unicode, UTF8, forcing unicode normalisations, etc!
To answer this :-

Use Sybase Server wide Setting or Session Level Setting :-
                        “disable character set conversi” == True”
                        Or set char_convert off
            By using the parameter we have able to convert the special ASCII characters back to the correct values.

            In the database , data is still stored as a garbled value, but by turning this conversion off, we are able view the correct value of this garbled data in the application