Solved

JdbcOdbc Unicode-Database encoding

Posted on 1998-11-25
Last Modified: 2008-02-20
I'm using JDK 1.1.6 on Win95 (Hebrew).
My Access database contains Hebrew characters (code-page values 225-250). The JDBC-ODBC Bridge transforms these characters to '?'.
I assume the reason is that the driver does not know that I'm using a code page that supports these values.
How can I tell the driver this information?

If I use the JBuilder VM, the same driver retrieves the data correctly.
0
Comment
Question by:offerm
38 Comments
 
LVL 1

Expert Comment

by:malexiev
ID: 1228230
I think it's not a VM - problem. Why don't you try to use file "font.properties" used in JBuilder (it's placed in JBuilder/java/lib directory). Just copy it in your jdk/lib directory.
I think this works.
0
 

Author Comment

by:offerm
ID: 1228231
1) I can't find font.properties in the JBuilder environment.
2) It only works with JBuilder's java command, not with JBuilder's jre command.
3) I don't think it is related to font.properties.
4) According to the JDBC documentation, the JDBC driver should translate from the database charset to Unicode.
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228232
how do you extract the string (char) data from the database?

a piece of code would be helpful

0
 
LVL 1

Expert Comment

by:malexiev
ID: 1228233
I had the same or similar problem.

In my database there were some Cyrillic strings which were displayed that way.
Then I adjusted my font.properties to support the Russian charset and Cyrillic Unicode characters.
After that everything was fine.

I mean that you may have to make some changes in font.properties. There's no argument that the JDBC driver translates to Unicode; the question is how to display it on screen, and that depends on that property file.
0
 
LVL 27

Expert Comment

by:BigRat
ID: 1228234
Malexiev is perfectly correct. When the byte data from the JDBC-ODBC bridge arrives, you know that it is (perhaps) ANSI, but you don't know which variation (not exactly the same as which code page, but that's not the point). Consequently a translation routine is called, and this routine is found from the file font.properties. However, there is more than one of these files: font.properties.ja is for Japan, font.properties.ko is for Korea, and there are also two for Chinese. It may be that a) your file is font.properties.something, or b) unless you have installed the Database version of JBuilder, the font.properties file does not contain an entry for JDBC/ODBC. In this case it would be nice if Malexiev would post his file as an answer, because he deserves the points.
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228235
i've personally experienced problems with retrieving the data correctly, not with visualizing it.
so it's better if offerm posts the code that he uses for retrieving the data.

  heyhey
0
 

Author Comment

by:offerm
ID: 1228236
Here is the Java code:

import java.sql.*;
import java.util.*;
import java.io.*;



public class Test1 {

  public Test1() {
  }

  public static void main(String[] args) {
   System.out.println("Start");

   Connection con;

  try{
              Driver d = (Driver)Class.forName( "sun.jdbc.odbc.JdbcOdbcDriver").newInstance();

     /*Register the driver.*/
          DriverManager.registerDriver(d);

             con = DriverManager.getConnection("jdbc:odbc:nadlan");
     Statement stmnt = con.createStatement();
     stmnt.executeQuery("select * from tmpAgency");
     ResultSet rs = stmnt.getResultSet();
     String  str = null;

    byte[]  bytes = null;
    rs.next();
    str = rs.getString(2);
    bytes = str.getBytes();

    for(int y = 0; y < str.length();y++)
    {
      System.out.print(str.charAt(y)+" ");
    }
    System.out.println();
    int tmp = 0;
    for( int y = 0; y < bytes.length;y++)
    {
        tmp =  bytes[y] >= 0?bytes[y]:bytes[y]+256;
        System.out.print(tmp+" ");
    }
    System.out.println();

   }
   catch(Exception e)
   {
    System.out.println(e);
   }

   System.out.println("End");

  }
}

This is the output from Jbuilder Java command:
AppAccelerator(tm) 1.1.034 for Java (JDK 1.1), x86 version.
Copyright (c) 1998 Borland International. All Rights Reserved.
Start
a p f   s &#63635; T · s
224 227 237 32 229 225 233 250 229

This is the output from JDK1.1.6 Java command
Start
? ? ?   ? ? ? ? ?
63 63 63 32 63 63 63 63 63
End

Very important:
The Jbuilder's java command return the correct values while the Jbuilder's jre does not !!!

This is not a font.properties problem !!!!
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228237

        The latest JDBC-ODBC bridge supports a charSet property.

        See this excerpt from the web page http://java.sun.com/products/jdk/1.2/docs/guide/jdbc/bridge.html

        What's new with the JDBC-ODBC Bridge?

            The Bridge has been re-implemented using the Java Native Interface API, for improved performance and stability.

            A jdbc:odbc: connection can now have a charSet property, to specify a Character Encoding Scheme other than the client default. For possible values, see the JDK 1.1 Internationalization specification on the Javasoft Web Site.

            The Bridge now assumes that ODBC drivers are able to handle multi-threaded access. This will improve performance. If you need to use multi-threading and your ODBC package does not support it, your client program will have to implement locking.





So, while giving the data source name, you have to give it like this
The Bridge driver uses the odbc subprotocol. URLs for this subprotocol are of the form:

    jdbc:odbc:<data-source-name>[;<attribute-name>=<attribute-value>]*


For example:
    jdbc:odbc:mydb;UID=me;PWD=secret;Charset=8859_8
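
As a rough sketch of the above: besides putting the attribute in the URL, the charSet property can be handed to the bridge through a java.util.Properties object. This assumes a JDK that still ships the bridge (it was removed in Java 8) and a configured ODBC data source. The class and method names, and the encoding name Cp1255 (the Hebrew Windows code page), are assumptions for illustration and are not taken from the thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class BridgeCharSet {
    // Builds connection properties carrying the bridge's charSet setting.
    static Properties charSetProps(String encoding) {
        Properties props = new Properties();
        props.setProperty("charSet", encoding);
        return props;
    }

    // Opens the connection. This only succeeds where the JDBC-ODBC bridge
    // is present and the named ODBC data source exists.
    static Connection open(String dataSource, String encoding) throws SQLException {
        return DriverManager.getConnection("jdbc:odbc:" + dataSource, charSetProps(encoding));
    }

    public static void main(String[] args) {
        // Only the properties are built here; no database is contacted.
        Properties props = charSetProps("Cp1255"); // assumed encoding name
        System.out.println(props.getProperty("charSet"));
    }
}
```

With the data source from the question, the call would look like `BridgeCharSet.open("nadlan", "Cp1255")`.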


0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228238
maybe the problem is in this line
str = rs.getString(2);

instead of
String ResultSet.getString(2)
you can try
InputStream ResultSet.getUnicodeStream(2)

>> A column value can be retrieved as a stream of Unicode characters and
>> then read in chunks from the stream. This method is particularly
>> suitable for retrieving large LONGVARCHAR values.
>> The JDBC driver will do any necessary conversion from the database
>> format into Unicode.

and then you can create String from the InputStream
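
A minimal sketch of that last step, assuming the stream really carries two-byte Unicode characters with the high byte first, as the JDBC documentation describes. `readUnicodeStream` is a hypothetical helper, and the ByteArrayInputStream merely stands in for `rs.getUnicodeStream(2)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class UnicodeStreamDemo {
    // Drains the stream and decodes it as big-endian 16-bit Unicode.
    static String readUnicodeStream(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_16BE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in data: U+05D0 (alef) followed by U+05D1 (bet).
        byte[] raw = {0x05, (byte) 0xD0, 0x05, (byte) 0xD1};
        String s = readUnicodeStream(new ByteArrayInputStream(raw));
        System.out.println(s.length());
    }
}
```

Reading in a loop rather than with a single `read` call avoids truncating values longer than the buffer.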

0
 

Author Comment

by:offerm
ID: 1228239
evijay gave me a perfect answer. The program works fine with JDK 1.2. I don't even need to specify the code page to be used.

Is there any way to solve this problem in JDK 1.1.6?
0
 

LVL 16

Expert Comment

by:heyhey_
ID: 1228241
offerm:
as far as i remember i solved my problem (reading cyrillic symbols from the database) using

   InputStream ResultSet.getUnicodeStream()
or
   InputStream ResultSet.getAsciiStream()
instead of
   ResultSet.getString()
(i don't remember which one works)
0
 
LVL 27

Expert Comment

by:BigRat
ID: 1228242
I do not like repeating myself, but the conversion from ANSI or ASCII to Unicode, which takes place under the JDBC-ODBC hood, relies on a set of little-known classes belonging to sun.io.* that translate the characters to Unicode. The location of these classes and/or the properties file which points to them is of utmost importance. Hence the reply from malexiev is correct. I just wish he'd answer!
0
 
LVL 1

Expert Comment

by:malexiev
ID: 1228243
I think I have to say something, too.

First of all, thanks a lot to BigRat for his support.
I had a look at JBuilder's font.properties and there's no problem showing these Unicode chars (I mean Unicode 224, 227, ...).

Offerm, why don't you just copy the JDK's font.properties over JBuilder's font.properties and test it again with JBuilder? If the program still works, you WIN, and I'll agree that it's not a font.properties problem.

Offerm is lucky that he wants to retrieve Hebrew chars (JBuilder supports them by default, but the JDK does not). If he wanted to do the same with Japanese chars, the situation might change dramatically.

Good Luck.
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228244
malexiev:

>> why don't you just copy jdk's font.properties over JBuilder's font.properties

or maybe just copy  JBuilder's font.properties over jdk's font.properties ???

The ways of God are Many ...
:-))

Best regards
  heyhey

(P.S. keep on good coding :-))
0
 

Author Comment

by:offerm
ID: 1228245
This is not a font.properties problem!!!!

Check evijay answer from Nov 29. He has a correct response for JDK 1.2 (There was a change in the Jdbc-Odbc driver)

I need a solution for JDK1.1.6

Thanks
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228246
offerm:

i lost one hour testing this little example, so this is my last message.
i have a database with some cyrillic text inside (Access97) and i use the jdbc:odbc bridge + odbc driver.
all three columns contain the word <"кирилица">

when i execute this code

    byte[] bytes;
    int t = 0;
    InputStream i;
    String st;
    int tmp;
    rs.next();

    // column t1 read with getString()
    st = rs.getString("t1");
    bytes = st.getBytes();
    t = st.length();
    System.out.println("\nloaded " + t + " symbols");
    for (int y = 0; y < t; y++) System.out.print(bytes[y] + " ");

    // column t2 read with getUnicodeStream()
    i = rs.getUnicodeStream("t2");
    bytes = new byte[100];
    t = i.read(bytes);
    System.out.println("\nloaded " + t + " symbols");
    for (int y = 0; y < t; y++) {
        tmp = bytes[y] >= 0 ? bytes[y] : bytes[y] + 256;
        System.out.print(tmp + " ");
    }

    // column t3 read with getAsciiStream()
    i = rs.getAsciiStream("t3");
    bytes = new byte[100];
    t = i.read(bytes);
    System.out.println("\nloaded " + t + " symbols");
    for (int y = 0; y < t; y++) {
        tmp = bytes[y] >= 0 ? bytes[y] : bytes[y] + 256;
        System.out.print(tmp + " ");
    }


i receive this output

--output---

Start
loaded 8 symbols
63 63 63 63 63 63 63 63
loaded 16 symbols
0 234 0 232 0 240 0 232 0 235 0 232 0 246 0 224
loaded 8 symbols
234 232 240 232 235 232 246 224 End

you can see that getString DOES NOT work as expected
but getUnicodeStream DOES

this works for me (jdk1.1.7A / jdk1.1.5 / cyrillic)

if this doesn't work for you - check your ODBC driver (maybe ODBC does not support UNICODE ???)
if it works for you - i can post it as an answer
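
The values in the last output line are the raw code-page bytes. As a sketch, they can be turned into a proper String by naming the code page explicitly. windows-1251 is an assumption based on the Cyrillic data, the array is copied from the output above rather than read from a database, and `decode` is a hypothetical helper.

```java
import java.nio.charset.Charset;

final class DecodeRawBytes {
    // Packs unsigned byte values into a byte[] and decodes them with the
    // named code page.
    static String decode(int[] unsignedBytes, String charsetName) {
        byte[] raw = new byte[unsignedBytes.length];
        for (int k = 0; k < raw.length; k++) {
            raw[k] = (byte) unsignedBytes[k];
        }
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int[] fromAsciiStream = {234, 232, 240, 232, 235, 232, 246, 224};
        String word = decode(fromAsciiStream, "windows-1251");
        // Print code points in hex so the result does not depend on the console encoding.
        for (char c : word.toCharArray()) {
            System.out.print(Integer.toHexString(c) + " ");
        }
        System.out.println();
    }
}
```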

BigRat:
maybe you are right:
"... which takes place under the JDBC-ODBC hood relies on s set of little known classes belonging to sun.io.* and translate the characters to Unicode. ..."

i am quite intrigued and i'll check this when i have some free time ...

best regards
  heyhey

0
 
LVL 4

Accepted Solution

by:
evijay earned 300 total points
ID: 1228247
Hi All,

Here is the actual thing going on (I answered this question long back http://www.experts-exchange.com/topics/comp/lang/java/Q.10068132 )

Sun's JDBC-ODBC bridge, which offerm is using, has a bug. When any string gets passed from Java to C++ or from C++ to Java, it must be converted from Unicode to the platform character set and vice versa.
The problem is that in the JDBC-ODBC driver the conversion is weird: each byte received from ODBC is turned directly into a Unicode character, and no charset conversion happens.

i.e., the String constructor used is

 new String(byte[], 0, 0, len); and note that this is a deprecated constructor.


The  string constructor to be used should be

new String(byte[], 0, len)

So, what is happening is this. Say a Cyrillic char with code point 250 is sent from ODBC. This should be transformed into a Unicode code point, say 1200 (this is an example, not accurate). But the byte comes through as it is, without conversion. So you have a Unicode char with code point 250, which is not the character you stored and cannot be represented in your platform code page, and hence you see a ?.

The best and easiest way to solve this is

String str = rs.getString(2);
byte[] b = new byte[str.length()];
// convert string back into bytes
for (int i = 0; i < b.length; i++)
    b[i] = (byte) str.charAt(i);
String finalStr = new String(b); // now java does platform encoding
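
Note that `new String(b)` uses the platform default encoding, so it only helps when the JVM runs under the same code page as the data. A hedged variant of the same idea names the code page explicitly; `repair` is a hypothetical helper, and windows-1251 is just an example code page.

```java
import java.nio.charset.Charset;

final class BridgeStringRepair {
    // Treats each char of the damaged string as one raw byte, then decodes
    // those bytes with the code page the database actually uses.
    static String repair(String damaged, Charset dbCharset) {
        if (damaged == null) {
            return null;
        }
        byte[] raw = new byte[damaged.length()];
        for (int k = 0; k < raw.length; k++) {
            raw[k] = (byte) damaged.charAt(k);
        }
        return new String(raw, dbCharset);
    }

    public static void main(String[] args) {
        // What the bridge would return for the cp1251 bytes EA E8: U+00EA U+00E8.
        String damaged = "\u00EA\u00E8";
        String fixed = repair(damaged, Charset.forName("windows-1251"));
        System.out.println((int) fixed.charAt(0)); // 1082, i.e. U+043A
    }
}
```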


Regarding getUnicodeStream stuff suggested by heyhey, again the author of jdbc-odbc bridge (Mr Karl Moss - writer of famous Java Servlets book which has a 3 tier jdbc/odbc driver implementation using servlets) makes the same mistake of not doing any charset conversions and putting the stuff as it is.








0
 
LVL 4

Expert Comment

by:evijay
ID: 1228248
Believe me. I went thru the source code of jdbc/odbc bridge !!

0

 
LVL 4

Expert Comment

by:evijay
ID: 1228249
Here is the answer I gave to that earlier question.

So, in summary, if you pass a Unicode character whose value is > 255 to JDBC, the upper byte is simply truncated.
Say you pass 0xfe78 (suppose it is a Japanese character): what gets passed to ODBC is only 0x78. So, if you want to pass any string which has such characters, my best bet is this

// suppose mystr is the one which you want to pass
byte[] b = mystr.getBytes();
String finalstr = new String(b, 0);

Now, pass finalstr to jdbc instead of mystr.
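
A sketch of the same write-direction workaround with the code page named explicitly. Decoding as ISO-8859-1 stands in for the deprecated `new String(b, 0)` constructor, since both map each byte to the char with the same value. `prepare` is a hypothetical helper and windows-1251 is an example code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BridgeStringPrepare {
    // Encodes the text in the database code page, then widens each byte to a
    // char in the range 0-255 so that dropping the upper byte loses nothing.
    static String prepare(String text, Charset dbCharset) {
        if (text == null) {
            return null;
        }
        byte[] raw = text.getBytes(dbCharset);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+0438 (a Cyrillic letter) is byte 0xE8 in windows-1251.
        String out = prepare("\u0438", Charset.forName("windows-1251"));
        System.out.println(Integer.toHexString(out.charAt(0))); // e8
    }
}
```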

Thanx
vijay




0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228250
             
 evijay
your code:

         st = rs.getString("t1");
         bytes = new byte[st.length()];
         for (int y = 0; y < bytes.length; y++)   bytes[y] = (byte) st.charAt(y);

               t = bytes.length;
               System.out.println("\n sting has " + t + " symbols");
           for( int y = 0; y < t;y++) {  tmp =  bytes[y] >= 0?bytes[y]:bytes[y]+256;  System.out.print(tmp +" "); }

gives me exactly the same result as
my code

         i = rs.getAsciiStream("t1");
         bytes = new byte[100];
         t = i.read(bytes);
               System.out.println("\nloaded " + t + " symbols");
           for( int y = 0; y < t;y++)
               {  tmp =  bytes[y] >= 0?bytes[y]:bytes[y]+256;  System.out.print(tmp +" "); }


but ...
the ways of God are Many ...
(but of course, i haven't "gone thru the source code of jdbc/odbc bridge" ... yet :))

Q: does ODBC support real Unicode ?? as far as i remember, Win95 does not fully support Unicode

0
 
LVL 4

Expert Comment

by:evijay
ID: 1228251
Hi HeyHey,

I think you still didn't get the concept. Here it is.

Let us take an Arabic character with a code point of 0xE6 (this is the Arabic letter WAW) in Windows code page 1256, the Arabic encoding. (I think you understand what a code point is: for instance, the character 'A' has a code point of 0x41 in ASCII or ISO Latin-1 encoding.) The same Arabic character WAW has a Unicode code point of 0x0648.

Now, in the database, the Arabic data will be in code page 1256 encoding (just as English data will be in ASCII encoding (Windows code page 1252)).

When data is read from the database into Java, the Arabic character WAW needs to be translated from 0xE6 to 0x0648. Thus, every 0xE6 read from the database should be translated to 0x0648 when it comes into Java, since Java uses Unicode. The JDBC-ODBC bridge currently just takes 0xE6, doesn't do any conversion, and passes it on as 0xE6. In Unicode, code point 0xE6 is LATIN SMALL LIGATURE AE. This is not the Arabic character we want.

Similarly, when you pass a Unicode character to the JDBC-ODBC bridge for storing into the database, no conversion happens. When you pass a Unicode character with code point 0x0648 from your program, the JDBC-ODBC driver just ignores the most significant byte of the two-byte Unicode character and sends 0x48 to ODBC, which is the English character 'H'.
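
The two directions described above can be reproduced without a database. This is only an illustration of the arithmetic, not the bridge's actual code; the helper names are hypothetical, and it assumes the JVM provides the windows-1256 charset.

```java
import java.nio.charset.Charset;

final class WawDemo {
    // What the faulty read path produces: the byte value reused as a code point.
    static char zeroExtend(byte b) {
        return (char) (b & 0xFF);
    }

    // What a correct read path produces: a real code-page lookup.
    static char decode(byte b, String codePage) {
        return new String(new byte[] {b}, Charset.forName(codePage)).charAt(0);
    }

    // What the faulty write path sends: only the low byte of the char.
    static byte truncate(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        byte waw = (byte) 0xE6;
        System.out.println(Integer.toHexString(zeroExtend(waw)));             // e6
        System.out.println(Integer.toHexString(decode(waw, "windows-1256"))); // 648
        System.out.println((char) truncate('\u0648'));                        // H
    }
}
```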

Thus, there must be some way to overcome this bug, and that is the way I suggested. If the solution I presented is not clear, I can elaborate.

Vijay
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228252
Hi HeyHey,

Your comment is

evijay
              your code:

                 st = rs.getString("t1");
                 bytes = new byte[st.length()];
                 for (int y = 0; y < bytes.length; y++)   bytes[y] = (byte) st.charAt(y);

                 t = bytes.length;
                 System.out.println("\n sting has " + t + " symbols");
                         for( int y = 0; y < t;y++) {  tmp =  bytes[y] >= 0?bytes[y]:bytes[y]+256;  System.out.print(tmp +" "); }


You understood my solution partially. I meant that

                 st = rs.getString("t1");
                 bytes = new byte[st.length()];
                 for (int y = 0; y < bytes.length; y++)                          bytes[y] = (byte) st.charAt(y);
                 String newst = new String(bytes);

Now print the contents of newst; they will definitely be different from the bytes you printed, provided you are in the appropriate Cyrillic code page in the Windows environment (Control Panel -> Regional Settings).

The catch is in the statement
                 String newst = new String(bytes);

This converts a byte stream in the platform encoding to Unicode. A real conversion happens; it does not just put 0x00 in the most significant byte of each Unicode character, i.e., it is not equivalent to what you did.
 
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228253
Q: does ODBC support real Unicode ?? as far as i remember, Win95 does not fully support Unicode

No

To store Unicode characters in a database, you need to use an encoding of Unicode called UTF-8.

Read this article:

http://www.microsoft.com/workshop/management/intl/unicode.asp

SQL Server 7.0 supports Unicode in a different way: it introduced new SQL data types for storing Unicode strings.

Oracle, Sybase, Informix, SQL Server, DB2: all these databases support the UTF-8 encoding, which is a variable-width encoding of Unicode. Java uses UTF-8 encoding when you serialize a string into a byte stream.

UTF-8 is a 1-6 byte variable-width encoding of Unicode. The first byte indicates whether it is a one-byte, two-byte, three-byte, or longer character. ASCII characters are encoded as one-byte characters. Accented European characters are two-byte characters. Asian ideographs are encoded as three-byte characters. Unlike UCS-2, UTF-8 does not have any null bytes, so data encoded in UTF-8 can be passed to C programs which cannot handle null bytes. Most development tools and IDEs don't support Unicode; they rely on the native platform encoding.
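
The byte widths mentioned above are easy to check from Java. A small sketch follows; the sample characters are my own choice, not from the thread. (UTF-8 as currently standardized uses at most four bytes per character.)

```java
import java.nio.charset.StandardCharsets;

final class Utf8Widths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1: ASCII letter
        System.out.println(utf8Length("\u00E9")); // 2: accented European letter
        System.out.println(utf8Length("\u05D0")); // 2: Hebrew alef
        System.out.println(utf8Length("\u65E5")); // 3: CJK ideograph
    }
}
```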

for more info on internationalization stuff

visit
http://www.vijay.indianet.org/i18n


0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228254
:-)

maybe you don't see the common code

both of us receive the byte data, just in different ways ....
your code
>>          st = rs.getString("t1");
>>          bytes = new byte[st.length()];
>>          for (int y = 0; y < bytes.length; y++)   bytes[y] = (byte) st.charAt(y);
my code
>>          i = rs.getAsciiStream("t1");
>>          bytes = new byte[100];
>>          t = i.read(bytes);

of course now, when you have the real data (in bytes array) you can put it in a String with
             String finalStr = new String(b);

best regard
  heyhey

0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228255
about ODBC + Unicode

- so where are the real Unicode symbols?
- does ODBC driver works with real Unicode ?
- what format is used between JDBC and ODBC ? (UTF - 8 ??)
>> each byte received from odbc is converted into a Unicode Character
JDBC takes one byte at a time and adds 0 byte to make it Unicode char ??? that means that database (ODBC) does not sends you Unicode symbols, but just some platform character set ???

it seems that we don't have to mess with Jdbc - Odbc encoding at all (so the problem is not in "JdbcOdbc Unicode-Database encoding") but just get the char data (hoping that it is in our platform format :-) and then translate it into Unicode symbols

>>    In my access database there are Hebrew chars (ascii 225-250)
just use rs.getAsciiStream() and you receive the exact ascii chars .... (225-250)
(or get the field value as byte stream, if you don't trust getAsciiStream())
and everything's OK ?

the Unicode translation is after you have received the pure byte data you just use
"
      String newst = new String(bytes);
      Which converts a byte stream in platform encoding to Unicode. ...
"

BTW thanks, evijay ... it seems that you know what you are talking about :-)

best regards
 heyhey
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228256
Hi HeyHey,

I think you still didn't get the point. Please go through this lengthy explanation of character sets, encodings, etc.

1.2 Charsets :

   A character set is a collection of predefined characters based on the specific needs of one or more languages without regard to
the encoding values used to represent the characters. The choice of which code set to use depends on the user's data processing
requirements.

    A particular character set can be encoded using different encoding schemes. For example, the ASCII character set defines the
set of characters found in the English language. Another character set for English is EBCDIC. The Japanese Industrial Standard
(JIS) character set defines the set of characters used in the Japanese language. Another popular character set for Japanese is
EUC-JP. Both the English and Japanese character sets can be encoded using different code sets.

     Each locale in the operating system defines which code set it uses and how the characters within the code set are
manipulated. Because multiple locales can be installed on the system, multiple code sets can be used by different users on the
system.

     Most computer applications today use native character sets, i.e., character sets that are platform specific and
support a specific set of languages. For example, a common character set used on English-based UNIX systems is ISO 8859-1,
which supports Western European languages. On most Japanese UNIX systems, Extended Unix Code (EUC) is the
char set employed. On Japanese Windows NT, Shift-JIS is the charset employed.

    UNICODE is a universal char set which encompasses almost all languages. There are again several encoding
formats for Unicode. UCS-2 (Universal Coded Character Set) is a fixed-width (16-bit) encoding of Unicode. UTF-8 is a 1-6 byte
variable-width encoding of Unicode. The first byte indicates whether it is a one-byte, two-byte, three-byte, or longer
character. ASCII characters are encoded as one-byte characters. Accented European characters are two-byte characters. Asian
ideographs are encoded as three-byte characters. Unlike UCS-2, UTF-8 does not have any null bytes. Data encoded in UTF-8
can be passed to C programs which cannot handle null bytes. Most development tools and IDEs don't support Unicode.
They rely on the native platform encoding.

Now, coming to your questions.


>> - so where are the real Unicode symbols?

What Unicode symbols?

>> - does ODBC driver works with real Unicode ?

ODBC doesn't work with Unicode.

>> - what format is used between JDBC and ODBC ? (UTF - 8 ??)

Because of a bug in the driver, no format is used. I mean you have a two-byte Java character; the first byte is ignored and only the second byte is passed to the ODBC driver (this is wrong and bad).

>> each byte received from odbc is converted into a Unicode Character
>> JDBC takes one byte at a time and adds 0 byte to make it Unicode char ??? that means that database (ODBC) does not sends you Unicode symbols, but just some platform character set ???

Yes, absolutely true. It is the responsibility of the JDBC-ODBC bridge to do that conversion, and it is not doing it. The JDBC drivers that Oracle, Sybase, and Informix provide for their own databases all do the conversion for you. With them you need not go through the pain of converting the character data received from JDBC into bytes and then converting those bytes back into a Unicode string using the platform charset encoding.

>> it seems that we don't have to mess with Jdbc - Odbc encoding at all (so the problem is not in "JdbcOdbc Unicode-Database encoding") but just get the char data (hoping that it is in our platform format :-) and then translate it into Unicode symbols

What you said is partially correct. The strings we get from JDBC don't contain Unicode characters; they contain platform charset bytes as characters (because of the bug in the JDBC-ODBC driver). We cannot use them for processing or display, so we need to convert them ourselves.



Think it like this.

You use ASCII as your encoding for english.
Suppose Java used EBCDIC instead of Unicode as its string encoding. The database data is always stored in ASCII, but in Java you see strings which are in EBCDIC. So what you need is a conversion layer. Who should perform the conversion between ASCII<->EBCDIC when data gets passed between Java and the database?
It is the JDBC-ODBC driver. But because of a bug in it, it is not doing that. So what can we do to solve this? Do the conversion ourselves.

The same is true with Unicode.

 String newst = new String(bytes);
                      Which converts a byte stream in platform encoding to Unicode. ...

is not just as simple as you imagine. It uses table lookups of large index tables to convert combination of bytes to unicode characters.


Converting data from Unicode to platform encodings and vice versa require a table lookup step. It is not as simple as putting the byte in least significant byte and 0 in the most significant byte of a 2 byte java unicode character. I explained this to you using the arabic character example.

By the way,

I really like the answers you have given to many questions on Experts Exchange. You seem to be a real guru in Java. I hope you will soon overtake us in points.

I think internationalization is a field which is not clearly understood by the outside world. You need to understand character sets, how data is represented in the legacy encodings of various languages, and how to convert them to Unicode. The best place to start is Nadine Kano's book "Developing International Software for Windows 95 and Windows NT"; the book is completely available online at http://www.microsoft.com/msdn




Regards
Vijay
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228257
evijay:

just one more small question (i think you have ready answer :-)

I understand when we use rs.getString(), we receive the actual (native) chars plus "a 0x00 in the most significant byte of each unicode character." (so this is not real Unicode symbol)

the question is
"Does rs.getAsciiStream() return the same information (bytes / native chars) without adding '0' - or not ?"

best regard
  heyhey
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228258
The getAsciiStream implementation is even more stupid than I could imagine. See the comments in the original source and you will understand.


// Some sql types to stream types will need a data conversion.
// Determine the conversion type and calculate the size of the
// buffer needed.
//
// Example:
//
// Binary data (hex):          01 22 39
//
// Converted to ASCII (hex):   30 31 32 32 33 39
//
// Converted to UNICODE (hex): 00 30 00 31 00 32 00 32 00 33 00 39
// Ascii input stream from binary data requires each hex
// digit to be converted into 2 hex characters

So, this means,

if you have suppose string "123" stored in your database field,
with ascii stream, you get ascii stream like this

character    its ASCII Code point in hex value
1                 0x31
2                 0x32
3                 0x33

Now, convert each hex value into ASCII by the stupid rule of the author of the JDBC-ODBC bridge, like this.

0x31  has two hex digits in it i.e., 3 and 1. Convert these hex digits to their ascii equivalents -> 0x33 and 0x31

So, now, you will get the following final data
0x33 0x31 0x33 0x32 0x33 0x33

in getASCIIStream which is equal to
313233  !!


Vijay



0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228259
hmmmm

are you sure ?
have you tested this ?

when i have this word <"кирилица"> (cyrillic) in the database (Access97 :(

   st = rs.getString("t1");
       loaded 8 symbols
       63 63 63 63 63 63 63 63
 i = rs.getUnicodeStream("t2");
       loaded 16 symbols
       0 234 0 232 0 240 0 232 0 235 0 232 0 246 0 224 // stupid ASCII -> UNICODE conversion
i = rs.getAsciiStream("t3");
       loaded 8 symbols
       234 232 240 232 235 232 246 224 End

and as you can see getAsciiStream(); returns the exact bytes data from the database ...
maybe i should take a look at the sources ...
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228260
Hi HeyHey,

I went through the source of JDK 1.1.7 and am damn sure that this is happening. I don't know which version of the JDK you are using. Please send me your email address so that I can send you the piece of code which actually does this stupid conversion :-).

vijay

0
 
LVL 4

Expert Comment

by:evijay
ID: 1228261
here is the excerpt from the source which does that conversion

                        else if (convertType == CONVERT_ASCII) {
                                b[(i * 2) + 1] = (byte) digits.charAt (b[i] & 0x0F);
                                b[(i * 2)] = (byte) digits.charAt ((b[i] >> 4) & 0x0F);
                        }
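
To see what that branch computes, here is a standalone sketch of the same nibble-to-hex-digit expansion. It writes into a separate output array instead of expanding in place, and the DIGITS table is an assumption about what `digits` holds in the bridge source; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HexExpand {
    private static final String DIGITS = "0123456789ABCDEF"; // assumed contents

    // Expands each binary byte into two ASCII hex-digit bytes:
    // high nibble first, then low nibble.
    static byte[] toAsciiHex(byte[] binary) {
        byte[] out = new byte[binary.length * 2];
        for (int k = 0; k < binary.length; k++) {
            out[k * 2] = (byte) DIGITS.charAt((binary[k] >> 4) & 0x0F);
            out[k * 2 + 1] = (byte) DIGITS.charAt(binary[k] & 0x0F);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same sample as the source comment: binary 01 22 39.
        byte[] hex = toAsciiHex(new byte[] {0x01, 0x22, 0x39});
        System.out.println(new String(hex, StandardCharsets.US_ASCII)); // 012239
    }
}
```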
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228262
>> I don't know which version of JDK are you using.
JDK1.1.7A - maybe it is a bad version :-), but this example worked with 1.1.4 / 1.1.5 too
>> Please send me your email id so that I can post you a piece of that code which actually does this stupid conversion :-).

evijay, i don't have problems with converting ... i solved them a year ago :-) - and i use .getAsciiStream() and it works for me ...
give me a week - when i get some free time i'll debug the sources and see what's happening (and why) ...
(i don't believe that these men are so stupid - the piece of code that you posted looks just like the old packed decimal format -> string conversion)

best regards
  heyhey

P.S. my example works for me. why don't you just compile & run it ? it will be very interesting if you receive different results :-)
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228263
Hi heyHey,

You are right in your statement. When I went through the code for the complete class and understood it, I found that the author converts each byte to two ASCII hex characters only when the database column data type is binary. For character and other data types, he passes the bytes as they are to the caller.

Sorry for that.

Vijay
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228264
so you agree that getAsciiStream obtains the real byte data ?
i.e.: the code i suggested

  i = rs.getAsciiStream("t1");
  bytes = new byte[100];
  t = i.read(bytes);

works like the code you posted
  st = rs.getString("t1");
  bytes = new byte[st.length()];
  for (int y = 0; y < bytes.length; y++)   bytes[y] = (byte) st.charAt(y);

(and which one do you think is better ? ;-)

best regards
  heyhey

p.s. thanks for all the information.
0
 
LVL 4

Expert Comment

by:evijay
ID: 1228266
i.e.: the code i suggested

               i = rs.getAsciiStream("t1");
               bytes = new byte[100];
               t = i.read(bytes);

             works like the code you posted
               st = rs.getString("t1");
               bytes = new byte[st.length()];
               for (int y = 0; y < bytes.length; y++)   bytes[y] = (byte) st.charAt(y);

             (and which one do you think is better ? ;-)


Let us look at it performance-wise. Tell me how costly it is to create a heavy JdbcOdbcInputStream object (which getAsciiStream() returns) and how costly it is to use a String. JdbcOdbcInputStream has a lot of member variables and it does some initialization. If you are going to do this for each field and each row of the database, tell me how many temporary objects you are creating and how efficient that is.

Also, don't forget our ultimate goal: "Get the Unicode string from the field, not a byte array". After you get an array of bytes, you still need to convert it back to a string (Unicode, of course). So my idea was that the mapping logic can be nicely encapsulated in small utility methods of a global Utils class, which helps the programmer a lot, like

// routine to be used for every string returned by jdbc
// which may have non english characters
public String fixJDBCODBCString(String st)
{
     if (st == null)
           return st;    
     byte[] bytes = new byte[st.length()];
     for (int y = 0; y < bytes.length; y++)
             bytes[y] = (byte) st.charAt(y);
     String newst = new String(bytes);
     return newst;
}

// routine to be used for every string to be passed to  jdbcodbc
// which may have non english characters
String makeJDBCODBCString(String mystr)
{
    if (mystr == null) return mystr;
    byte[] b = mystr.getBytes();
    String finalstr = new String(b, 0);
    return finalstr;
}
0
 
LVL 16

Expert Comment

by:heyhey_
ID: 1228267
ok
:-)

0
