Solved

Character set conversions in Database

Posted on 1998-07-30
8
369 Views
Last Modified: 2008-02-01
I have created a database in Access, which I access through the JDBC-ODBC bridge from a Java application. I have the following problem: Access seems to store the Greek characters as bytes using the encoding "Cp1253". I have managed to correctly convert these bytes to String and read the contents of the database. But I don't know how to convert the input Strings and store them to the database. I know that String.getBytes("Cp1253") will give me a correctly converted byte array, but how can this array be used in a SQL statement for a field that expects strings?
0
Comment
Question by:incah
  • 4
  • 3
8 Comments
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229172
What happens if you simply insert the string into the database without converting it to byte array? What do you mean by describing how Access stores Greek characters? How does access know that those are Greek characters? In Java you can indeed convert between Unicode and byte array representation using encoding name. But can you specify encoding name dealing with Access? If not, the it will store whatever you send it. If the field accepts String you have to send a String, if it accepts binary data, you can send byte array, but it seems that it is up to you how to interpret the data.
0
 
LVL 4

Expert Comment

by:evijay
ID: 1229173
I am very sorry to disappoint you. The jdbc-odbc bridge is implemented in native code. In java, the characters are in UNICODE (2 BYTES PER CHAR). When a Native code (C is used), function is called, in the native code, the UNICODE Strings are converted to C  Strings by ignoring the upper byte of each of the two byte characters !!. So, if your greek encoding is 8 bit, then it is fine otherwise it will be neatly truncated (specially in case of japaneese).

You can try one thing. You can set your regional settings in ControlPanel -->RegionalSettings default locale to Greek and reboot and try running the program. This may work sometimes.

0
 

Author Comment

by:incah
ID: 1229174
The database contains records (with Greek characters in their fields) that were not inserted with my Java application, but with Access. These are not read correctly unless I read them as a byte array (of the encoding "Cp1253") and then convert it to String.

Now let's assume that I want to change the content of a field,no matter what this field initially contained.
I created a query like this:
query="UPDATE di00010 SET di0nam='"+t2.getText()+"' WHERE di0cod=1";
t2 is a JTextField and t2.getText() returns a Unicode String with the input of the user (this is in Greek). Please note that if I print this query to the standard output I don't see Greek characters correctly but this maybe expected behaviour, as the DOS boxes use the encoding "Cp737"
I expected (as you do too) that if I stored in the database a Unicode String and then got it back (with a SELECT statement), I would take the same string that I stored. But this is not happening. I get garbage instead. So I assumed that an implicit conversion is done in the background when the SQL UPDATE statement is executed. And the only logical conversion here would be to the Windows default encoding of the system used, ie to "Cp1253".

I can't tell you what would have happened if I inserted the records directly from the Java app. The execution of an "INSERT INTO" statement always fails in my system and I don't yet know why.
0
 

Author Comment

by:incah
ID: 1229175
To evijay:

As you can see in my previous comment I imagined that something like this was happening. That's why I need to do the conversion myself and store a byte array of the correct encoding in the database.But how can I use a byte array in a SELECT statement?
Or should I change the field type of the database to some other type than text?
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 4

Expert Comment

by:evijay
ID: 1229176
Please note that greek characters in Unicode take a range from 0 thru 0x300.

The mapping which should be used for converting a Unicode greek character to a 8 bit CP1253 character is

        "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" +       // 0x80 - 0x87
        "\uFFFD\u2030\uFFFD\u2039\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x88 - 0x8F
        "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +       // 0x90 - 0x97
        "\uFFFD\u2122\uFFFD\u203A\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x98 - 0x9F
        "\u00A0\u0385\u0386\u00A3\u00A4\u00A5\u00A6\u00A7" +       // 0xA0 - 0xA7
        "\u00A8\u00A9\uFFFD\u00AB\u00AC\u00AD\u00AE\u2015" +       // 0xA8 - 0xAF
        "\u00B0\u00B1\u00B2\u00B3\u0384\u00B5\u00B6\u00B7" +       // 0xB0 - 0xB7
        "\u0388\u0389\u038A\u00BB\u038C\u00BD\u038E\u038F" +       // 0xB8 - 0xBF
        "\u0390\u0391\u0392\u0393\u0394\u0395\u0396\u0397" +       // 0xC0 - 0xC7
        "\u0398\u0399\u039A\u039B\u039C\u039D\u039E\u039F" +       // 0xC8 - 0xCF
        "\u03A0\u03A1\uFFFD\u03A3\u03A4\u03A5\u03A6\u03A7" +       // 0xD0 - 0xD7
        "\u03A8\u03A9\u03AA\u03AB\u03AC\u03AD\u03AE\u03AF" +       // 0xD8 - 0xDF
        "\u03B0\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7" +       // 0xE0 - 0xE7
        "\u03B8\u03B9\u03BA\u03BB\u03BC\u03BD\u03BE\u03BF" +       // 0xE8 - 0xEF
        "\u03C0\u03C1\u03C2\u03C3\u03C4\u03C5\u03C6\u03C7" +       // 0xF0 - 0xF7
        "\u03C8\u03C9\u03CA\u03CB\u03CC\u03CD\u03CE\uFFFD" +       // 0xF8 - 0xFF
        "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007" +       // 0x00 - 0x07
        "\b\t\n\u000B\f\r\u000E\u000F" +       // 0x08 - 0x0F
        "\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017" +       // 0x10 - 0x17
        "\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" +       // 0x18 - 0x1F
        "\u0020\u0021\"\u0023\u0024\u0025\u0026\'" +       // 0x20 - 0x27
        "\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F" +       // 0x28 - 0x2F
        "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" +       // 0x30 - 0x37
        "\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F" +       // 0x38 - 0x3F
        "\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047" +       // 0x40 - 0x47
        "\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F" +       // 0x48 - 0x4F
        "\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057" +       // 0x50 - 0x57
        "\u0058\u0059\u005A\u005B\\\u005D\u005E\u005F" +       // 0x58 - 0x5F
        "\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067" +       // 0x60 - 0x67
        "\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F" +       // 0x68 - 0x6F
        "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +       // 0x70 - 0x77
        "\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F";       // 0x78 - 0x7F

This should be used by the odbc driver while doing the conversion from java string to c string.  The problem is, in JNI we have two functions

makeCstring and makePlatformCString and as I notice, the JDBC-ODBC Bridge is using the former function (makeCString) which simply ignores the upper byte of unicode character.  so, unicode greek character \u0192 will be converted into 0x92 in C code, and upper byte is ignored. This is bad and this occurs for every function which passes strings from java to native code. It would have been nice if they have used makePlatformCString instead of makeCString.
As almost all the  JDBC Statement takes string argument, the default JDBC-ODBC Bridge will not help.

The solution is that you have to use a thirdparty JDBC-ODBC bridge say (www.weblogic.com JDBC/KONA or JDirect) etc.,. in which you can specify to the driver, the charaset in which the database is.!!

0
 

Author Comment

by:incah
ID: 1229177
To evijay:

Thanks to the map you gave me, I managed to do the conversion myself by substituting the Greek characters with a character between 0x80-0xFF (so that the first byte is always 0x00), write the data to the database, and then retrieve them and do the reverse
substitution. This worked !!!

Please post an an answer (anything) and I will grade it.

PS: Where do you find information like these you gave me?


0
 
LVL 4

Accepted Solution

by:
evijay earned 150 total points
ID: 1229178
Now, jdbc-odbc bridge - the latest one supports charset property

See this abstract from the webpage http://java.sun.com/products/jdk/1.2/docs/guide/jdbc/bridge.html

What's new with the JDBC-ODBC Bridge?

    The Bridge has been re-implemented using the Java Native Interface API, for improved performance and stability.

    A jdbc:odbc: connection can now have a charSet property, to specify a Character Encoding Scheme other than the client default. For possible values,
    see the JDK 1.1 Internationalizaton specification on the Javasoft Web Site.

    The Bridge now assumes that ODBC drivers are able to handle multi-threaded access. This will improve performance. If you need to use multi-threading and
    your ODBC package does not support it, your client program will have to implement locking.



0
 

Author Comment

by:incah
ID: 1229179
Thanks again!
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
basic hardware to learn oop advanced design patterns 3 88
JUnit 4 @Before and @BeforeClass differences 3 48
stackato and cloud 4 73
Java SE 8u111  Lot of stuff broke 11 54
For beginner Java programmers or at least those new to the Eclipse IDE, the following tutorial will show some (four) ways in which you can import your Java projects to your Eclipse workbench. Introduction While learning Java can be done with…
Introduction This article is the last of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers our test design approach and then goes through a simple test case example, how …
Video by: Michael
Viewers learn about how to reduce the potential repetitiveness of coding in main by developing methods to perform specific tasks for their program. Additionally, objects are introduced for the purpose of learning how to call methods in Java. Define …
Viewers learn about the scanner class in this video and are introduced to receiving user input for their programs. Additionally, objects, conditional statements, and loops are used to help reinforce the concepts. Introduce Scanner class: Importing…

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now