Solved

Character set conversions in Database

Posted on 1998-07-30
8
368 Views
Last Modified: 2008-02-01
I have created a database in Access, which I access through the JDBC-ODBC bridge from a Java application. I have the following problem: Access seems to store the Greek characters as bytes using the encoding "Cp1253". I have managed to correctly convert these bytes to String and read the contents of the database. But I don't know how to convert the input Strings and store them to the database. I know that String.getBytes("Cp1253") will give me a correctly converted byte array, but how can this array be used in a SQL statement for a field that expects strings?
0
Comment
Question by:incah
  • 4
  • 3
8 Comments
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229172
What happens if you simply insert the string into the database without converting it to byte array? What do you mean by describing how Access stores Greek characters? How does access know that those are Greek characters? In Java you can indeed convert between Unicode and byte array representation using encoding name. But can you specify encoding name dealing with Access? If not, the it will store whatever you send it. If the field accepts String you have to send a String, if it accepts binary data, you can send byte array, but it seems that it is up to you how to interpret the data.
0
 
LVL 4

Expert Comment

by:evijay
ID: 1229173
I am very sorry to disappoint you. The jdbc-odbc bridge is implemented in native code. In java, the characters are in UNICODE (2 BYTES PER CHAR). When a Native code (C is used), function is called, in the native code, the UNICODE Strings are converted to C  Strings by ignoring the upper byte of each of the two byte characters !!. So, if your greek encoding is 8 bit, then it is fine otherwise it will be neatly truncated (specially in case of japaneese).

You can try one thing. You can set your regional settings in ControlPanel -->RegionalSettings default locale to Greek and reboot and try running the program. This may work sometimes.

0
 

Author Comment

by:incah
ID: 1229174
The database contains records (with Greek characters in their fields) that were not inserted with my Java application, but with Access. These are not read correctly unless I read them as a byte array (of the encoding "Cp1253") and then convert it to String.

Now let's assume that I want to change the content of a field,no matter what this field initially contained.
I created a query like this:
query="UPDATE di00010 SET di0nam='"+t2.getText()+"' WHERE di0cod=1";
t2 is a JTextField and t2.getText() returns a Unicode String with the input of the user (this is in Greek). Please note that if I print this query to the standard output I don't see Greek characters correctly but this maybe expected behaviour, as the DOS boxes use the encoding "Cp737"
I expected (as you do too) that if I stored in the database a Unicode String and then got it back (with a SELECT statement), I would take the same string that I stored. But this is not happening. I get garbage instead. So I assumed that an implicit conversion is done in the background when the SQL UPDATE statement is executed. And the only logical conversion here would be to the Windows default encoding of the system used, ie to "Cp1253".

I can't tell you what would have happened if I inserted the records directly from the Java app. The execution of an "INSERT INTO" statement always fails in my system and I don't yet know why.
0
 

Author Comment

by:incah
ID: 1229175
To evijay:

As you can see in my previous comment I imagined that something like this was happening. That's why I need to do the conversion myself and store a byte array of the correct encoding in the database.But how can I use a byte array in a SELECT statement?
Or should I change the field type of the database to some other type than text?
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 4

Expert Comment

by:evijay
ID: 1229176
Please note that greek characters in Unicode take a range from 0 thru 0x300.

The mapping which should be used for converting a Unicode greek character to a 8 bit CP1253 character is

        "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" +       // 0x80 - 0x87
        "\uFFFD\u2030\uFFFD\u2039\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x88 - 0x8F
        "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +       // 0x90 - 0x97
        "\uFFFD\u2122\uFFFD\u203A\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x98 - 0x9F
        "\u00A0\u0385\u0386\u00A3\u00A4\u00A5\u00A6\u00A7" +       // 0xA0 - 0xA7
        "\u00A8\u00A9\uFFFD\u00AB\u00AC\u00AD\u00AE\u2015" +       // 0xA8 - 0xAF
        "\u00B0\u00B1\u00B2\u00B3\u0384\u00B5\u00B6\u00B7" +       // 0xB0 - 0xB7
        "\u0388\u0389\u038A\u00BB\u038C\u00BD\u038E\u038F" +       // 0xB8 - 0xBF
        "\u0390\u0391\u0392\u0393\u0394\u0395\u0396\u0397" +       // 0xC0 - 0xC7
        "\u0398\u0399\u039A\u039B\u039C\u039D\u039E\u039F" +       // 0xC8 - 0xCF
        "\u03A0\u03A1\uFFFD\u03A3\u03A4\u03A5\u03A6\u03A7" +       // 0xD0 - 0xD7
        "\u03A8\u03A9\u03AA\u03AB\u03AC\u03AD\u03AE\u03AF" +       // 0xD8 - 0xDF
        "\u03B0\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7" +       // 0xE0 - 0xE7
        "\u03B8\u03B9\u03BA\u03BB\u03BC\u03BD\u03BE\u03BF" +       // 0xE8 - 0xEF
        "\u03C0\u03C1\u03C2\u03C3\u03C4\u03C5\u03C6\u03C7" +       // 0xF0 - 0xF7
        "\u03C8\u03C9\u03CA\u03CB\u03CC\u03CD\u03CE\uFFFD" +       // 0xF8 - 0xFF
        "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007" +       // 0x00 - 0x07
        "\b\t\n\u000B\f\r\u000E\u000F" +       // 0x08 - 0x0F
        "\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017" +       // 0x10 - 0x17
        "\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" +       // 0x18 - 0x1F
        "\u0020\u0021\"\u0023\u0024\u0025\u0026\'" +       // 0x20 - 0x27
        "\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F" +       // 0x28 - 0x2F
        "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" +       // 0x30 - 0x37
        "\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F" +       // 0x38 - 0x3F
        "\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047" +       // 0x40 - 0x47
        "\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F" +       // 0x48 - 0x4F
        "\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057" +       // 0x50 - 0x57
        "\u0058\u0059\u005A\u005B\\\u005D\u005E\u005F" +       // 0x58 - 0x5F
        "\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067" +       // 0x60 - 0x67
        "\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F" +       // 0x68 - 0x6F
        "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +       // 0x70 - 0x77
        "\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F";       // 0x78 - 0x7F

This should be used by the odbc driver while doing the conversion from java string to c string.  The problem is, in JNI we have two functions

makeCstring and makePlatformCString and as I notice, the JDBC-ODBC Bridge is using the former function (makeCString) which simply ignores the upper byte of unicode character.  so, unicode greek character \u0192 will be converted into 0x92 in C code, and upper byte is ignored. This is bad and this occurs for every function which passes strings from java to native code. It would have been nice if they have used makePlatformCString instead of makeCString.
As almost all the  JDBC Statement takes string argument, the default JDBC-ODBC Bridge will not help.

The solution is that you have to use a thirdparty JDBC-ODBC bridge say (www.weblogic.com JDBC/KONA or JDirect) etc.,. in which you can specify to the driver, the charaset in which the database is.!!

0
 

Author Comment

by:incah
ID: 1229177
To evijay:

Thanks to the map you gave me, I managed to do the conversion myself by substituting the Greek characters with a character between 0x80-0xFF (so that the first byte is always 0x00), write the data to the database, and then retrieve them and do the reverse
substitution. This worked !!!

Please post an an answer (anything) and I will grade it.

PS: Where do you find information like these you gave me?


0
 
LVL 4

Accepted Solution

by:
evijay earned 150 total points
ID: 1229178
Now, jdbc-odbc bridge - the latest one supports charset property

See this abstract from the webpage http://java.sun.com/products/jdk/1.2/docs/guide/jdbc/bridge.html

What's new with the JDBC-ODBC Bridge?

    The Bridge has been re-implemented using the Java Native Interface API, for improved performance and stability.

    A jdbc:odbc: connection can now have a charSet property, to specify a Character Encoding Scheme other than the client default. For possible values,
    see the JDK 1.1 Internationalizaton specification on the Javasoft Web Site.

    The Bridge now assumes that ODBC drivers are able to handle multi-threaded access. This will improve performance. If you need to use multi-threading and
    your ODBC package does not support it, your client program will have to implement locking.



0
 

Author Comment

by:incah
ID: 1229179
Thanks again!
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
copyEndy  challenge 15 58
countX 22 71
changeXy challenge 13 58
Modeling a class in java 5 34
After being asked a question last year, I went into one of my moods where I did some research and code just for the fun and learning of it all.  Subsequently, from this journey, I put together this article on "Range Searching Using Visual Basic.NET …
Java had always been an easily readable and understandable language.  Some relatively recent changes in the language seem to be changing this pretty fast, and anyone that had not seen any Java code for the last 5 years will possibly have issues unde…
Viewers will learn about basic arrays, how to declare them, and how to use them. Introduction and definition: Declare an array and cover the syntax of declaring them: Initialize every index in the created array: Example/Features of a basic arr…
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now