Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 386
  • Last Modified:

Character set conversions in Database

I have created a database in Access, which I access through the JDBC-ODBC bridge from a Java application. I have the following problem: Access seems to store the Greek characters as bytes using the encoding "Cp1253". I have managed to correctly convert these bytes to String and read the contents of the database. But I don't know how to convert the input Strings and store them to the database. I know that String.getBytes("Cp1253") will give me a correctly converted byte array, but how can this array be used in a SQL statement for a field that expects strings?
0
incah
Asked:
incah
  • 4
  • 3
1 Solution
 
msmolyakCommented:
What happens if you simply insert the string into the database without converting it to byte array? What do you mean by describing how Access stores Greek characters? How does access know that those are Greek characters? In Java you can indeed convert between Unicode and byte array representation using encoding name. But can you specify encoding name dealing with Access? If not, the it will store whatever you send it. If the field accepts String you have to send a String, if it accepts binary data, you can send byte array, but it seems that it is up to you how to interpret the data.
0
 
evijayCommented:
I am very sorry to disappoint you. The jdbc-odbc bridge is implemented in native code. In java, the characters are in UNICODE (2 BYTES PER CHAR). When a Native code (C is used), function is called, in the native code, the UNICODE Strings are converted to C  Strings by ignoring the upper byte of each of the two byte characters !!. So, if your greek encoding is 8 bit, then it is fine otherwise it will be neatly truncated (specially in case of japaneese).

You can try one thing. You can set your regional settings in ControlPanel -->RegionalSettings default locale to Greek and reboot and try running the program. This may work sometimes.

0
 
incahAuthor Commented:
The database contains records (with Greek characters in their fields) that were not inserted with my Java application, but with Access. These are not read correctly unless I read them as a byte array (of the encoding "Cp1253") and then convert it to String.

Now let's assume that I want to change the content of a field,no matter what this field initially contained.
I created a query like this:
query="UPDATE di00010 SET di0nam='"+t2.getText()+"' WHERE di0cod=1";
t2 is a JTextField and t2.getText() returns a Unicode String with the input of the user (this is in Greek). Please note that if I print this query to the standard output I don't see Greek characters correctly but this maybe expected behaviour, as the DOS boxes use the encoding "Cp737"
I expected (as you do too) that if I stored in the database a Unicode String and then got it back (with a SELECT statement), I would take the same string that I stored. But this is not happening. I get garbage instead. So I assumed that an implicit conversion is done in the background when the SQL UPDATE statement is executed. And the only logical conversion here would be to the Windows default encoding of the system used, ie to "Cp1253".

I can't tell you what would have happened if I inserted the records directly from the Java app. The execution of an "INSERT INTO" statement always fails in my system and I don't yet know why.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
incahAuthor Commented:
To evijay:

As you can see in my previous comment I imagined that something like this was happening. That's why I need to do the conversion myself and store a byte array of the correct encoding in the database.But how can I use a byte array in a SELECT statement?
Or should I change the field type of the database to some other type than text?
0
 
evijayCommented:
Please note that greek characters in Unicode take a range from 0 thru 0x300.

The mapping which should be used for converting a Unicode greek character to a 8 bit CP1253 character is

        "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" +       // 0x80 - 0x87
        "\uFFFD\u2030\uFFFD\u2039\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x88 - 0x8F
        "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +       // 0x90 - 0x97
        "\uFFFD\u2122\uFFFD\u203A\uFFFD\uFFFD\uFFFD\uFFFD" +       // 0x98 - 0x9F
        "\u00A0\u0385\u0386\u00A3\u00A4\u00A5\u00A6\u00A7" +       // 0xA0 - 0xA7
        "\u00A8\u00A9\uFFFD\u00AB\u00AC\u00AD\u00AE\u2015" +       // 0xA8 - 0xAF
        "\u00B0\u00B1\u00B2\u00B3\u0384\u00B5\u00B6\u00B7" +       // 0xB0 - 0xB7
        "\u0388\u0389\u038A\u00BB\u038C\u00BD\u038E\u038F" +       // 0xB8 - 0xBF
        "\u0390\u0391\u0392\u0393\u0394\u0395\u0396\u0397" +       // 0xC0 - 0xC7
        "\u0398\u0399\u039A\u039B\u039C\u039D\u039E\u039F" +       // 0xC8 - 0xCF
        "\u03A0\u03A1\uFFFD\u03A3\u03A4\u03A5\u03A6\u03A7" +       // 0xD0 - 0xD7
        "\u03A8\u03A9\u03AA\u03AB\u03AC\u03AD\u03AE\u03AF" +       // 0xD8 - 0xDF
        "\u03B0\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7" +       // 0xE0 - 0xE7
        "\u03B8\u03B9\u03BA\u03BB\u03BC\u03BD\u03BE\u03BF" +       // 0xE8 - 0xEF
        "\u03C0\u03C1\u03C2\u03C3\u03C4\u03C5\u03C6\u03C7" +       // 0xF0 - 0xF7
        "\u03C8\u03C9\u03CA\u03CB\u03CC\u03CD\u03CE\uFFFD" +       // 0xF8 - 0xFF
        "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007" +       // 0x00 - 0x07
        "\b\t\n\u000B\f\r\u000E\u000F" +       // 0x08 - 0x0F
        "\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017" +       // 0x10 - 0x17
        "\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" +       // 0x18 - 0x1F
        "\u0020\u0021\"\u0023\u0024\u0025\u0026\'" +       // 0x20 - 0x27
        "\u0028\u0029\u002A\u002B\u002C\u002D\u002E\u002F" +       // 0x28 - 0x2F
        "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" +       // 0x30 - 0x37
        "\u0038\u0039\u003A\u003B\u003C\u003D\u003E\u003F" +       // 0x38 - 0x3F
        "\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047" +       // 0x40 - 0x47
        "\u0048\u0049\u004A\u004B\u004C\u004D\u004E\u004F" +       // 0x48 - 0x4F
        "\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057" +       // 0x50 - 0x57
        "\u0058\u0059\u005A\u005B\\\u005D\u005E\u005F" +       // 0x58 - 0x5F
        "\u0060\u0061\u0062\u0063\u0064\u0065\u0066\u0067" +       // 0x60 - 0x67
        "\u0068\u0069\u006A\u006B\u006C\u006D\u006E\u006F" +       // 0x68 - 0x6F
        "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +       // 0x70 - 0x77
        "\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F";       // 0x78 - 0x7F

This should be used by the odbc driver while doing the conversion from java string to c string.  The problem is, in JNI we have two functions

makeCstring and makePlatformCString and as I notice, the JDBC-ODBC Bridge is using the former function (makeCString) which simply ignores the upper byte of unicode character.  so, unicode greek character \u0192 will be converted into 0x92 in C code, and upper byte is ignored. This is bad and this occurs for every function which passes strings from java to native code. It would have been nice if they have used makePlatformCString instead of makeCString.
As almost all the  JDBC Statement takes string argument, the default JDBC-ODBC Bridge will not help.

The solution is that you have to use a thirdparty JDBC-ODBC bridge say (www.weblogic.com JDBC/KONA or JDirect) etc.,. in which you can specify to the driver, the charaset in which the database is.!!

0
 
incahAuthor Commented:
To evijay:

Thanks to the map you gave me, I managed to do the conversion myself by substituting the Greek characters with a character between 0x80-0xFF (so that the first byte is always 0x00), write the data to the database, and then retrieve them and do the reverse
substitution. This worked !!!

Please post an an answer (anything) and I will grade it.

PS: Where do you find information like these you gave me?


0
 
evijayCommented:
Now, jdbc-odbc bridge - the latest one supports charset property

See this abstract from the webpage http://java.sun.com/products/jdk/1.2/docs/guide/jdbc/bridge.html

What's new with the JDBC-ODBC Bridge?

    The Bridge has been re-implemented using the Java Native Interface API, for improved performance and stability.

    A jdbc:odbc: connection can now have a charSet property, to specify a Character Encoding Scheme other than the client default. For possible values,
    see the JDK 1.1 Internationalizaton specification on the Javasoft Web Site.

    The Bridge now assumes that ODBC drivers are able to handle multi-threaded access. This will improve performance. If you need to use multi-threading and
    your ODBC package does not support it, your client program will have to implement locking.



0
 
incahAuthor Commented:
Thanks again!
0

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now