Link to home
Start Free TrialLog in
Avatar of nahkov
nahkov

asked on

MySQL and Hebrew

First let me say: Thank you for your time and I will appreciate any help as I am quite desperate

This problem relates specifically to using hebrew in MySQL 4.1

I developed a pure Access application (i.e. both front and back end in Access 2002) and implemented it across the clinics in our organization. The Access system has been in place for about a year and is thankfully working very well.
 
I decided to convert the backend over to MySQL (4.1.7, on Windows 2000 Server, MySQL ODBC Driver 3.51) as I was worried that some of the clinics were getting to the point of overloading Access's capabilities (file size was around 500 meg in one of the clinics and there were the rare corruptions).
 
Using DBManager Professional (www.dbtools.com.br) to convert over the data structures of the tables and my own software to do the actual data transfer from Access to the new MySQL tables (basically justa  series of INSERT statements), I created a test version of my database.  The front end continued to be in Access 2002 and used linked tables through the ODBC driver to access the data.
 
The first hebrew related issue that came up was a collation sequence problem when I tried to UNION two tables. The problem was with the following query:
 
SELECT Patient_Name, Patient_Type FROM tbl_Regular_Patients
UNION
SELECT Patient_Name, "[a string in hebrew appears here]" FROM tbl_Special_Patients
 
This example would crash out with an error message related to incompatible collation sequences. I fixed it by wrapping the field in a CSTR function call (in the Access frontend):
 
SELECT Patient_Name, Cstr(Patient_Type) FROM tbl_Regular_Patients
UNION
SELECT Patient_Name, "[a string in hebrew appears here]"  FROM tbl_Special_Patients
 
So, I realized that the hebrew strings as stored in MySQL were as such that Access could not work with them. But once I used the CSTR function, Access was able to convert the strings and fields being UNIONed into compatible formats.

I have checked the regional settings on my server and they all appear to be appropriate. I am able to use Notepad and Microsoft Office in hebrew without a problem.

The character set for the MySQL database is utf8. Actually though, this is also strange and perhaps part of the problem. Even after making changes in the my.ini file to get the default characer set and collation to be utf8 (and utf8_collation_gi), if I then run a "show variables" query, a number of the character set variables are still in latin (as in one of the character collations). I assume that this is all inter-related

Anyways, I thought this "trick" with the Cstr call solved everything and after further tests, I finally installed the first MySQL based system at one of our clinics. It has been in use a couple of days now and today I discovered a new problem. I had failed to check during the testing phase some of the lookup tables I use. I have one table that is very simple. It is a conversion table of the hebrew alphabet to the numerical equivalent of each letter. So,
 
-------------------------------------------
|HLetter  | HNumValue | ts (timestamp field)|
-------------------------------------------
 
HLetter is a unique indexed field and HNumValue is the primary key.
 
When I start entering data (using DB Manager Professional, NOT Access), I get a problem when I enter the second line.
 
The first line is the hebrew letter aleph for HLetter and then 1 for HNumValue
The second line is the hebrew letter beit and then 2. But when I try to move to the next line after this 2nd line, I get the following error message: "Duplicate entry [some unintelligible characters] for key 3."
 
I played with this and realized that many of the hebrew letters that I tried entering in the HLetter field were being read as identical (in terms of the primary index refusing to accept them).

I then tried a simple SELECT statement for doing a sort (again through DB Manager Professional):

SELECT Last_Name FROM tbl_Patients ORDER By Last_Name.

The result of this was a confused sort order. When I went into Access and ran the same query, I also got the confused sort until I changed the SQL to:

SELECT Last_Name FROM tbl_Patients ORDER By CSTR(Last_Name) - my old friend CSTR worked again!. SO I know that my data in the MySQL is not corrupted but that there is a fundamental problem with the indexing and sort order (which are the same problem?)
 
I am quite desperate as the MySQL version of the system is running live. Fortunately, these lookup table issues are NOT crashing the system but they do make certain functions impossible (or unreliable).


Avatar of crimson117
crimson117

This is a very rare kind of question - I'm sure most of us have never tried using Hebrew in mySql.

It you find a problem in sorting a simple table using a hebrew pk, but it works correctly if you do the same in english, then it is likely a bug in mySql and not a configuration problem.

I suggest submitting this as a bug (or bugs) to the mySql team at http://bugs.mysql.com.

Your other issues may be bugs as well, so you may have more than one bug report.
Avatar of nahkov

ASKER

Thank you for your comments. I am sorry that I did not see this response earlier. I actually stopped checking back after the first week. My mistake.

I have found a solution that is acceptable. In fact, because this solution works, I also now believe that there is a bug specifically in the collating sequence of the UTF-8 specification, specifically in the hebrew (i.e. not necessarily any other languages).

The solution: to use the utf8-bin collation. Since the bin option treats everything simply as its unicode for the purpose of sorting and indexing, it works. The only problem is that this collation is case sensitive. I had to modify my code to make sure that english entries are always in upper case (I could have standardized on lower case but preferred upper case because CAPS lock will always provide english letters even when Windows is in hebrew mode)

Thank you again

P.S. In the course of this project, I came across a number of hebrew issues (such as when I needed to combine MySQL, .NET and hebrew). I hope to take some time off and write a long description of my personal "trek" through the Access-to-MySQL transition. Hope it will help others. Quick note: hebrew does not readily work through the .NET connector. You need to use the standard ODBC connector and MyODBC (which needs to be installed on the server)
ASKER CERTIFIED SOLUTION
Avatar of Computer101
Computer101
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial