Link to home
Start Free TrialLog in
Avatar of derrida
derrida

asked on

hebrew database problem

hi
i have a database that holds hebrew information.
i have a php page that should output the info. if i write hebrew directly in the page it works (the page is set to utf-8), but when the info comes from the database i get question marks.
i have tried to change the database encoding with: utf_generak_ci, utf_bin,utf_unicode_ci (i`m working with phpmyadmin). i even tried strictly hebrew.
nothing works.
can anyone please help?

best regards

ron
Avatar of Yuval_Shohat
Yuval_Shohat
Flag of Israel image

dont know about the data already in the db but lets try something.
after openning the connection run
set names 'hebrew'
as your first query.

as reguards to the table, i'd recreate the table (not modify it) using utf8_bin as its default encodeng.

if your php admin givs you hebrew as it should be check the encoding connection used by the phpadin and use it.

-=Yuval=-
Avatar of derrida
derrida

ASKER

hi yuval
thanks for answring.
i tried to create a new db with one table with utf8_bin. it still does not work.
can you please let me know how to set the db to utf8 as a query?

best regrads

ron
Hi,
in parallel with Yuval's suggestions, you need to clarify what is currently happening: maybe all is fine and you 'just' have a display problem....

I would suggest some simple checks with your current situation:
1 - When you get questions marks displayed in your browser, experiment with character encoding in your browser.
a - what is the current encoding? (to get it with IE: 'Display' (my IE in French says 'Affichage' - if you are using IE7 you might need to first use 'Tools' / 'Menu bar'), then 'encoding' (French : 'codage').
My browser currently displays 'Automatic Selection' / 'Unicode UTF-8' / 'Left to right'.
Suggested tracks for experimentation:
= Automatic selected or not,
= UTF-8
  'Windows Occidental' (not sure, in French: 'Alphabet Occidental (Windows)' )
  'ISO Occidental' ('Alphabet Occidental (ISO)' )
  'Others' and then the 4 versions of Hebrew

Yes, I know, this implies 2  (automatic or not)  x 7 (characters sets) = 14 combinations (in fact, you're already running one of them). But in 10 minutes max you will have checked that your pages does not display right at all.
KEEP WRITTEN NOTES of what works or not
If it displays fine (or more precisely: the text you display from the db displays fine, even if everything else is wrong!), then you know than you 'just' have a 'tuning problem'.

Before going further, you need an extra investigation using a RECENT version of phpmyadmin (PMA): look at your db tables content asking PMA to display it, and ask PMA to do that with the different character codes.
While you are in PMA, EVEN if everything was wrong, do another run of expermientation.
- selecting hebrew in PMA, use PMA to create/ modify content in one of your tables by directly typing Hebrew text.
- do the same with UTF-8 in PMA, in another record of the same table
Now display these 2 records to check the results: first display in PMA-UTF8, then in PMA-Hebrew.
KEEP NOTES of the results, specially each of the 2x2=4 different combinations

You should now have a pretty fine picture of what works and not: this will help you to decide which PHP functions to use.
In a similar situation, I found another problem: the existing DB was in an inconsistent state, with some text in UTF-8 and some not... after lots of nightmares, I found a solution that helped me to understand/ check what was really happening with the data and then helped me to edit the data that needed it: getting a string from the DB, I displayed it 'as is', 'encoded TO UTF-8' (in my case), and 'decoded FROM UTF-8'. I would suggest you do it with the additional encoded to/ decoded from Hebrew.
This is slightly overkill... but it will not take that long a time and will help you to debug very fast.

2 Last comments:
1 - in order for input characters to display correctly, you have a succession of input / store / extract / display where each of these steps can have 2 to 4 different settings... that's why it is really important to keep notes of what you have tested
2 - If you are creating a table by a CSV import, BEWARE. Add in each table an extra record which holds encoded characters, so that you are sure that the import does translate correctly all the records and creates the correct field information.
hi Ron.
try this:


set names 'hebrew'
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

/*Table structure for table `actiondatetime` */

DROP TABLE IF EXISTS `actiondatetime`;

CREATE TABLE `actiondatetime` (
  `ActionId` int(10) unsigned NOT NULL auto_increment,
  `ActionName` varchar(50) character set utf8 collate utf8_bin NOT NULL,
  `Start` datetime default NULL,
  `End` datetime default NULL,
  `lft` int(10) unsigned NOT NULL,
  `rght` int(10) unsigned NOT NULL,
  UNIQUE KEY `ActionId` (`ActionId`),
  UNIQUE KEY `lft` (`lft`),
  UNIQUE KEY `rght` (`rght`),
  KEY `Name` (`ActionName`),
  KEY `timeindx` (`Start`,`End`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

/*Data for the table `actiondatetime` */

insert  into `actiondatetime`(`ActionId`,`ActionName`,`Start`,`End`,`lft`,`rght`) values (1,'ê×ÙÜê èÙæê ÞéÐÑê ÞÙÓâ','2007-08-04 11:47:05',NULL,1,12),(2,'ØâÙàê çÕÑå ×ÑèÕê ÕàÙÙèÕê','2007-08-04 11:47:30','2007-06-09 23:44:32',4,9),(3,'ÞÙÖÕÒ ×ÑèÕê ÜàÙÙèÕê','2007-08-04 11:47:58',NULL,5,8),(4,'áèÙçê ÓäÙ ×ÑèÕê','2007-08-04 11:47:59',NULL,6,7),(5,'ÛàÙáÔ ÜÐêè','2007-08-04 11:47:05',NULL,2,11),(6,'àÙêÕ× ÓäÙÝ ÜÐÙêÕè çÕÑå ×ÑèÕê ÕàÙÙèÕê âèÚ','2007-08-04 11:47:30',NULL,3,10);

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;

this is just a simple table where the column of the varchar is set to utg8 bin.
i have added some hebrew data as well. try to create this table and then using the same client and same connection try to read the data from this table using any simple select * from....
check to see if it works.


-=Yuval=-
mmm... as a result of the differnt encodings, all hebrew sent via html forms in IE are sent here as a pile of gibrish....
so, in order to check it out, after creating the table, insert some data to the varchar column, using the same client from which the table was created and then try to read the data.

-=Yuval=-
Avatar of derrida

ASKER

hi Yuval and hi fibo
i thank you 2 for the efforts. i have tried the table as you formed it and it still gives me question marks (in IE and firefox).
i have played with all the reasonable encodings in both browsers and in all of them i cannot get hebrew from the database but i can see hebrew on regular sites and if i write hebrew manually into the page.
this is bizzare.
i hope you still have some idea cause i`m getting emptier:)

best regards

ron
the real question is what do you see in the phpmyadmin that you are using?
do you see there question marks instead of the hebrew letters or is it that only the browser gives the question marks?

try
select 'SomeHebrewLetters' as SomeHebrewLetters
replace the 'SomeHebrewLetters with some real hebrew.
what do you get?
do you get question marks in the column name and in the reslut itself or only in the results?

-=Yuval=-
Avatar of derrida

ASKER

hi
well no. i get hebrew in MPA. and i get hebrew in the php page if i insert it manually.
so on both ends i get hebrew.
i do not get it when i try to extract the hebrew from the database.

again thabks

ron
Ron, run the following query from your php script (not the myphpadmin), and post the reslults here.

show variables like '%%character%%'

There could be something there that needs to be set up as UTF8 and then mybe the set names....


-=Yuval=-
Avatar of derrida

ASKER

hi yuval
i do not want to sound dumb but what query exactly? i am not that experienced in working with databases, am i missing here some query?

thanks in advance

ron
Avatar of derrida

ASKER

hi yuval
i do not know how to upload stuff here. so i made a readdy small DB but exactly with the same structure of my main DB. here it is:

-- phpMyAdmin SQL Dump
-- version 2.9.2
-- http://www.phpmyadmin.net
--

-- ÞÐÒè àêÕàÙÝ: `names`
--
CREATE DATABASE `names` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
USE `names`;

-- --------------------------------------------------------

--
-- ÞÑàÔ ØÑÜÔ âÑÕè ØÑÜÔ `mynames`
--

CREATE TABLE `mynames` (
  `id` int(11) NOT NULL auto_increment,
  `fname` varchar(30) collate utf8_bin NOT NULL,
  `lname` varchar(30) collate utf8_bin NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=3 ;

--
-- ÔÕæÐê ÞÙÓâ âÑÕè ØÑÜÔ `mynames`
--

INSERT INTO `mynames` (`id`, `fname`, `lname`) VALUES
(1, 0xd7a8d795d79f, 0xd7a9d7a4d7a8),
(2, 0xd79cd790d794, 0xd79bd7a8d79ed799d7aa);


i hope this will hel you to help me:)

ron
the query:

show variables like '%%character%%'

this should output the sets of character used for the connection between the DB and the application (php page, myphpadmin etc.)


your create table seems fine, and runingh it on my client.... went jest as expected. i've inserted some hebrew as fname and lname and it all came just fine....

try the show variables query in your application and post here the results.



-=Yuval=-
Avatar of derrida

ASKER

hi again yuval

first let me say thanks for the great support and paciance.

i ran this code :

<?php
include("includes/conn.php");
$result=mysql_query("SELECT * FROM mynames");
echo "show variables like '%%character%%'";

while($row=mysql_fetch_array($result)){
echo $row['fname'];
echo "<br/>";
}
?>

and the output was:

ÿÿshow variables like '%%character%%'???
???

i gave the php code cause i have the feeling i did not understood you correctly.

best regards

ron
ok, what you need to do is run the query in the php code:


<?php
include("includes/conn.php");
$result=mysql_query("show variables like '%%character%%'");

while($row=mysql_fetch_array($result)){
echo $row['fname'];
echo "<br/>";
}
?>

and post the result.
i would expect something similar to the following results:

Variable_name      Value
character_set_client      hebrew
character_set_connection      hebrew
character_set_database      latin1
character_set_filesystem      binary
character_set_results      hebrew
character_set_server      utf8
character_set_system      utf8
character_sets_dir      C:\\MySQL\\share\\charsets\\

these are the setting i use and it usually works fine.
I suspect something is different with your settings.

what say you?

-=Yuval=-
Avatar of derrida

ASKER

hi yuval

well i get no output at all. no first name and no settings, not even a question mark. so i cannot post anything.

ron
So using PMA you can input and display characters correctly.
THIS IS GREAT, 90% of the problem is solved!

Now, just to be sure, use a php script to output the characters input with PMA.
a - be sure that your php script outputs UTF-8 and defines that in the html header (doubly check that this header is the first caharcter of the output.
b - read one of your encoded fields, and display it in the 5 variations I suggested.

Do you get anything OK?
well, you should try fibo's suggestion and after that, if not working...
lets take your php script and modify it a bit to have some results:
<?php
include("includes/conn.php");
$result=mysql_query("show variables like '%%character%%'");

while($row=mysql_fetch_array($result)){
echo 'Name='.$row['Variable_name'].' Value='.$row['Value'];
echo "<br/>";
}
?>


and now?

Avatar of derrida

ASKER

hi yuval and fibo (welcome back:)
with the modified php script i do get this output:

ÿÿName=character_set_client Value=latin1
Name=character_set_connection Value=latin1
Name=character_set_database Value=utf8
Name=character_set_filesystem Value=binary
Name=character_set_results Value=latin1
Name=character_set_server Value=latin1
Name=character_set_system Value=utf8
Name=character_sets_dir Value=C:\program_files\wamp\mysql\share\charsets\

now i need to learn what does it tells me. i`m sure you can see it fast, but i need to read it again:)

i really appriciate your help.

ron
ok now, lets try to move one step ahead.

<?php
include("includes/conn.php");
$result=mysql_query("set names 'hebrew'");
$result=mysql_query("SELECT * FROM mynames");

while($row=mysql_fetch_array($result)){
echo $row['fname'];
echo "<br/>";
}
?>


now what do you get?
Try the following, assuming your data is in my_field in MySQL
<?php
include("includes/conn.php");
$field_name='my_field';
$result=mysql_query("select $field_name from my_table'");
while($row=mysql_fetch_array($result)){
$value=$row[$field_name];
echo "Using Decode [".utf8_decode($value)."] -- [".mb_convert_encoding($value, 'ISO-8859-8'). "]<br />";
echo "Normal [$value]<br>";
echo "Using Encode [".utf8_encode($value)."] -- [".mb_converte_encoding($value, 'UTF-8', 'ISO-8859-8')."]<br />";
echo "--------<br />";
}
?>

Further links/ readings:
- php multibyte functions
    http://www.php.net/manual/en/ref.mbstring.php
    http://www.php.net/manual/en/function.mb-convert-encoding.php
- hebrew characters on the web
    http://www.nirdagan.com/hebrew/standards

Avatar of derrida

ASKER

hi yuval and fibo

and i try yuval i get now boxes instead of question marks.
when i try fibo i get:
ÿÿUsing Decode [?] -- [???]
Normal [ýýý]

ron
Avatar of derrida

ASKER

what you see as yyy it is boxes.

ron
boxes.... mmmm....
while seeing the boxes, have you tried to change the IE/FF encoding?

-=yuval=-
Well, question marks as well as boxes are characters that do not display correctly.

You said that using PMA you could input and display correctly characters.
This gives us one combination of encoding that works, a "reference scenario".

Next objective now is to display correctly with a php script the characters that were correctly input with PMA. We are now sure it is possible, since PMA is pure php!
When you get a correct PMA display, which is the selected character code?
While we are here, are ALL the characters displayed on such a page OK, or is it just the data, while the other texts have problems?
Avatar of derrida

ASKER

hi
yuval: i have tried messing with the encoding of the browsers but it does not change anything.

fibo: when you write: "which is the selected character code" do you mean what encoding PMA say it expect? if so then it expect utf8_bin.
and as i said other text (not loaded from the database) are working fine.

i really do not understand anything that is going on. does it mean that i have to change some setting in php?
yuval you said that you expect some outputs and i got different ones. does it mean that i have to make some changes to the php on my machine?

best regards

ron
the query
$result=mysql_query("set names 'hebrew'");
in the php code should set the setting correctly.
in order to check it use the following code:

<?php
include("includes/conn.php");
$result=mysql_query("set names 'hebrew'");
$result=mysql_query("show variables like '%%character%%'");

while($row=mysql_fetch_array($result)){
echo 'Name='.$row['Variable_name'].' Value='.$row['Value'];
echo "<br/>";
}
?>

this should result in similar settings.

-=Yuval=-
Avatar of derrida

ASKER

hi yuval
stange things happen in the holly land:)

when i run the code as is, i still get the boxes. but now when i play with encodings in FF and i set to "windows-1255" and to "ISO-8859-1" i get the names in hebrew.
in IE7 it does not let me to change the encoding from utf-8.
ron
Avatar of derrida

ASKER

hi yuval and fibo

i have made an experiment with strange results:
i have made a small DB in latin1_swidish_ci. then i made a php file that insert and a file to extract. now all hebrew appears ok but in PMA it is gibrish.

i thought that once all is utf-8 i`ll have no problems.

have any idea?

ron
1 -  "windows-1255" is a hebrew code page
you say that it works with  "ISO-8859-1", does it work too with "ISO-8859-8"?
2 - My question about PMA is not about the code defined for mysql field, but about the code you selected for your interaction with PMA, and whicjh it uses for its browser.
3 - You say that IE does not allow you to change page code. Did you ask it first to disable 'auto'?
Avatar of derrida

ASKER

hi
it does not work with ISO-8859-8, it comes out upside up.
regarding PMA. if you mean the connection encoding then it is utf8_unicode_ci.
and IE&:yes i deselected the automatic ecncoding. but i cannot change the encoding.
this are bizzare.

ron
ron, can you export the table (structure and  data) so that we could try to see what is going on?
is there any web page to enter to check it out?

-=Yuval=-
Avatar of derrida

ASKER

hi yuval
here is the experiment i did, the table and the data as it appears in PMA:
-- phpMyAdmin SQL Dump
-- version 2.9.2
-- http://www.phpmyadmin.net
--
-- ÞÐè×: localhost
-- ÖÞß ÙÙæÕè: ÐÕÒÕáØ 05, 2007 at 05:26 PM
-- Òèáê éèê: 5.0.27
-- Òèáê PHP: 5.2.1
--
-- ÞÐÒè àêÕàÙÝ: `hebrew`
--

-- --------------------------------------------------------

--
-- ÞÑàÔ ØÑÜÔ âÑÕè ØÑÜÔ `family-names`
--

CREATE TABLE `family-names` (
  `id` int(11) NOT NULL auto_increment,
  `fname` varchar(30) NOT NULL,
  `lname` varchar(30) NOT NULL,
  `username` varchar(30) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;

--
-- ÔÕæÐê ÞÙÓâ âÑÕè ØÑÜÔ `family-names`
--

INSERT INTO `family-names` (`id`, `fname`, `lname`, `username`) VALUES
(8, 'א×ם', '××x-××"ר', 'א××~סק×"'),
(5, '×¢× ×ª', '××x-××"ר', '×¢× ×ª×"'),
(4, '×¢× ×ר', '××x-××"ר', '×¢× ×ר×"ש');

as you can see: the hebrew part is with wird symbols. but on the page it shows as hebrew.the hebrew is in the INSERT part.

best regards

ron
ASKER CERTIFIED SOLUTION
Avatar of Yuval_Shohat
Yuval_Shohat
Flag of Israel image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of derrida

ASKER

hi yuval
well it worked, so this is a huge thanks from me.
but ,one last question: do i have to do it each time i work with hebrew?

best regards

ron
the set name query, once for every connection (in a php script all you have to do is run the query right after the open connection command).
the create table using utf8 well, if you use the default apparently you get the latin1 as default. check the options and the parameters of the client from which you are creating the table, im more than sure you will find some option about default table charset or something of this sort.

-=Yuval=-
Yuval, great, I bookmark this great solution to an etrenal problem!