Unable to correctly write Hebrew characters with PHP using utf8_encode. Please help.

I'm trying to do something very simple with PHP hosted on a Linux box.  I need to write Hebrew characters to a text file, which is then read by an application for further processing.  When I type Hebrew text into a standard text editor by hand, the application interprets it just fine.  However, when I use PHP to write Hebrew characters to the text file, the application shows strange, garbled characters.

Here's the setup:

In a script I have a variable that stores some Hebrew characters.  I've attached a screenshot from Coda so you can see how that line looks in my code editor.  The variable $hebrew_child holds the Hebrew characters, and I use it to build another string that is stored in another variable.  So something like:

$mylongstring = 'some very long intro text ' . $hebrew_child . ' some very long exit text';


So that code exists in one script.  In another, separate script, I include that script, and open a file, then write to that file (and overwrite if it exists).  I am writing the data stored in the $mylongstring variable.  So it's something like:

if (file_exists($server_lut)) 
{

	//open the lut file under the proper ftp context
	$lut_file = fopen($server_lut, "wb", false, $ftp_context);
	flush_echo(nl2br("\nOpened existing LUT file for writing.")); 
	
} else { 
	
	$lut_file = fopen($server_lut, "wb"); 
	flush_echo(nl2br("\nOpened new LUT file to write to."));
	
}

//Include the long string that has the hebrew characters in a long string
include_once($_POST['LUT']);

if ($lut_file)
{

	//add header to text file for UTF-8 encoding
	$utf8_lut_file = utf8_encode($mylongstring);
	$i_lut_file = "\xEF\xBB\xBF".$utf8_lut_file;
	
	//Write out the lut file based on the provided input data from the user
	fwrite($lut_file, $i_lut_file);
	
	flush_echo(nl2br("\nWrote new data to LUT file."));	
	
} else {

	flush_echo(nl2br("\nDid not open the LUT file."));
	
}

//Close the file after writing to it
fclose($lut_file);


So I would expect that opening the output text file on a Windows or OSX machine would show Hebrew characters, but I see some strange stuff.  On Windows I see this:

׿׿צ׿ ר×

which is how the application that reads this text file interprets it.  When I open the file with vim on linux, I get this:

?~T?~U¿?~^ ¿?~Q

I'm not even sure those strings will show up here.  I should note that if I echo the $mylongstring variable into the <body> element of an HTML document that uses the charset="UTF-8" meta tag, browsers do in fact display the Hebrew text correctly.  I am just not sure what is going on with the text file.
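
One possible explanation, offered as a minimal sketch rather than a confirmed diagnosis: if the script file is saved as UTF-8 (an assumption, though the browser test suggests it), the string is already UTF-8, and utf8_encode() would encode it a second time because it treats its input as ISO-8859-1. The sample word and variable names below are hypothetical, and mb_convert_encoding() stands in for utf8_encode() since it performs the same ISO-8859-1 to UTF-8 conversion.

```php
<?php
// Assumption: the script file is saved as UTF-8, so a Hebrew literal is
// already a UTF-8 byte sequence. These bytes spell a four-letter Hebrew word.
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// utf8_encode() assumes ISO-8859-1 input. The same conversion, written with
// mbstring, re-encodes every byte >= 0x80 as two bytes.
$double = mb_convert_encoding($hebrew, 'UTF-8', 'ISO-8859-1');

echo bin2hex($hebrew), "\n"; // the original UTF-8 bytes
echo bin2hex($double), "\n"; // the double-encoded bytes, twice as long

// Possible fix: skip the conversion and write the string as-is after the BOM.
$output = "\xEF\xBB\xBF" . $hebrew;
```

In the double-encoded result the lead byte 0xD7 becomes 0xC3 0x97, which is the UTF-8 encoding of the multiplication sign, so this would be consistent with the "×" characters in the Windows output above.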

Anyways, I found some threads on here, particularly this one about php, utf8, and hebrew:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/PHP_Windows/Q_23859387.html

I've done some googling and thought iconv might do the trick, but it didn't.  I don't know what to do so that Hebrew text is written correctly to the text file.

Any help is greatly appreciated.

Thanks!

 This is how my variable is set in my code editor Coda
ariestav Asked:

Ray Paseur Commented:
http://www.laprbass.com/RAY_temp_ariestav.php
Outputs something like:
¿¿¿¿ ¿¿
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

echo $top;
echo file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
echo $end;

Ray Paseur Commented:
I don't really have an answer, but maybe Joel does:
http://www.joelonsoftware.com/articles/Unicode.html

ariestav (Author) Commented:
I found that link a while ago, and while I appreciate you pointing me to it, it does nothing to solve my particular problem.  He is writing in broad terms in that article, and I am focused on a specific problem.  Do I need to go to Joel's Stack Overflow site to get more specific help and troubleshooting advice?

Ray Paseur Commented:
It can't hurt to ask on StackOverflow.  But I am curious about something.  Maybe the characters are getting written OK, but are being interpreted wrong when viewed with Vim or whatever viewer on Windows.

Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

A similar test would include writing the file (it is a string of data), then reading it back and comparing the returned data to the original string.
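
The write, read back, and compare test described above could be sketched roughly like this. The sample bytes, the temporary file, and the variable names are all hypothetical stand-ins for the real LUT file and string.

```php
<?php
// Hypothetical sample: the UTF-8 bytes of a short Hebrew word.
$original = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// Write to a temporary file (an illustrative path, not the real LUT file).
$path = tempnam(sys_get_temp_dir(), 'lut');
file_put_contents($path, $original);

// Read it back and compare byte for byte.
$readBack = file_get_contents($path);
$identical = ($readBack === $original);

echo $identical ? "Round trip OK\n" : "Bytes changed on disk\n";
echo bin2hex($readBack), "\n"; // a hex dump sidesteps any viewer's encoding guess

unlink($path);
```

If the bytes match, the file itself is intact and any garbling would come from how the data was encoded before the write or from how the viewer decodes it.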

ariestav (Author) Commented:
Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

I've tried this, and mentioned it in my post already.  A browser will, in fact, show the Hebrew text correctly with the meta tag in place.  When the meta tag charset value is not set, the text appears as question marks.

The problem is that it doesn't work as a text file, where there is no way to place a meta tag.  I have opened the file not only in the app that ends up reading it, but also in Notepad and vim.  Both show garbled text.

Ray Paseur Commented:
Can you please post a Hebrew string here for me to experiment with?  Thanks.  Maybe just something like "Merry Christmas"  ;-)

ariestav (Author) Commented:
¿¿¿¿ ¿¿

Does that work?

or how about:

¿¿¿¿ ¿¿

ariestav (Author) Commented:
I guess EE doesn't handle UTF-8 input.

I am attaching a text file made with TextEdit on OSX.  The file was saved with the UTF-8 option in the Save dialog.  


hebrewstring.txt

Ray Paseur Commented:
Interestingly the inverted question marks that you see in the post above are not what I saw when I copied and pasted the browser output of my little test script.  But I believe that the content of the file is intact.  And with the right display methods, it can be visualized.

ariestav (Author) Commented:
Yes, as I mentioned in earlier posts.  I can achieve the output in a browser using the meta tag, but not in a text file.

Not sure how far your code will take me to figuring out a solution.

Thanks!

Ray Paseur Commented:
I guess I am left to wonder what you mean by a "text file."  The file my test script reads is the file you posted here, and it appears to be a text file.

When I use the wrong encoding, like ISO-8859-1, I get something like this through the browser:
הוצמ תב

However, the file on my server remains intact.  And when I use the correct encoding I can display it through the browser.  I can read it and write it back without error.  So my conclusion is that PHP can read and write the data string correctly, and the only part of this process that is not working is the method of display that will render the Hebrew characters correctly.
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

// READ THE FILE INTO PHP AND WRITE IT BACK ONTO THE SERVER
$hebrew = file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
file_put_contents('RAY_temp_ariestav.txt', $hebrew);

// CREATE THE WEB PAGE
echo $top;
echo file_get_contents('RAY_temp_ariestav.txt');
echo $end;

ariestav (Author) Commented:
By text file, I mean that when I open the text file that PHP writes to in an ordinary app like Notepad or TextEdit, the characters are still garbled.  I figured out the problem though, and it is related to how PHP is configured and set up.  I found this blog entry, which helped me configure PHP on my server to process strings with multi-byte character encodings (Unicode).

http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

The author even references Joel Spolsky's blog entry, but mentions that it does not give real-world examples of how to implement it.

Thanks, Ray, for your time and help.  Not sure how I can divvy up the points. . .

Ray Paseur Commented:
Since I am able to read and write the Hebrew text without any trouble, maybe you would want to compare your mbstring settings to mine (phpinfo() for mbstring on LAPRBass.com).  See also this page:
http://php.net/manual/en/language.types.string.php

Quote: This means that PHP only supports a 256-character set, and hence does not offer native Unicode support.

No wonder this is problematic!
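
A small illustration of what that quote means in practice, as a sketch with a hypothetical sample word: the plain string functions count bytes, while the mbstring functions count characters once they are told the encoding.

```php
<?php
// Hypothetical sample: four Hebrew letters, two bytes each in UTF-8.
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

echo "$bytes bytes, $chars characters\n"; // prints "8 bytes, 4 characters"
```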

ariestav (Author) Commented:
Here is the mbstring, and the default_charset setting I changed as advised by that blog posting.  These settings give me the results I want and they work without a hitch.
Screen-shot-2011-09-16-at-10.34..png
Screen-shot-2011-09-16-at-10.34..png
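
For readers who cannot see the screenshots, here is a rough sketch of how these two settings could be inspected and set from a script. The exact values in the screenshots are not reproduced here; UTF-8 is an assumption based on the goal of this thread, and a permanent change would normally go in php.ini rather than in code.

```php
<?php
// Assumption: UTF-8 is the intended value for both settings.
ini_set('default_charset', 'UTF-8'); // per-request override of the php.ini value
mb_internal_encoding('UTF-8');       // encoding the mbstring functions assume

// Read the effective values back to confirm what the script is running with.
$charset  = ini_get('default_charset');
$internal = mb_internal_encoding();

echo "default_charset = $charset\n";
echo "mb_internal_encoding = $internal\n";
```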

ariestav (Author) Commented:
The issue came down to the configuration of my PHP server, not necessarily the code itself.  Thank you, Ray, for your help, though.  It did help me troubleshoot the issue I was having.
Question has a verified solution.
