ariestav

Unable to correctly write Hebrew characters with PHP using utf8_encode. Please help.

I'm trying to do something very simple with PHP hosted on a Linux box.  I need to write Hebrew characters to a text file that is then read by an application for further processing.  When I type Hebrew characters into a standard text editor by hand, the application interprets them just fine.  However, when I use PHP to write Hebrew characters to the text file, the application shows strange, garbled characters.

Here's the setup:

In a script I have a variable, $hebrew_child, that stores some Hebrew characters.  I've attached a screenshot from Coda so you can see how that line of code looks in my code editor.  I use that variable to build a longer string that is stored in another variable.  So something like:

$mylongstring = 'some very long intro text ' . $hebrew_child . ' some very long exit text';



So that code exists in one script.  In another, separate script, I include that script, open a file, and write to it (overwriting it if it exists).  The data I write is what is stored in the $mylongstring variable.  So it's something like:

if (file_exists($server_lut)) 
{

	//open the lut file under the proper ftp context
	$lut_file = fopen($server_lut, "wb", false, $ftp_context);
	flush_echo(nl2br("\nOpened existing LUT file for writing.")); 
	
} else { 
	
	$lut_file = fopen($server_lut, "wb"); 
	flush_echo(nl2br("\nOpened new LUT file to write to."));
	
}

//Include the long string that has the hebrew characters in a long string
include_once($_POST['LUT']);

if ($lut_file)
{

	//add header to text file for UTF-8 encoding
	$utf8_lut_file = utf8_encode($mylongstring);
	$i_lut_file = "\xEF\xBB\xBF".$utf8_lut_file;
	
	//Write out the lut file based on the provided input data from the user
	fwrite($lut_file, $i_lut_file);
	
	flush_echo(nl2br("\nWrote new data to LUT file."));	
	
} else {

	flush_echo(nl2br("\nDid not open the LUT file."));
	
}

//Close the file after writing to it
fclose($lut_file);



So I would expect that opening the output text file on a Windows or OS X machine would show Hebrew characters, but I see some strange stuff.  On Windows I see this:

׿׿צ׿ ר×

which is how the application that reads this text file interprets it.  When I open the file with vim on Linux, I get this:

?~T?~U¿?~^ ¿?~Q

I'm not even sure those strings will show up.  I should note that if I echo the $mylongstring variable into the <body> element of an HTML document that uses the charset="UTF-8" meta tag, browsers do in fact display the Hebrew text correctly.  I am just not sure what is going on with the text file.

Anyways, I found some threads on here, particularly this one about PHP, UTF-8, and Hebrew:

https://www.experts-exchange.com/questions/23859387/using-fopen-with-hebrew-file-name-inside-a-UTF-8-encoded-file-problem.html

I've done some googling, and I thought iconv might do the trick, but it didn't.  I don't know what to do so that Hebrew text is written correctly to the text file.

Any help is greatly appreciated.

Thanks!

Ray Paseur

I don't really have an answer, but maybe Joel does:
http://www.joelonsoftware.com/articles/Unicode.html
ariestav

ASKER

I found that link a while ago, and while I appreciate you pointing me to it, it does nothing to solve my particular problem.  He is writing in broad terms in that article, and I am focused on a specific problem.  Do I need to go to Joel's Stack Overflow site to get more specific help and troubleshooting advice?
It can't hurt to ask on StackOverflow.  But I am curious about something.  Maybe the characters are getting written OK, but are being interpreted wrong when viewed with Vim or whatever viewer on Windows.

Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

A similar test would include writing the file (it is a string of data), then reading it back and comparing the returned data to the original string.
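
The write-then-read-back comparison suggested above could be sketched roughly as follows. This is only an illustration: the helper name roundTripsIntact, the temporary file, and the sample word (written as explicit UTF-8 bytes so the example does not depend on how this page is encoded) are assumptions, not code from the thread.

```php
<?php
// Hypothetical helper: write $text to $path, read it back,
// and report whether the bytes survived unchanged.
function roundTripsIntact(string $text, string $path): bool
{
    file_put_contents($path, $text);
    $readBack = file_get_contents($path);

    // === on strings is a byte-for-byte comparison in PHP.
    return $readBack === $text;
}

// Four Hebrew letters given as their UTF-8 bytes (illustrative sample).
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

$path   = tempnam(sys_get_temp_dir(), 'lut');
$intact = roundTripsIntact($hebrew, $path);

echo $intact ? "bytes identical\n" : "bytes differ\n";

// A hex dump shows exactly what is on disk, independent of any viewer.
echo bin2hex(file_get_contents($path)), "\n";

unlink($path);
```

If the comparison passes, the file I/O itself is not changing the bytes, so any garbling would have to come from how the string was encoded before writing or from how the viewer decodes the file.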
"Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag."

I've tried this, and mentioned that in my post already.  A browser will, in fact, show the Hebrew text correctly with the meta tag in place.  When the meta tag charset value is not set, it appears as question marks.

The problem is that it doesn't work as a text file, where there is no way to place a meta tag.  I have opened the file not only in the app that ends up reading it, but also in Notepad and vim.  Both show garbled text.
Can you please post a Hebrew string here for me to experiment with?  Thanks.  Maybe just something like "Merry Christmas"  ;-)
¿¿¿¿ ¿¿

Does that work?

or how about:

¿¿¿¿ ¿¿


I guess EE doesn't handle UTF-8 input.

I am attaching a text file made with TextEdit on OSX.  The file was saved with the UTF-8 option in the Save dialog.  


hebrewstring.txt
ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.
Interestingly the inverted question marks that you see in the post above are not what I saw when I copied and pasted the browser output of my little test script.  But I believe that the content of the file is intact.  And with the right display methods, it can be visualized.
Yes, as I mentioned in earlier posts.  I can achieve the output in a browser using the meta tag, but not in a text file.

Not sure how far your code will take me to figuring out a solution.

Thanks!
I guess I am left to wonder what you mean by a "text file."  The file my test script reads is the file you posted here, and it appears to be a text file.

When I use the wrong encoding, like ISO-8859-1, I get something like this through the browser:
הוצמ תב

However, the file on my server remains intact.  And when I use the correct encoding I can display it through the browser.  I can read it and write it back without error.  So my conclusion is that PHP can read and write the data string correctly, and the only part of this process that is not working is the method of display that will render the Hebrew characters correctly.
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

// READ THE FILE INTO PHP AND WRITE IT BACK ONTO THE SERVER
$hebrew = file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
file_put_contents('RAY_temp_ariestav.txt', $hebrew);

// CREATE THE WEB PAGE
echo $top;
echo file_get_contents('RAY_temp_ariestav.txt');
echo $end;


SOLUTION
This solution is only available to members.
Since I am able to read and write the Hebrew text without any troubles, maybe you would want to compare your mbstring settings to mine (screenshot attached).
See also this page:
http://php.net/manual/en/language.types.string.php

Quote: This means that PHP only supports a 256-character set, and hence does not offer native Unicode support.

No wonder this is problematic!
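
To make the quoted point concrete: a PHP string is a sequence of bytes, so a Hebrew word stored as UTF-8 has more bytes than letters. The sketch below also shows what happens when bytes that are already UTF-8 are treated as ISO-8859-1 and converted to UTF-8 a second time, which is the conversion utf8_encode() performs on its input. Whether that is what happened in the original script is an assumption, since it depends on how the source file was saved. The sample word and variable names are made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Four Hebrew letters given as their UTF-8 bytes (illustrative sample).
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

$byteCount = strlen($hebrew);             // counts bytes: 8
$charCount = mb_strlen($hebrew, 'UTF-8'); // counts characters: 4
echo "bytes: $byteCount, characters: $charCount\n";

// Treat the UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 again.
// Every byte >= 0x80 becomes a two-byte sequence, so the text is double-encoded.
$doubleEncoded = mb_convert_encoding($hebrew, 'UTF-8', 'ISO-8859-1');
echo "after a second encoding: ", strlen($doubleEncoded), " bytes\n"; // 16

// Reversing that one conversion recovers the original bytes.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
echo $restored === $hebrew ? "restored\n" : "not restored\n";

// A guard that skips the conversion when the input is already valid UTF-8.
$safe = mb_check_encoding($hebrew, 'UTF-8')
    ? $hebrew
    : mb_convert_encoding($hebrew, 'UTF-8', 'ISO-8859-1');
echo $safe === $hebrew ? "left unchanged\n" : "converted\n";
```

The guard at the end is one possible way to avoid encoding a string twice; it only helps when the input really is either valid UTF-8 or ISO-8859-1.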
Here is the mbstring, and the default_charset setting I changed as advised by that blog posting.  These settings give me the results I want and they work without a hitch.
Screen-shot-2011-09-16-at-10.34..png
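
For comparing configurations between two servers, the relevant values can also be printed from a script rather than read off a screenshot. A minimal sketch, assuming the mbstring extension is loaded; the thread does not spell out which values were changed beyond default_charset, so this only reports what is currently set.

```php
<?php
// Collect the charset-related settings this server is currently using.
$settings = [
    'default_charset'      => ini_get('default_charset'),
    'mb_internal_encoding' => mb_internal_encoding(),
    'mbstring.language'    => ini_get('mbstring.language'),
];

foreach ($settings as $name => $value) {
    // ini_get() returns false when a directive is not defined.
    printf("%-22s %s\n", $name, $value === false ? '(not set)' : $value);
}
```

Running the same script on both machines gives two short listings that can be compared line by line.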
The issue came down to the configuration of my PHP server, not necessarily the code itself.  Thank you, Ray, for your help, though.  It did help me troubleshoot the issue I was having.