Solved

Unable to correctly write Hebrew characters with PHP using utf8_encode. Please help.

Posted on 2011-09-15
16
990 Views
Last Modified: 2016-06-16
I'm trying to do something very simple with PHP hosted on a Linux box. I need to write Hebrew characters to a text file that is then read by an application for further processing. When I type Hebrew text into a standard text editor by hand, the application interprets those characters just fine. However, when I use PHP to write Hebrew characters to the text file, the application shows strange, garbled characters.

Here's the setup:

In one script I have a variable, $hebrew_child, that stores some Hebrew characters. I've attached a screenshot from Coda so you can see how that line looks in my code editor. I use that variable to build a longer string that is stored in another variable, something like:

$mylongstring = 'some very long intro text ' . $hebrew_child . ' some very long exit text';


That code lives in one script. In a separate script, I include the first script, open a file (overwriting it if it exists), and write the data stored in the $mylongstring variable to it. It's something like:

if (file_exists($server_lut)) 
{

	//open the lut file under the proper ftp context
	$lut_file = fopen($server_lut, "wb", false, $ftp_context);
	flush_echo(nl2br("\nOpened existing LUT file for writing.")); 
	
} else { 
	
	$lut_file = fopen($server_lut, "wb"); 
	flush_echo(nl2br("\nOpened new LUT file to write to."));
	
}

//Include the long string that has the hebrew characters in a long string
include_once($_POST['LUT']);

if ($lut_file)
{

	//add header to text file for UTF-8 encoding
	$utf8_lut_file = utf8_encode($mylongstring);
	$i_lut_file = "\xEF\xBB\xBF".$utf8_lut_file;
	
	//Write out the lut file based on the provided input data from the user
	fwrite($lut_file, $i_lut_file);
	
	flush_echo(nl2br("\nWrote new data to LUT file."));	
	
} else {

	flush_echo(nl2br("\nDid not open the LUT file."));
	
}

//Close the file after writing to it
fclose($lut_file);


So I would expect to see Hebrew characters when I open the output text file on a Windows or OS X machine, but I see some strange stuff instead. On Windows I see this:

׿׿צ׿ ר×

which is how the application that reads this text file interprets it.  When I open the file with vim on linux, I get this:

?~T?~U¿?~^ ¿?~Q

I'm not even sure those strings will show up here. I should note that if I echo the $mylongstring variable into the <body> element of an HTML document that uses the charset="UTF-8" meta tag, browsers do display the Hebrew text correctly. I am just not sure what is going on with the text file.
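
One possible explanation, offered as a sketch rather than a diagnosis: utf8_encode() assumes its input is ISO-8859-1 and converts each byte to UTF-8. If $mylongstring already holds UTF-8 bytes (which the correct browser display with the UTF-8 meta tag suggests), the call encodes the text a second time. The helper latin1_to_utf8() below is hypothetical; it is written by hand to mimic that byte-by-byte conversion so the effect can be seen in hex.

```php
<?php
// Hypothetical helper that mimics what utf8_encode() does: it treats every
// input byte as one ISO-8859-1 character and re-encodes it as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= ($b < 0x80)
            ? chr($b)                                          // ASCII passes through
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // one byte becomes two
    }
    return $out;
}

// The Hebrew letter He (U+05D4), already encoded as UTF-8: bytes D7 94.
$hebrew = "\xD7\x94";

// Encoding it a second time turns each of its two bytes into two bytes.
$double = latin1_to_utf8($hebrew);

echo bin2hex($hebrew), "\n"; // d794
echo bin2hex($double), "\n"; // c397c294
```

The bytes C3 97 are the UTF-8 encoding of the multiplication sign, which would be consistent with the "×" characters in the Windows output above. Under that assumption, writing $mylongstring without passing it through utf8_encode() would leave the Hebrew bytes untouched.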

Anyway, I found some threads on here, particularly this one about PHP, UTF-8, and Hebrew:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/PHP_Windows/Q_23859387.html

I've done some googling, and I thought iconv might do the trick, but it didn't. I don't know what to do so that Hebrew text is written correctly to the text file.

Any help is greatly appreciated.

Thanks!

 This is how my variable is set in my code editor Coda
Question by:ariestav
16 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36545651
I don't really have an answer, but maybe Joel does:
http://www.joelonsoftware.com/articles/Unicode.html
0
 
LVL 4

Author Comment

by:ariestav
ID: 36545943
I found that link a while ago, and while I appreciate you pointing me to it, it does nothing to solve my particular problem. He is writing in broad terms in that article, and I am focused on a specific problem. Do I need to go to Joel's Stack Overflow site to get more specific help and troubleshooting advice?
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36546045
It can't hurt to ask on StackOverflow.  But I am curious about something.  Maybe the characters are getting written OK, but are being interpreted wrong when viewed with Vim or whatever viewer on Windows.

Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

A similar test would include writing the file (it is a string of data), then reading it back and comparing the returned data to the original string.
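
The round-trip comparison described above could be sketched as follows. The sample string, the temporary file, and the variable names are assumptions made for illustration; the Hebrew text is given as explicit UTF-8 bytes so the result does not depend on how the script itself is saved.

```php
<?php
// "Shalom" in Hebrew, written as explicit UTF-8 bytes so the example does not
// depend on the encoding of this source file.
$original = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// Hypothetical scratch file; any writable path would do.
$path = tempnam(sys_get_temp_dir(), 'lut');

// Write the string, read it back, and clean up.
file_put_contents($path, $original);
$readBack = file_get_contents($path);
unlink($path);

// A byte-for-byte comparison sidesteps every display problem.
$identical = ($readBack === $original);

echo $identical ? "Round trip OK\n" : "Bytes changed on disk\n";
echo bin2hex($readBack), "\n";
```

If the comparison succeeds, the file functions are storing the bytes faithfully, and the place to look is either the step that prepares the string before writing or the program used to view the file.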
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546111
Quote: Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

I've tried this, and mentioned that in my post already. A browser will, in fact, show the Hebrew text correctly with the meta tag in place. When the meta tag charset value is not set, the text appears as question marks.

The problem is that it doesn't work as a text file, where there is no way to place a meta tag. I have opened the file not only in the app that ends up reading it, but also in Notepad and vim. Both show garbled text.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36546135
Can you please post a Hebrew string here for me to experiment with?  Thanks.  Maybe just something like "Merry Christmas"  ;-)
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546171
¿¿¿¿ ¿¿

Does that work?

or how about:

¿¿¿¿ ¿¿

0
 
LVL 4

Author Comment

by:ariestav
ID: 36546185
I guess EE doesn't handle UTF-8 input.

I am attaching a text file made with TextEdit on OSX.  The file was saved with the UTF-8 option in the Save dialog.  


hebrewstring.txt
0
 
LVL 109

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 36546236
http://www.laprbass.com/RAY_temp_ariestav.php
Outputs something like:
¿¿¿¿ ¿¿
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

echo $top;
echo file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
echo $end;

0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36546243
Interestingly the inverted question marks that you see in the post above are not what I saw when I copied and pasted the browser output of my little test script.  But I believe that the content of the file is intact.  And with the right display methods, it can be visualized.
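
One way to check whether the content is intact without relying on any viewer is to look at the raw bytes. The sketch below uses a hypothetical helper, describe_bytes(), that reports whether the data starts with the UTF-8 byte order mark, whether it is valid UTF-8, and what the bytes are in hex. The sample data is an assumption made for illustration.

```php
<?php
// Hypothetical helper: summarise what is really stored in a string of bytes.
function describe_bytes(string $data): array
{
    return [
        // The UTF-8 byte order mark is the three bytes EF BB BF.
        'has_bom'    => strncmp($data, "\xEF\xBB\xBF", 3) === 0,
        // preg_match() with the /u modifier returns 1 only for valid UTF-8.
        'valid_utf8' => preg_match('//u', $data) === 1,
        'hex'        => bin2hex($data),
    ];
}

// Sample data: a BOM followed by the Hebrew letter He (U+05D4).
$sample = "\xEF\xBB\xBF" . "\xD7\x94";
$info = describe_bytes($sample);

print_r($info);
```

Note that text which has been encoded twice is still valid UTF-8, so the hex output is the more telling field: Hebrew letters stored once show up as pairs beginning with d7, while pairs beginning with c3 97 would point to a second round of encoding.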
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546249
Yes, as I mentioned in earlier posts, I can achieve the output in a browser using the meta tag, but not in a text file.

Not sure how far your code will take me toward figuring out a solution.

Thanks!
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36546374
I guess I am left to wonder what you mean by a "text file."  The file my test script reads is the file you posted here, and it appears to be a text file.

When I use the wrong encoding, like ISO-8859-1, I get something like this through the browser:
הוצמ תב

However, the file on my server remains intact.  And when I use the correct encoding I can display it through the browser.  I can read it and write it back without error.  So my conclusion is that PHP can read and write the data string correctly, and the only part of this process that is not working is the method of display that will render the Hebrew characters correctly.
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

// READ THE FILE INTO PHP AND WRITE IT BACK ONTO THE SERVER
$hebrew = file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
file_put_contents('RAY_temp_ariestav.txt', $hebrew);

// CREATE THE WEB PAGE
echo $top;
echo file_get_contents('RAY_temp_ariestav.txt');
echo $end;

0
 
LVL 4

Assisted Solution

by:ariestav
ariestav earned 0 total points
ID: 36549802
By text file, I mean that when I open the text file that PHP writes to in an ordinary app like Notepad or TextEdit, the characters are still garbled.  I figured out the problem though, and it is related to how PHP is configured and set up.  I found this blog entry, which helped me configure PHP on my server to process strings with multi-byte character encodings (Unicode).

http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

The author even references Joel Spolsky's blog entry, but mentions that it does not give real-world examples of how to implement it.

Thanks, Ray, for your time and help.  Not sure how I can divvy up the points. . .
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36549914
Since I am able to read and write the Hebrew text without any trouble, maybe you would want to compare your mbstring settings to mine (phpinfo() for mbstring on LAPRBass.com). See also this page:
http://php.net/manual/en/language.types.string.php

Quote: This means that PHP only supports a 256-character set, and hence does not offer native Unicode support.

No wonder this is problematic!
0
 
LVL 4

Author Comment

by:ariestav
ID: 36550011
Here are the mbstring settings and the default_charset setting I changed, as advised by that blog post.  These settings give me the results I want and they work without a hitch.
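
The exact values in the screenshots are not reproduced in this text, so the following is only a sketch of how settings of this kind might be applied and checked from a script. The value "UTF-8" is an assumption, and the mbstring call is guarded because that extension is optional.

```php
<?php
// Assumed value: the thread does not state the exact settings that were used.
ini_set('default_charset', 'UTF-8');

// mbstring is an optional extension, so only call it when it is loaded.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}

// Report what is now in effect.
echo 'default_charset = ', ini_get('default_charset'), "\n";
if (function_exists('mb_internal_encoding')) {
    echo 'mb_internal_encoding = ', mb_internal_encoding(), "\n";
}
```

Comparing this output, or the mbstring section of phpinfo(), between a server that works and one that does not is one way to spot a configuration difference.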
Screen-shot-2011-09-16-at-10.34..png
0
 
LVL 4

Author Closing Comment

by:ariestav
ID: 36594682
The issue came down to the configuration of my PHP server, not necessarily the code itself.  Thank you, Ray, for your help, though.  It did help me troubleshoot the issue I was having.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 41657278
0
