Solved

Unable to correctly write Hebrew characters with PHP using utf8_encode. Please help.

Posted on 2011-09-15
Last Modified: 2016-06-16
I'm trying to do something very simple with PHP hosted on a Linux box: write Hebrew characters to a text file. That text file is read by an application for further processing. When I type Hebrew text into a standard text editor by hand, the application interprets those characters just fine. However, when I use PHP to write Hebrew characters to the text file, the application shows strange, garbled characters.

Here's the setup:

In a script I have a variable that stores some Hebrew characters. I've attached a screenshot from Coda so you can see how that line looks in my code editor. The variable $hebrew_child holds the Hebrew characters, and I use it to build another string that is stored in a second variable. So something like:

$mylongstring = 'some very long intro text ' . $hebrew_child . ' some very long exit text';


So that code exists in one script. In another, separate script, I include that script, open a file, and write to it (overwriting it if it exists). The data I write is the string stored in $mylongstring. So it's something like:

if (file_exists($server_lut)) 
{

	//open the lut file under the proper ftp context
	$lut_file = fopen($server_lut, "wb", false, $ftp_context);
	flush_echo(nl2br("\nOpened existing LUT file for writing.")); 
	
} else { 
	
	$lut_file = fopen($server_lut, "wb"); 
	flush_echo(nl2br("\nOpened new LUT file to write to."));
	
}

//Include the long string that has the hebrew characters in a long string
include_once($_POST['LUT']);

if ($lut_file)
{

	//add header to text file for UTF-8 encoding
	$utf8_lut_file = utf8_encode($mylongstring);
	$i_lut_file = "\xEF\xBB\xBF".$utf8_lut_file;
	
	//Write out the lut file based on the provided input data from the user
	fwrite($lut_file, $i_lut_file);
	
	flush_echo(nl2br("\nWrote new data to LUT file."));	
	
} else {

	flush_echo(nl2br("\nDid not open the LUT file."));
	
}

//Close the file after writing to it
fclose($lut_file);
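
Note on the snippet above: utf8_encode() converts from ISO-8859-1 to UTF-8, so it only gives a correct result when its input is Latin-1. If the script file holding the Hebrew literal is itself saved as UTF-8, the string is already UTF-8 and can be written unchanged. A minimal sketch under that assumption follows; write_utf8_file() and the temporary path are hypothetical names, not part of the original script.

```php
<?php
// Hypothetical helper: write text that is ALREADY UTF-8, prefixed with a BOM.
// Assumption: the source file holding the Hebrew literal was saved as UTF-8.
function write_utf8_file(string $path, string $text): bool
{
    // preg_match() with the /u modifier returns false on invalid UTF-8 input.
    if (preg_match('//u', $text) !== 1) {
        return false; // not valid UTF-8; refuse rather than guess
    }
    $bom = "\xEF\xBB\xBF";
    // No utf8_encode() here: the bytes are written unchanged.
    return file_put_contents($path, $bom . $text) !== false;
}

// Usage: the escaped bytes are the UTF-8 encoding of a four-letter Hebrew word.
$path = tempnam(sys_get_temp_dir(), 'lut');
$ok = write_utf8_file($path, "intro \xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D exit");
```

The validity check is a design choice for the sketch: it makes the function fail loudly when handed bytes in some other encoding instead of silently writing them.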


I would expect that opening the output text file on a Windows or OS X machine would show Hebrew characters, but I see some strange stuff instead. On Windows I see this:

׿׿צ׿ ר×

which is how the application that reads this text file interprets it. When I open the file with vim on Linux, I get this:

?~T?~U¿?~^ ¿?~Q

I'm not even sure those strings will show up here. I should note that if I echo the $mylongstring variable into the <body> element of an HTML document that uses the charset="UTF-8" meta tag, browsers do in fact display the Hebrew text correctly. I am just not sure what is going on with the text file.
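
Note: the Windows output above contains × characters, which is how the byte 0xD7 (the lead byte of most Hebrew letters in UTF-8) looks when it is read as Latin-1. That pattern may point to text being encoded as UTF-8 a second time. As a hedged illustration only (the thread does not confirm this is what happened on the asker's server), the sketch below mimics an ISO-8859-1 to UTF-8 conversion such as utf8_encode() with a hypothetical pure-PHP helper, latin1_to_utf8(), and applies it to bytes that are already UTF-8.

```php
<?php
// Hypothetical stand-in for an ISO-8859-1 -> UTF-8 conversion:
// every input byte is treated as one Latin-1 character.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII passes through unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F));   // continuation byte
        }
    }
    return $out;
}

$shin  = "\xD7\xA9";              // U+05E9, one Hebrew letter in UTF-8 (2 bytes)
$twice = latin1_to_utf8($shin);   // the same letter after a second encoding pass

echo bin2hex($shin), "\n";        // d7a9
echo bin2hex($twice), "\n";       // c397c2a9
```

Each two-byte letter becomes four bytes, and a UTF-8 viewer then shows × followed by another symbol instead of the Hebrew letter.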

Anyways, I found some threads on here, particularly this one about PHP, UTF-8, and Hebrew:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/PHP_Windows/Q_23859387.html

I've done some googling and thought iconv might do the trick, but it didn't. I don't know what to do so that Hebrew text is written correctly to the text file.

Any help is greatly appreciated.

Thanks!

 This is how my variable is set in my code editor Coda
Question by:ariestav
16 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36545651
I don't really have an answer, but maybe Joel does:
http://www.joelonsoftware.com/articles/Unicode.html
0
 
LVL 4

Author Comment

by:ariestav
ID: 36545943
I found that link a while ago, and while I appreciate you pointing me to it, it does nothing to solve my particular problem. He is writing in broad terms in that article, and I am focused on a specific problem. Do I need to go to Joel's Stack Overflow site to get more specific help and troubleshooting advice?
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36546045
It can't hurt to ask on Stack Overflow. But I am curious about something. Maybe the characters are getting written OK but are being interpreted incorrectly when viewed with vim or with whatever viewer you use on Windows.

Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser.  You might need to have it inside a WWW page with the UTF-8 meta tag.

A similar test would include writing the file (it is a string of data), then reading it back and comparing the returned data to the original string.
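
A minimal sketch of that comparison test, assuming a writable temporary directory; round_trip_matches() is a hypothetical helper name and the escaped bytes stand in for any UTF-8 Hebrew string.

```php
<?php
// Write $data to $path, read it back, and report whether the bytes match.
// round_trip_matches() is a hypothetical helper name.
function round_trip_matches(string $path, string $data): bool
{
    if (file_put_contents($path, $data) === false) {
        return false;
    }
    $readBack = file_get_contents($path);
    // PHP strings are byte sequences, so === is a byte-for-byte check.
    return $readBack === $data;
}

$path   = tempnam(sys_get_temp_dir(), 'heb');
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";   // UTF-8 bytes for a Hebrew word
var_dump(round_trip_matches($path, $hebrew));   // true when the file holds the same bytes
unlink($path);
```

A passing check shows only that PHP wrote the bytes it was given. It says nothing about whether those bytes were the intended encoding in the first place.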
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546111
"Here is what I might try... Write the file with Hebrew characters, then close it and read it back with file_get_contents() and echo the file to the browser. You might need to have it inside a WWW page with the UTF-8 meta tag."

I've tried this, and mentioned that in my post already. A browser will, in fact, show the Hebrew text correctly with the meta tag in place. When the meta tag charset value is not set, the text appears as question marks.

The problem is that it doesn't work as a text file, where there is no way to place a meta tag. I have opened the file not only in the app that ends up reading it, but also in Notepad and vim. Both show garbled text.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36546135
Can you please post a Hebrew string here for me to experiment with?  Thanks.  Maybe just something like "Merry Christmas"  ;-)
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546171
¿¿¿¿ ¿¿

Does that work?

or how about:

¿¿¿¿ ¿¿

0
 
LVL 4

Author Comment

by:ariestav
ID: 36546185
I guess EE doesn't handle UTF-8 input.

I am attaching a text file made with TextEdit on OS X. The file was saved with the UTF-8 option in the Save dialog.


hebrewstring.txt
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 36546236
http://www.laprbass.com/RAY_temp_ariestav.php
Outputs something like:
¿¿¿¿ ¿¿
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

echo $top;
echo file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
echo $end;


 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36546243
Interestingly the inverted question marks that you see in the post above are not what I saw when I copied and pasted the browser output of my little test script.  But I believe that the content of the file is intact.  And with the right display methods, it can be visualized.
0
 
LVL 4

Author Comment

by:ariestav
ID: 36546249
Yes, as I mentioned in earlier posts, I can get the correct output in a browser using the meta tag, but not in a text file.

Not sure how far your code will take me toward figuring out a solution.

Thanks!
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36546374
I guess I am left to wonder what you mean by a "text file."  The file my test script reads is the file you posted here, and it appears to be a text file.

When I use the wrong encoding, like ISO-8859-1, I get something like this through the browser:
הוצמ תב

However, the file on my server remains intact. And when I use the correct encoding I can display it through the browser. I can read it and write it back without error. So my conclusion is that PHP can read and write the data string correctly, and the only part of this process that is not working is the display method, which has to render the Hebrew characters with the right encoding.
<?php // RAY_temp_ariestav.php
error_reporting(E_ALL);

$top = <<<TOP
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
TOP;

$end = <<<END
</body>
</html>
END;

// READ THE FILE INTO PHP AND WRITE IT BACK ONTO THE SERVER
$hebrew = file_get_contents('http://filedb.experts-exchange.com/incoming/2011/09_w38/500492/hebrewstring.txt');
file_put_contents('RAY_temp_ariestav.txt', $hebrew);

// CREATE THE WEB PAGE
echo $top;
echo file_get_contents('RAY_temp_ariestav.txt');
echo $end;
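
Because every viewer applies its own guess about encoding, one way to take the viewer out of the picture is to look at the raw bytes. The sketch below is an illustration under stated assumptions, using a hypothetical helper, inspect_bytes(); it reports whether a string starts with a UTF-8 BOM, whether it is valid UTF-8, and what its hex dump is.

```php
<?php
// Hypothetical helper: summarize the raw bytes of a string read from a file.
function inspect_bytes(string $bytes): array
{
    return [
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        'valid_utf8' => preg_match('//u', $bytes) === 1,
        'hex'        => bin2hex($bytes),
    ];
}

// Example input: a BOM followed by one Hebrew letter (U+05E9) in UTF-8.
$info = inspect_bytes("\xEF\xBB\xBF\xD7\xA9");
print_r($info);
```

In a correctly encoded file, the Hebrew letters alef through tav appear in the hex dump as d7 followed by a byte from 90 to aa. Runs of c397 suggest the text was encoded to UTF-8 twice.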

0
 
LVL 4

Assisted Solution

by:ariestav
ariestav earned 0 total points
ID: 36549802
By text file, I mean that when I open the file PHP writes to in an ordinary app like Notepad or TextEdit, the characters are still garbled. I figured out the problem, though, and it is related to how PHP is configured and set up. I found this blog entry, which helped me configure PHP on my server to process strings with multi-byte character encodings (Unicode).

http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

The author even references Joel Spolsky's blog entry, but mentions that it does not give real-world examples of how to implement it.

Thanks, Ray, for your time and help.  Not sure how I can divvy up the points...
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36549914
Since I am able to read and write the Hebrew text without any trouble, maybe you would want to compare your mbstring settings to mine: phpinfo() for mbstring on LAPRBass.com. See also this page:
http://php.net/manual/en/language.types.string.php

Quote: This means that PHP only supports a 256-character set, and hence does not offer native Unicode support.

No wonder this is problematic!
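
The quoted sentence is about PHP strings being sequences of bytes. A small sketch of what that means in practice: strlen() counts bytes, so a UTF-8 Hebrew word reports twice as many units as it has letters. Counting code points here uses preg_match_all() with the /u modifier, chosen only so the example does not depend on the mbstring extension.

```php
<?php
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";   // four Hebrew letters, UTF-8 encoded

$byteCount = strlen($word);                    // counts bytes
$charCount = preg_match_all('/./us', $word);   // counts UTF-8 code points

echo $byteCount, "\n";   // 8
echo $charCount, "\n";   // 4
```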
0
 
LVL 4

Author Comment

by:ariestav
ID: 36550011
Here are the mbstring settings and the default_charset setting I changed as advised by that blog posting.  These settings give me the results I want and they work without a hitch.
Screen-shot-2011-09-16-at-10.34..png
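
The screenshot values are not reproduced in the thread text, so the exact configuration is unknown. As an assumption-based sketch, settings of the kind mentioned above (default_charset and mbstring) can also be applied at runtime; the values below are illustrative and are not the asker's confirmed configuration.

```php
<?php
// Assumed values for illustration; the thread's screenshots are not available.
ini_set('default_charset', 'UTF-8');   // default charset for output headers and several string functions

if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');     // encoding assumed by mbstring functions
}

echo ini_get('default_charset'), "\n";
```

The function_exists() guard is there because mbstring is an optional extension and may not be loaded on every server.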
0
 
LVL 4

Author Closing Comment

by:ariestav
ID: 36594682
The issue came down to the configuration of my PHP server, not necessarily the code itself.  Thank you, Ray, for your help, though.  It did help me troubleshoot the issue I was having.
0
 