Including UTF-8 content in UTF-8 output

I have some UTF-8 text that I want to include in my PHP. The PHP itself outputs UTF-8, so I thought it should be a no-brainer. The PHP has <?php header('Content-Type: text/html; charset=UTF-8'); ?> and <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.

I was expecting to be able to include the UTF-8 text simply by using include(), but I find that I have to convert it to ISO-8859-1 for it to come out as UTF-8 again (see the code snippet). This seems silly. Is there a portable way to make the internal encoding UTF-8 rather than ISO-8859-1, to avoid the to-and-fro conversion?


<?php
$review_path = 'frag/movie/review/'.$id_path.'.txt';
if (is_file($review_path)) {
	// Workaround: the output only looks right after converting to ISO-8859-1.
	echo utf8_decode(file_get_contents($review_path));
	//include($review_path);
} else {
	echo "<p>No movie review available</p>";
}
?>

rstaveley Asked:

shadow_shooter Commented:
You can convert the file's character set to UTF-8 with a text editor. I recommend Notepad++, but a quick search will turn up plenty of guides explaining how to do it with other tools.

If that doesn't work, let me know.

rstaveley (Author) Commented:
You didn't understand my question.

Here is the background:

  1. The text file is valid UTF-8 text (with no BOM). This has been verified.
  2. My PHP outputs UTF-8, using Content-Type: text/html; charset=UTF-8.
  3. I expected to be able to use include() to include the UTF-8 file, but I was wrong.
  4. If I convert my included UTF-8 to ISO-8859-1, using utf8_decode(), it works.
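
One way to check point 1 from inside PHP is sketched below. It is only a sketch: utf8_report() and the example path are hypothetical names made up for illustration, and it relies on PCRE's /u modifier rejecting malformed UTF-8. Note that a "valid" result only says the bytes are well-formed UTF-8, not that they encode the text that was intended.

```php
<?php
// Hypothetical helper: report on the encoding state of a raw byte string.
function utf8_report(string $bytes): array
{
    return [
        // PCRE's /u modifier refuses malformed UTF-8, so a match means valid.
        'valid_utf8' => preg_match('//u', $bytes) === 1,
        // A UTF-8 byte order mark is the three bytes EF BB BF.
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
    ];
}

// Illustrative path only, following the layout used in the question.
$path = 'frag/movie/review/example.txt';
if (is_file($path)) {
    var_dump(utf8_report(file_get_contents($path)));
}
```

If the mbstring extension is available, mb_check_encoding($bytes, 'UTF-8') should serve as an alternative to the preg_match() test.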
Here is my problem:

  • It seems inefficient for the PHP script to convert the UTF-8 text to ISO-8859-1 just so that PHP can convert it back again to UTF-8. That must slow things down, and it must also limit the text to characters that ISO-8859-1 can represent.
Here is my question:

  • How do I make my PHP work internally in UTF-8 rather than ISO-8859-1?
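
For what it's worth, a quick experiment along these lines may help narrow the question down. As far as I know, a PHP string is just a sequence of bytes and echo writes those bytes out unchanged, assuming no converting output handler (mb_output_handler, for example) is configured. The variable names below are made up for the sketch.

```php
<?php
// "é" written as its two UTF-8 bytes, so the encoding of this source file is irrelevant.
$text = "caf\xC3\xA9";

// strlen() counts bytes, not characters: 3 ASCII bytes plus 2 bytes for "é".
$byteCount = strlen($text);

// Capture what echo writes and compare it byte for byte with the input.
ob_start();
echo $text;
$emitted = ob_get_clean();

// True when echo passed the bytes through without any conversion.
$unchanged = ($emitted === $text);
```

If $unchanged comes out false on a given server, an output handler or similar setting is probably rewriting the output, and that would be the place to look rather than include().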

rstaveley (Author) Commented:
I was wrong about this.

> The text file is valid UTF-8 text (with no BOM). This has been verified.

This is what was going on: http:Q_23900962.html. My verification was plain wrong. I have now re-checked, and what I'm generating really is UTF-8.
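
A rough way to spot text that has been UTF-8 encoded twice is sketched below. It is a heuristic only: looks_double_encoded() is a hypothetical name, the sketch assumes the mbstring extension is loaded, and it assumes the extra layer came from treating the bytes as ISO-8859-1. Text that genuinely contains sequences like "Ã©" would be flagged by mistake.

```php
<?php
// Hypothetical helper: true if $bytes looks like UTF-8 that was encoded twice.
function looks_double_encoded(string $bytes): bool
{
    // Not valid UTF-8 at all, so it cannot be doubly encoded UTF-8.
    if (preg_match('//u', $bytes) !== 1) {
        return false;
    }
    // Pure ASCII is left unchanged by either encoding step.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return false;
    }
    // Undo one layer: map each code point back to a single ISO-8859-1 byte.
    $once = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
    // If the result is still valid multibyte UTF-8, a second layer was present.
    return preg_match('//u', $once) === 1
        && preg_match('/[\x80-\xFF]/', $once) === 1;
}
```

For example, "é" is the bytes C3 A9 in UTF-8; encoded a second time it becomes C3 83 C2 A9, which is still valid UTF-8 and so passes a plain validity check.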

I now find that PHP "does the right thing" with include(). Like a server-side include, it appears to assume that the included content has the character set specified by Apache's AddCharset directive: http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addcharset

This completely makes sense now.

On that basis, it makes sense in my application either to use AddCharset to map a special extension to UTF-8 text or to echo(file_get_contents($filename)), which is what I've wound up doing.

I expect that this is more efficient than using include() anyhow.
<?php
$review_path = 'frag/movie/review/'.$id_path.'.txt';
if (is_file($review_path)) {

	# Naive include would only work if the .txt was ISO-8859-1
	# or if .txt was in an AddCharset for UTF-8 in Apache's
	# directives. The include is high level, going via Apache
	# and the Content-Type reported by Apache is respected and
	# it is converted from that Content-Type.
	#include($review_path);

	# Bad publishes from MySQL "doubly UTF-8'ed" the data and the following
	# bodge designed to convert UTF-8 to ISO-8859-1 was needed to
	# get UTF-8.
	#echo(utf8_decode(file_get_contents($review_path)));

	# This is the right way to put raw UTF-8 data into the output
	# buffer. Our .txt file goes through no conversion.
	echo file_get_contents($review_path);
} else {
	echo "<p>No movie review available</p>";
}
?>
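
A possible variation, sketched under the assumption that the fragments never need any processing: readfile() writes a file straight to the output buffer without first building a string in PHP. The helper name emit_fragment() and the temporary file are made up for the demonstration.

```php
<?php
// Hypothetical helper: send a text fragment to the output unchanged.
// Returns false if the file does not exist so the caller can print a fallback.
function emit_fragment(string $path): bool
{
    if (!is_file($path)) {
        return false;
    }
    // readfile() copies the file's bytes to the output buffer as-is.
    return readfile($path) !== false;
}

// Demonstration with a temporary file containing UTF-8 bytes ("é" is C3 A9).
$tmp = tempnam(sys_get_temp_dir(), 'rev');
file_put_contents($tmp, "<p>Am\xC3\xA9lie</p>");

ob_start();
$ok = emit_fragment($tmp);
$out = ob_get_clean();
unlink($tmp);
```

In the page itself this would be used as: if (!emit_fragment($review_path)) echo "<p>No movie review available</p>";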

DamienRoche Commented:
It is far more efficient to store these reviews in a database and edit them via an admin panel than to include files as you do now, which is also extremely bad practice from a security standpoint.

rstaveley (Author) Commented:
Thanks, but movie reviews do not really have security concerns. A database isn't appropriate for that environment locally, though it is used to manage the reviews offline. There are half a million of these reviews published off-line as text fragments, which get pushed into production when modified. It is a quirky set-up, I know.
Question has a verified solution.