• Status: Solved

Including UTF-8 content in UTF-8 output

I have some UTF-8 text that I want to include in my PHP page. The page itself outputs UTF-8, so I thought it would be a no-brainer. The page sends <?php header('Content-Type: text/html; charset=UTF-8'); ?> and contains <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.

I was expecting to be able to include the UTF-8 text simply by using include(), but I find that I have to convert it to ISO-8859-1 so that it gets converted back again to UTF-8 (see the code snippet). This seems silly. Is there a portable way to make the internal encoding UTF-8 rather than ISO-8859-1, to avoid the to-and-fro conversion?


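<?php
// Path to the pre-rendered review fragment for this movie
$review_path = 'frag/movie/review/'.$id_path.'.txt';
if (is_file($review_path)) {
	// Workaround: output only looks right after converting to ISO-8859-1
	echo(utf8_decode(file_get_contents($review_path)));
	//include($review_path);
} else {
	echo("<p>No movie review available</p>");
}
?>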


Asked by: rstaveley
 
shadow_shooter Commented:
You can simply convert the content to UTF-8 with a text editor. I recommend Notepad++, but you can google it; there should be plenty of documents explaining how to do it.

If you can't get it to work, let me know.
 
rstaveley (Author) Commented:
You didn't understand my question.

Here is the background:

  1. The text file is valid UTF-8 text (with no BOM). This has been verified.
  2. My PHP outputs UTF-8, using Content-Type: text/html; charset=UTF-8.
  3. I expected to be able to use include() to include the UTF-8 file, but I was wrong.
  4. If I convert my included UTF-8 to ISO-8859-1, using utf8_decode(), it works.

Here is my problem:

  • It seems inefficient for the PHP script to convert the UTF-8 text to ISO-8859-1 just so that PHP can convert it back again to UTF-8. This presumably slows things down, and it means that only characters which exist in ISO-8859-1 can be handled.

Here is my question:

  • How do I make my PHP work internally in UTF-8 rather than ISO-8859-1?
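
The concern about characters that ISO-8859-1 cannot represent can be illustrated with a short sketch. It assumes the mbstring extension is loaded and uses mb_convert_encoding() as a stand-in for utf8_decode() (which is deprecated as of PHP 8.2). The sample strings and the round_trip() helper are made up for illustration.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
// Sample strings are illustrative, not taken from the review files.
$latin = "caf\u{00E9}";   // "café": é exists in ISO-8859-1
$other = "\u{20AC}5";     // "€5": the euro sign does not exist in ISO-8859-1

// Hypothetical helper: UTF-8 -> ISO-8859-1 -> UTF-8.
function round_trip(string $utf8): string
{
    $iso = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($iso, 'UTF-8', 'ISO-8859-1');
}

// The Latin-1 text survives the round trip; the euro sign is replaced
// by a substitute character, so the second comparison fails.
var_dump(round_trip($latin) === $latin);
var_dump(round_trip($other) === $other);
```

Any text that passes through an ISO-8859-1 step is limited to that character set, whatever the final output encoding is.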
 
rstaveley (Author) Commented:
I was wrong about this.

> The text file is valid UTF-8 text (with no BOM). This has been verified.

This is what was going on: http:Q_23900962.html. My original verification was plain wrong. I have now checked properly, and what I'm generating really is UTF-8.
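
One way a verification like this can mislead: doubly encoded text is itself well-formed UTF-8, so a validity check alone passes. The following is a minimal sketch, assuming the mbstring extension; the sample string is hypothetical and the "bad publish" is only simulated.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
$original = "caf\u{00E9}";   // "café" as correct UTF-8 (5 bytes)

// Simulate a bad publish: UTF-8 bytes misread as ISO-8859-1 and encoded again.
$doubled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Both strings are well-formed UTF-8, so this check cannot tell them apart.
var_dump(mb_check_encoding($original, 'UTF-8'));
var_dump(mb_check_encoding($doubled, 'UTF-8'));

// The byte lengths differ (5 versus 7), and decoding once undoes the damage,
// which is why a single UTF-8 -> ISO-8859-1 conversion appeared to "fix" things.
var_dump(strlen($original), strlen($doubled));
$repaired = mb_convert_encoding($doubled, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);
```

Comparing raw bytes (for example with bin2hex()) against a known string is a more reliable check than validity alone.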

I now find that PHP "does the right thing" with include(). Like a server-side include, it assumes that the included content has the character set specified by Apache's AddCharset directive: http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addcharset

This completely makes sense now.

On that basis, it makes sense in my application either to use AddCharset to map a special extension to UTF-8 text or to echo(file_get_contents($filename)), which is what I've wound up doing.

I expect that this is more efficient than using include() anyhow.
<?php
$review_path = 'frag/movie/review/'.$id_path.'.txt';
if (is_file($review_path)) {

	# Naive include would only work if the .txt was ISO-8859-1
	# or if .txt was in an AddCharset for UTF-8 in Apache's
	# directives. The include is high level, going via Apache
	# and the Content-Type reported by Apache is respected and
	# it is converted from that Content-Type.
	#include($review_path);

	# Bad publishes from MySQL "doubly UTF-8'ed" the data and the following
	# bodge designed to convert UTF-8 to ISO-8859-1 was needed to
	# get UTF-8.
	#echo(utf8_decode(file_get_contents($review_path)));

	# This is the right way to put raw UTF-8 data into the output
	# buffer. Our .txt file goes through no conversion.
	echo(file_get_contents($review_path));
} else {
	echo("<p>No movie review available</p>");
}
?>
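
The "no conversion" point can be checked with a small round trip. This is a sketch under stated assumptions: it writes a throwaway file in the system temp directory as a stand-in for a review fragment, and the sample text is invented.

```php
<?php
// Sketch only: the temp file is a hypothetical stand-in for a review .txt.
$utf8 = "caf\u{00E9} \u{20AC}5\n";           // sample UTF-8 bytes, no BOM
$path = tempnam(sys_get_temp_dir(), 'rev');
file_put_contents($path, $utf8);

// file_get_contents() returns the file's bytes as a PHP string;
// no character-set conversion is applied on the way in.
$read = file_get_contents($path);
unlink($path);

var_dump($read === $utf8);   // byte-for-byte comparison
var_dump(bin2hex($read));    // inspect the raw bytes
```

If the string is not needed for anything else, readfile($path) writes the same bytes straight to the output without building the string first.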

 
DamienRoche Commented:
It is far more efficient to store these reviews in a database and edit them via an admin panel than to pull them in from files as your current method does, which is also extremely bad practice from a security standpoint.
 
rstaveley (Author) Commented:
Thanks, but movie reviews do not really have security concerns. A database isn't appropriate for that environment locally, though one is used to manage the reviews offline. There are half a million of these reviews, published offline as text fragments, which get pushed into production when modified. It is a quirky set-up, I know.