Using PHP to Convert HTML to UTF-8 for PDF output

samiam80304
I am working with mPDF, a PHP class that generates PDF files from UTF-8 encoded HTML.  It works great, but everything going into it MUST be UTF-8, which I think is what's tripping me up!

So here is some quick test code that works:

<?php
define('_MPDF_PATH','mpdf/');
include("mpdf/mpdf.php");

//this is 6x9 book format
$mpdf=new mPDF('en-US','Royal','11','dejavuserifcondensed',32,25,27,25,16,13 );

include "poemfile1.php";
$mpdf->WriteHTML($html);
$mpdf->Output('mpdfA.pdf','I');
exit;
?>

poemfile1.php looks like this:
<?php
$html='
<h1>Title</h1>
<pre>
lots of text</pre>';
?>

Using TextPad, I had to save poemfile1.php in UTF-8 format. Then when I run the script, my PDF comes out just fine.

Now, what I really want is to write to a file called 'selections.txt', which will contain one or more poems selected by the user from a MySQL database.  At this point I haven't brought the DB into the picture yet.  Right now, I just want to create the selections.txt file, write one or more poems to it, and then create my PDF.  If I can get *that* to work, I'll move on to selecting from the DB!

So after googling and reading tons of stuff about PHP and UTF-8, I am now totally boggled.  Here is the test code I am trying in the attempt to CREATE a file and save it as UTF-8 format:

1      <?php
2      //iconv_set_encoding("output_encoding", "UTF-8");
3      //iconv_set_encoding("input_encoding", "UTF-8");
4      //var_dump(iconv_get_encoding('all'));
5      //==============================================================
6      define('_MPDF_PATH','mpdf/');
7      include("mpdf/mpdf.php");
8      $selections_file="selections.txt";
9      $input_file="cartog3.htm";
10      //$file_array= array("cartog3.htm","prowl3.htm");
11
12      //this is 6x9 book format
13      $mpdf=new mPDF('en-US','Royal','11','dejavuserifcondensed',32,25,27,25,16,13 );
14
15      $input=file_get_contents($input_file);
16      //echo $input;
17      $fh=fopen($selections_file,"w") or die("can't open file");
18
19      //foreach ($file_array as $poetry_file) {
20            //fwrite($fh,utf8_encode($input));
21            fwrite($fh,iconv("ISO-8859-1", "UTF-8", $input));
23            fclose($fh);
24            //file_put_contents($selections_file, $input, FILE_TEXT|FILE_APPEND);
25      //}
26
27      $fh=fopen($selections_file,"r") or die("can't open file");
28      $html=fread($fh, filesize($selections_file));
29      fclose($fh);
30      echo $html;
31
32      $mpdf->WriteHTML($html,2);
33      $mpdf->Output('mpdfA.pdf','I');
34      exit;
35      ?>

You can see from various commented lines different things I've been trying.  I have saved the file cartog3.htm as a UTF-8 file and it now contains ONLY html and text (unlike the version that works that includes a PHP file with PHP information in it).  However, when I echo $html (on line 30), my text is preceded by the characters:  

And the PDF of course is just a bunch of junk characters.

Can anybody help?  I get the feeling that I am somehow making it harder than it needs to be?

Thanks!

Hi,

is it possible that you utf8_encode a file that is already utf8 encoded?

Try putting the following before the fwrite:

    if (preg_match('/./u', $input)) $input = utf8_decode($input);

Regards,

 sd
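A minimal sketch of the "don't encode twice" idea, in case it helps. It assumes the stray leading characters are a UTF-8 byte order mark written by the editor (a guess, since the thread does not confirm it) and that any input that is not valid UTF-8 is ISO-8859-1. The helper name toCleanUtf8 is made up for illustration and is not part of mPDF. Unlike utf8_decode, it leaves the text in UTF-8, which is what mPDF expects.

```php
<?php
// Hypothetical helper (not part of mPDF): return $text as UTF-8 without a BOM.
// Assumption: input that is not well-formed UTF-8 is ISO-8859-1.
function toCleanUtf8(string $text): string
{
    // Strip a leading UTF-8 byte order mark (bytes EF BB BF) if present.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = (string) substr($text, 3);
    }

    // preg_match with the /u flag returns 1 only for well-formed UTF-8;
    // it returns false when the subject contains invalid byte sequences.
    $isUtf8 = preg_match('//u', $text) === 1;

    // Convert only when the bytes are not already UTF-8, so text that was
    // saved as UTF-8 does not get encoded a second time.
    if (!$isUtf8) {
        $text = iconv('ISO-8859-1', 'UTF-8', $text);
    }

    return $text;
}
```

In the test script this would replace the unconditional iconv call, e.g. fwrite($fh, toCleanUtf8($input)). If mbstring is installed, mb_check_encoding($text, 'UTF-8') is an alternative to the preg_match test.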

Author

Commented:
Hmmm, well, sd's suggestion gave an interesting result.  Now when I echo $html (line 30), the characters that appeared before the HTML output have become a single question mark, like this: "?".  When I run the entire script, I get a message from Adobe Reader:  File does not begin with '%PDF-'.

Ideas?  And thanks!

Author

Commented:
Hello again,
Well, I have only half-way solved my problem.
It seems that mPDF is based in part on tcpdf, so looking at that doc (the mPDF doc is not so good), I saw different output options. For line 33, I did this:
$mpdf->Output('mpdfA.pdf','F');
where F saves to a local file.  When I do that, a PDF file is created and I can open it manually with no problem.  Why I can't open this file inline in the browser is still a mystery.  Several pages here on EE have reported "File does not begin with '%PDF-'".  It seems there is some kind of security-related issue with IE, but that doesn't explain why the same problem appears in FF. (I have IE 7.0.5730.11 and FF 3.04 running on Windows XP Pro SP3.)   Thanks for any additional insight.
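
One possible cause worth ruling out (an assumption, not something confirmed in this thread): the debugging echo on line 30 sends HTML to the browser before the PDF bytes, so the response no longer begins with '%PDF-'. Saving with 'F' would be unaffected because the file is written separately from the page output. The sketch below only simulates this with a stand-in string instead of real mPDF output, and startsWithPdfHeader is a made-up helper name.

```php
<?php
// Hypothetical helper: does this byte string begin with the PDF signature?
function startsWithPdfHeader(string $bytes): bool
{
    return strncmp($bytes, '%PDF-', 5) === 0;
}

// Stand-in for the bytes a PDF library would produce (not real mPDF output).
$pdfBytes = "%PDF-1.4\n%%EOF\n";

// Simulate a script that echoes debug HTML and then sends the PDF inline.
ob_start();
echo "<h1>Title</h1>";   // plays the role of the debugging echo on line 30
echo $pdfBytes;          // plays the role of the inline PDF output
$response = ob_get_clean();

// The PDF on its own is fine; the combined response is what the viewer rejects.
$pdfAloneOk = startsWithPdfHeader($pdfBytes);
$responseOk = startsWithPdfHeader($response);
```

If this is the cause, removing the echo (and any whitespace or BOM before the opening PHP tag) before calling Output with 'I' should be enough. mPDF also documents an 'S' destination that returns the document as a string, which would make a check like this easy to run on the real output.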

Author

Commented:
Even though my problem was not solved directly (I still can't display PDF to the browser inline), this suggestion helped me move forward.  I did discover with more research that the suggestion is potentially flawed - see http://www.phpwact.org/php/i18n/charsets, under Checking UTF-8 for Well Formedness.  Nevertheless, I learned what I did based on the Expert's suggestion, so many thanks!
