Solved

Page crashes with custom PHP & pdftk usage via exec call

Posted on 2010-08-19
Last Modified: 2012-05-10
We have a webpage that builds PDF documents based on individual PDF pages that are managed by the company. You can see the webpage here:
http://www.hovercrafttraining.com/manual

Ignoring the part that requires login, there are several manuals available in PDF format in the middle of the page. Each of these is built from individual PDF pages and concatenated using pdftk on the server, where PHP pulls what pages need to be included and executes a custom call to the pdftk command. The output of the pdftk command is placed in the site for downloading at each page load to ensure the most updated manuals are available. Currently, we are limiting the build to a small number of individual PDF pages, but if we were to allow all of the pages to be included, the page would never finish loading.

The problem is that a command run through PHP's exec/passthru/system (or any other function that allows access to server commands) does not complete correctly. By this I mean that if the same command is run directly on the command line, e.g. "pdftk foo.pdf bob.pdf cat output manual.pdf", it succeeds and manual.pdf works fine. However, if the command is run through exec/passthru/etc. via PHP, it never finishes. The page eventually times out, and a look at the server's open processes shows the pdftk command still running, although the same command takes only a second to complete on the command line.

The problem appears to occur when many individual PDF files are passed to pdftk, as we have no problem with up to around 10 or so files. If we include more (and there are more to be included), the process does not complete. However, we get different results when testing with a different number of files (i.e., we are always able to build up to 6-10 files, sometimes 11, never more than that at once) and the final output PDF cannot be more than 8 MB in size. These are just observations from our tests in trying to work this out. On the command line, there is no limitation or issue with many individual PDF pages, the problem only occurs when PHP is brought into the picture.

I have attempted to contact the pdftk team, no response, so I hope someone here can point us in the right direction.

My question is two-fold. Why would the use of exec/system/etc via PHP be any different than passing the exact same string to the command-line? What can I look into to figure out what is happening on the PHP side of things?

Second, is there any PHP library that we can use in place of pdftk to concatenate individual PDF files into one file available for download, so that we might be able to work around this problem? Or does anyone have an alternative suggestion that will meet the requirement of building custom PDF files from individual pages managed separately?

Thanks in advance for any advice or assistance you can provide.
Question by:dageyra
 
LVL 9

Expert Comment

by:jeremycrussell
I would suspect something is not set up in the environment of the web server that is set up in the terminal session.  Use the phpinfo() function to see what the web server environment looks like and compare it to the terminal session environment.
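
One way to make that comparison concrete is a small script that is run once from the terminal (php script.php) and once through the web server, with the two outputs compared side by side. This is a minimal sketch, not something from the thread: describe_runtime() is a hypothetical helper, and the choice of fields is an assumption about what is likely to differ. It also assumes a Unix-like host where the id command exists and shell_exec() has not been disabled.

```php
<?php
// Hypothetical helper: collect the facts that most often differ between
// a terminal session and the web server's PHP process.
function describe_runtime(): array
{
    return [
        // "cli" in a terminal; something like "apache2handler" under Apache.
        'sapi'     => PHP_SAPI,
        // Effective user the child commands will run as (assumes `id` exists).
        'user'     => trim((string) shell_exec('id -un 2>/dev/null')),
        'cwd'      => getcwd(),
        // getenv() returns false when the variable is not set at all.
        'path'     => getenv('PATH'),
        'lang'     => getenv('LANG'),
        // Functions the administrator has switched off, if any.
        'disabled' => ini_get('disable_functions'),
    ];
}

print_r(describe_runtime());
```

If the two runs differ in user, PATH or locale, that difference is a reasonable first suspect, although it does not by itself prove the cause of a hang.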
 
LVL 1

Expert Comment

by:cyberpunk71
I don't know much about pdftk, but I will try to help. I have run many system commands from PHP, and they do take some finagling to get them to work properly.

Have you run the PHP page from the command line, without all the HTML markup, to see if any errors show up? Also, the "system" function can output its results back to the page.

I am thinking something is happening with the command when Apache executes it. Remember that when PHP executes a command, it runs as an unprivileged user, so if the command requires additional buffer memory it will fail, because the OS is designed to prevent unprivileged processes from running away and destroying the OS. I had this problem one time, and I used sudo to allow this user to execute that one command as a superuser.
 
LVL 1

Author Comment

by:dageyra
Hello Guys:

I did not mean that the command works from the PHP command line; it works directly from the bash command line.  So the command "pdftk 1.pdf 2.pdf cat output 3.pdf" is the command in question.  When entered directly into bash, it works fine.  When run through the exec/system/etc. functions within PHP, making it web-based, the command never finishes in the background.  However, the issue only arises when many PDFs are included: up to 10 or so is fine in PHP, but adding one more causes the command to never finish, while any number of PDFs works fine from the bash command line.  A similar issue that we already know about is that from the bash command line pdftk can create a file of any size, but when using PHP, if the final file is greater than or equal to 8 MB, the process never completes no matter how many PDFs are included (that does not apply in this case; the final size will only be about 1 MB or so).  Just wanted to clear that up.

I will try the command out via the PHP command line, as that removes Apache from the equation, and Apache could be the problem.  Also, I don't think permissions are the problem, since the command works fine for small numbers of individual PDF files.

Thanks again for your help, this is a real brain bender and very annoying problem.
 
LVL 9

Expert Comment

by:jeremycrussell
OK.  Then you'll want to check the PHP script timeout and maximum memory limit settings in your php.ini.


max_execution_time = 120     ; Maximum execution time of each script, in seconds
max_input_time = 60     ; Maximum amount of time each script may spend parsing request data
max_input_nesting_level = 128 ; Maximum input variable nesting level
memory_limit = 512M      ; Maximum amount of memory a script may consume (stock default: 128M)

These may be restricted on your server.  If the failure is consistent in size and time, that would be a good indicator.
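
To see which values are actually in effect for a given request, rather than what the php.ini on disk appears to say, the directives can be read back at run time. The sketch below is illustrative only: effective_limits() is a hypothetical helper and the list of directives is simply the ones named in this thread. Two caveats from the PHP manual are worth keeping in mind when reading the result: on non-Windows systems max_execution_time does not count time spent outside the script itself, such as in system() or exec() calls, and memory_limit governs PHP's own allocations, so it would not normally constrain a separate pdftk process.

```php
<?php
// Hypothetical helper: report the limits PHP is actually running with,
// which may differ from what the php.ini file on disk appears to say.
function effective_limits(array $names): array
{
    $values = [];
    foreach ($names as $name) {
        // ini_get() returns false for a directive PHP does not know.
        $values[$name] = ini_get($name);
    }
    return $values;
}

$limits = effective_limits([
    'max_execution_time',
    'max_input_time',
    'memory_limit',
    'post_max_size',
]);

foreach ($limits as $name => $value) {
    printf("%-20s %s\n", $name, var_export($value, true));
}
```

Running this both from the CLI and through the web server shows whether the two are reading different configuration.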
 
LVL 1

Expert Comment

by:cyberpunk71
OHHHHH, it's a setting in your php.ini file.

memory_limit = 8M

or (and this is a big if but maybe)

post_max_size = 8M
 
LVL 1

Author Comment

by:dageyra
jeremycrussell:

I don't believe the max_execution_time or input_time would be a factor since the command via the command line finishes in less than a second.  I have never heard of max_input_nesting_level, but it's set to 64.  Do you know what this setting controls?  The memory limit is actually set @1GB due to other system requirements on the machine, so I would think that should be plenty.  When adding one additional PDF page to the command, we are talking about maybe a couple hundred KB max, more realistically maybe 60 KB.
 
LVL 1

Author Comment

by:dageyra
cyberpunk:

Our memory limit has been increased as far as I'm willing to let it go, to 1024M.  What would post_max_size control?  It's set to 16M btw.
 
LVL 1

Author Comment

by:dageyra
Just so you guys can get a look, here is the phpInfo I use when looking up these settings:

http://ryan.neoterichovercraft.com/phpInfo.php
 
LVL 9

Expert Comment

by:jeremycrussell
So, when you run the PHP page and submit the command with more than 10 PDFs, does the page time out, or does it continue to run forever?

When that happens, can you see a pdftk process running?
 
LVL 1

Author Comment

by:dageyra
jeremycrussell:

The PHP page on the site builds several PDFs from individual pages using pdftk.  When more than 10 pages are added to one of the PDF builds, the page eventually times out.  When this happens and I look at the server (ps -auxf), I see the pdftk command still running under the apache process.  The same pdftk command (I had PHP print out the exact command for testing) can be inserted into the bash command line & it completes in under 1 second.  If just one of the PDF pages is removed from the build (to get it under 10), there is no timeout and the pdftk process completes normally.
 
LVL 1

Author Comment

by:dageyra
One thing I've considered is that maybe exec/system/passthru (they all fail in the same way) have a limit on how long the command can be.  The bash command line would not suffer from this, and the pdftk command can grow very large because the individual PDF pages are passed as absolute paths on the server, each of which can be 20 or so characters, so with 10 or more pages we're talking a minimum of 250 characters in the string passed to exec.  Maybe there is some kind of limitation here?
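
A quick way to check the length theory is to build the command in one place and print its size. The sketch below is an assumption-laden illustration: build_pdftk_cat_command() is a hypothetical helper, the /srv/manual paths are made up, and only the "pdftk A.pdf B.pdf cat output out.pdf" form quoted in this thread is used. For scale, the length limit that applies to a command handed to a shell on Linux is typically on the order of 128 KiB or more, far above a few hundred characters, so length alone is an unlikely culprit here.

```php
<?php
// Hypothetical helper: assemble "pdftk <inputs...> cat output <target>"
// with every path quoted, so spaces or shell metacharacters in a file
// name cannot change the meaning of the command.
function build_pdftk_cat_command(array $inputs, string $output): string
{
    $parts = ['pdftk'];
    foreach ($inputs as $path) {
        $parts[] = escapeshellarg($path);
    }
    $parts[] = 'cat';
    $parts[] = 'output';
    $parts[] = escapeshellarg($output);
    return implode(' ', $parts);
}

// Example paths are placeholders, not the site's real layout.
$cmd = build_pdftk_cat_command(
    ['/srv/manual/1.pdf', '/srv/manual/2.pdf'],
    '/tmp/manual.pdf'
);

echo $cmd, PHP_EOL;
echo strlen($cmd), ' bytes', PHP_EOL;
```

Logging the byte count next to each build makes it easy to see that a hanging build is not necessarily the longest one.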
 
LVL 9

Expert Comment

by:jeremycrussell
Yes, I think the command string is limited, but it's pretty high.
Run ps with -w (or multiple w's) to widen the output and see if your command is getting truncated.
 
LVL 1

Author Comment

by:dageyra
jeremycrussell:

That was an interesting experience with the "-w".  I was able to determine that the size of the command is not the issue (other PDFs on the page have much longer commands and are fine).  The full command string was present in the process list.  One interesting note was that with the exec call, I noticed that the process was doubled up (so there were actually 2 pdftk calls that were executing).  I thought maybe this was the problem (I'm not fast enough to see what the process list shows when the system works), but I changed exec to system, and the same hanging problem occurred but the process list only had 1 pdftk call.  I don't know why exec doubled up the call, but the net result was the same: pdftk never finished (I have to kill the process every time I test this).  Just as before, one additional PDF page causes the command to hang, whereas the original number completes without a hiccup.  I will stick with system as I don't like the idea of doubling up every output PDF, just an unrelated note.

I did copy and paste the command directly from the process list while the pdftk was hanging, and it completed fine from the bash command line.
 
LVL 9

Expert Comment

by:jeremycrussell
Hmm... The next suggestion would be to perhaps use passthru() and see if you can get output, or add some redirection of its output, both stdout and stderr (2>&1), to a file somewhere, then tail that file and see if the command, when run by the web server, gives any output, errors, etc., as if it's waiting for a response of some sort.
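
Another way to collect the same evidence from inside PHP is proc_open(), which gives separate handles for stdout and stderr and lets the script give up after a deadline instead of hanging the page. What follows is a minimal sketch under stated assumptions, not a known fix for the pdftk hang: run_with_timeout() is a hypothetical helper, it assumes a Unix-like host with /dev/null, and attaching stdin to /dev/null is only a guess at ruling out a child that is waiting for input. The "exec " prefix assumes $cmd is a single simple command.

```php
<?php
/**
 * Run a shell command with a wall-clock timeout and capture its output.
 * Hypothetical diagnostic helper; not part of pdftk or of PHP itself.
 */
function run_with_timeout(string $cmd, float $timeoutSeconds): array
{
    $spec = [
        0 => ['file', '/dev/null', 'r'], // no stdin, so the child cannot wait for input
        1 => ['pipe', 'w'],              // child's stdout
        2 => ['pipe', 'w'],              // child's stderr
    ];
    // "exec" makes the shell replace itself with the command, so that
    // proc_terminate() signals the command rather than a wrapper shell.
    $proc = proc_open('exec ' . $cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("could not start: $cmd");
    }
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $out = '';
    $err = '';
    $exitCode = -1;
    $timedOut = false;
    $deadline = microtime(true) + $timeoutSeconds;

    while (true) {
        // Keep draining both pipes so the child never blocks on a full pipe.
        $out .= stream_get_contents($pipes[1]);
        $err .= stream_get_contents($pipes[2]);

        $status = proc_get_status($proc);
        if (!$status['running']) {
            $exitCode = $status['exitcode'];
            break;
        }
        if (microtime(true) >= $deadline) {
            $timedOut = true;
            proc_terminate($proc, 9); // SIGKILL
            break;
        }
        usleep(50000); // poll every 50 ms
    }

    // Pick up anything written between the last read and the exit.
    $out .= stream_get_contents($pipes[1]);
    $err .= stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);

    return [
        'stdout'    => $out,
        'stderr'    => $err,
        'exit_code' => $exitCode, // stays -1 if the command was killed
        'timed_out' => $timedOut,
    ];
}

// Harmless demonstration; in practice $cmd would be the pdftk command string.
print_r(run_with_timeout('echo hello', 5.0));
```

A result with timed_out set and empty stdout and stderr would match the behaviour described in this thread: the process is alive but silent.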
 
LVL 1

Author Comment

by:dageyra
jeremycrussell:

I did try to use passthru, the same result occurs.  I haven't tried with any kind of redirection, but I'm wondering whether that would have an effect.  The stderr might do something perhaps.  How would I change the string command to redirect to a file, just add 2>&1 to the end of the string?  I'm not that familiar with redirection outside of pipes.

Another thing I want to try, it will just take some more time, is using command line PHP to process the same script and see if the problem occurs, as I'm curious if somehow apache is the problem.
 
LVL 9

Expert Comment

by:jeremycrussell
Just add the 2>&1 right after the redirection to a file, since the shell applies redirections from left to right.

i.e.   command >>/tmp/command.log 2>&1

To test on php command line, just grab a command from ps, like earlier and run it in the php cli inside system().
 
LVL 1

Author Comment

by:dageyra
I added the 2>&1 >> /tmp/pdftk.log, and it did create the file, but it never has any data in it.  Not sure precisely what that means, but I believe it means no system error (permission, etc) is occurring.
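
An empty log is weaker evidence than it looks with that ordering. A POSIX shell applies redirections left to right, so "2>&1 >> file" first points stderr at the original stdout (which, under exec() or system(), is the pipe PHP reads) and only then moves stdout to the file; error messages never reach the log. The sketch below demonstrates the difference with a stand-in child process instead of pdftk. The variable names and the temporary files are illustrative assumptions, and it presumes a Unix-like host with sh available.

```php
<?php
// A stand-in child that writes one line to stdout and one to stderr.
$child = "sh -c 'echo to-stdout; echo to-stderr >&2'";

$logA = tempnam(sys_get_temp_dir(), 'redir');
$logB = tempnam(sys_get_temp_dir(), 'redir');

// Order used in the thread: stderr is pointed at the ORIGINAL stdout
// (the pipe PHP reads), and only afterwards is stdout moved to the file.
$seenByPhpA = [];
exec($child . ' 2>&1 >> ' . escapeshellarg($logA), $seenByPhpA);

// Reversed order: stdout goes to the file first, then stderr follows it.
$seenByPhpB = [];
exec($child . ' >> ' . escapeshellarg($logB) . ' 2>&1', $seenByPhpB);

// Read the logs back, then remove the temporary files.
$fileA = file_get_contents($logA);
$fileB = file_get_contents($logB);
unlink($logA);
unlink($logB);

echo "2>&1 >> file : log=", var_export($fileA, true),
     " php=", var_export($seenByPhpA, true), PHP_EOL;
echo ">> file 2>&1 : log=", var_export($fileB, true),
     " php=", var_export($seenByPhpB, true), PHP_EOL;
```

With the first ordering the stderr line comes back to PHP and the log holds only stdout; with the second, both lines land in the log and PHP sees nothing. To capture pdftk's error messages in the log, the second form is the one to use.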

I did set up a test page that can be run inside the webserver or from the CLI.  When run from the CLI, the process finishes without a problem and very quickly (output file created and working).  The script only has a system call, so it can be run from the webserver.  When the exact same file is run from the webserver, it hangs.

I believe this to mean that PHP itself is fine but something is amiss with apache?
 
LVL 1

Author Comment

by:dageyra
Also, I have been able to determine that the problem is not specifically about the number of individual pages.  I'm not sure what else it can be, so some more testing is required, but I pulled the W4 form off the IRS website to add a commonly used PDF to the concatenation.  If only one of the pages is passed along with the W4 form, pdftk hangs on the webserver (but still finishes without a problem via CLI PHP).

Some other tests:

I tried concatenating the W4 over and over, it crashes if just two copies are attempted (the same file passed twice).  The W4 is 168K.

I tried concatenating one of the individual pages several times.  One of the pages that is 84K alone I was able to get to concatenate up to 8 pages, but the 9th page caused it to hang.  For the 8 pages, the final PDF size was 616K.

Another individual page that is 12K in size alone I was able to get to concatenate up to 15 pages, but the 16th page caused it to hang.  For the 15 pages, the final PDF size was 96K.

Another page that is 36K in size alone I was able to get to concatenate up to 15 pages, but the 16th page caused it to hang.  For the 15 pages, the final PDF size was 432K.

You can see each of the pages I'm using here: http://ryan.neoterichovercraft.com/pdf/
1.pdf is the 12K file, 2.pdf is the 36K file, 3.pdf is the 84K file, 4.pdf is the IRS W4 that is 168K.

Again, when running from CLI PHP, the problem never occurs.

Any thoughts?
 
LVL 1

Author Comment

by:dageyra
This is the exact same problem:
http://farbfinal.wordpress.com/2009/08/16/pdftk-php-problem-hang-without-errors/

I will try to work out their workaround, but the inclusion of perl is a bit beyond my comfort zone.
 
LVL 5

Expert Comment

by:kawzaki
Maybe the problem is with the exec() call itself.

Try modifying your exec() call to the following:


$output = array();   // receives every line the command prints to stdout
$return = 0;         // receives the command's exit status
exec($cmd, $output, $return);

When you invoke exec(), PHP will wait for the command to finish and return its output.

I faced a similar problem recently and that's how it was solved :)



 
LVL 1

Author Comment

by:dageyra
Hello kawzaki:

The problem isn't exec because there are times when the command works and times when it doesn't all based on which PDF pages are passed to the pdftk command.  Regardless of when it does or doesn't work via apache/php, the exec always works via CLI PHP & the individual pdftk command works via shell and always completes in a matter of seconds (so even with exec, the command should complete quickly and PHP should not be waiting anymore).

This blog post shows that there is a problem with pdftk from an apache/php standpoint: http://farbfinal.wordpress.com/2009/08/16/pdftk-php-problem-hang-without-errors/

What I need now is help in converting the exec (actually now I'm using system) to a URL open to a perl script that executes the pdftk command.  I need help as I'm not that great with perl, and the perl script needs to accept dynamic parameters (the number of individual PDF pages passed to pdftk for concatenation will vary) and create the output PDF for linking by the PDF script.
 
LVL 1

Accepted Solution

by:dageyra
Sid Steward with pdftk has informed me a new version has been released which may resolve this problem.  http://www.pdflabs.com/docs/install-pdftk/
 
LVL 1

Author Closing Comment

by:dageyra
A new version of pdftk was released that claims to address the problem of pdftk hanging in PHP (and also python).  Here is the most recent version history that claims to address this problem (v1.43):

http://www.pdflabs.com/docs/pdftk-version-history/
 

Expert Comment

by:abitat
Anyone know of a workaround solution without having to use the latest version of pdftk? I'm having the same problem using the fill_form option. We don't really want to have to download source code and do a build on our customer's server, so we'd prefer a workaround. The article http://farbfinal.wordpress.com/2009/08/16/pdftk-php-problem-hang-without-errors/ mentioned using a perl wrapper, but I don't know any perl. Any help would be greatly appreciated.