WarriorPoet42

asked on

Cron Jobs To Create Pseudo-Static Pages?

I have a PHP page that is query-intensive. I would like to create a cron job that runs every 5 minutes or so, takes the PHP file as input, and outputs a static HTML file to decrease server load. I have NO idea how to do this.
ASKER CERTIFIED SOLUTION
dr_dedo

AndyAelbrecht

To add to dr_dedo's excellent answer:

If your provider did not tell you the location of the PHP binary (/usr/local/bin/php), you could find yourself with a little problem. This is easily fixed by changing the crontab line to:

*/5 * * * * wget -q -O /dev/null http://my.website.com/crons/convert.php

Basically, this requests the script over HTTP (so you now put it in a web-accessible directory). Requesting a PHP script through the web server is the same as executing it, so you do not need execute access to the CLI PHP binary (which, as far as I know, many webhosts don't install). The script itself would be exactly the same as the one dr_dedo provided. The */5 in the first field runs the job every 5 minutes (a plain 5 would run it only once an hour, at 5 minutes past), and -q -O /dev/null stops wget from saving a new copy of the output on every run.

cheers,
Andy

ASKER

Would one of you mind giving me a line-by-line explanation of what the script does? Code is great, but I like to understand what I'm executing. :D
Something to watch out for: if your script takes longer than 5 minutes to run, you could end up with multiple copies of the script running at once.

Something worth adding to your script is a semaphore (a lock) so that it does NOT run if a copy is already running.

Personally, I would use a simple mkdir() lock.

mkdir() is atomic: it either succeeds or fails, nothing in between, so only one copy of the script can create the lock directory.

so

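<?php
// The lock directory is named after this script, e.g. LOCK_convert.
// __DIR__ keeps it next to the script whatever the current directory is.
$s_lock = __DIR__ . '/LOCK_' . basename($_SERVER['PHP_SELF'], '.php');

// @ hides the warning mkdir() raises when the lock already exists.
$b_semaphore_created = @mkdir($s_lock);
if (true === $b_semaphore_created)
 {
 // You can run your script as the lock was created.
 // ... your page-generation code goes here ...
 // Remove the lock.
 rmdir($s_lock);
 }
?>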
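One weakness of the mkdir() lock: if the script dies (fatal error, timeout, killed process) before rmdir() runs, the lock directory stays behind and every later run skips. A minimal sketch of one way to cover that, assuming a lock older than 600 seconds can be treated as abandoned. That threshold, the helper names acquire_lock()/release_lock(), and keeping the lock in sys_get_temp_dir() are all my assumptions, not part of the code above:

```php
<?php
// Hypothetical helpers around the mkdir() lock.
// Assumption: a lock older than $i_max_age seconds belongs to a dead run.
function acquire_lock(string $s_lock, int $i_max_age = 600): bool
{
    // mkdir() is atomic: only one process can create the directory.
    if (@mkdir($s_lock)) {
        return true;
    }
    // The lock already exists. If it is too old, assume the process that
    // created it died before removing it; clear it and try once more.
    clearstatcache(true, $s_lock);
    $i_mtime = @filemtime($s_lock);
    if ($i_mtime !== false && $i_mtime < time() - $i_max_age) {
        @rmdir($s_lock);
        return @mkdir($s_lock);
    }
    return false;
}

function release_lock(string $s_lock): void
{
    @rmdir($s_lock);
}

// Assumption: the lock lives in the system temp directory.
$s_lock = sys_get_temp_dir() . '/LOCK_' . basename($_SERVER['PHP_SELF'] ?? __FILE__, '.php');
if (acquire_lock($s_lock)) {
    try {
        // ... generate the static page here ...
    } finally {
        // Runs even if the generation code throws an exception.
        release_lock($s_lock);
    }
}
```

The finally block does not run on a fatal error or a killed process, which is exactly the case the age check covers. The stale check has a small race (two processes could both decide the lock is stale at the same moment), which is usually acceptable when runs are minutes apart.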

RQ: If the script takes longer than 5 minutes to run, I am in a world of hurt. It only takes a second to run . . . but multiply that by hundreds of users, and you can see why I would rather make it static.

AA: I have a question about this line: $htmldata = fread($dynpage, 1024*1024);

I assume that as it reads the page into the variable, it passes it through the PHP processor and puts pure HTML into $htmldata. What does the 1024*1024 argument do?

Thanks
1024*1024 is the maximum number of bytes to read (1 MB); no need to change that.
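A caveat on fread() that matters here: on a network stream (which is what fopen() returns for an http:// URL), a single fread() call may return only one chunk of data (often 8192 bytes) rather than the full 1 MB, so a larger page can be silently truncated. A minimal sketch of the converter that avoids this, assuming the page is fetched over HTTP. Fetching it that way is what makes the web server run the PHP and hand back plain HTML; opening the .php file by its disk path would give you the PHP source instead. The function name build_static_page(), the URL and the output path are placeholders:

```php
<?php
// Hypothetical converter: fetch the rendered page, then swap in the new
// static copy atomically so visitors never see a half-written file.
function build_static_page(string $s_source, string $s_target): bool
{
    // file_get_contents() keeps reading until end of stream, unlike a
    // single fread(), which can stop after one network chunk.
    $s_html = @file_get_contents($s_source);
    if ($s_html === false || $s_html === '') {
        return false; // keep the previous static copy instead of an empty page
    }

    // Write to a temp file in the same directory, then rename() it over
    // the target; on the same filesystem the swap is atomic on Unix-like systems.
    $s_tmp = tempnam(dirname($s_target), 'static_');
    if ($s_tmp === false) {
        return false;
    }
    if (file_put_contents($s_tmp, $s_html) === false) {
        unlink($s_tmp);
        return false;
    }
    chmod($s_tmp, 0644); // tempnam() creates files readable only by their owner
    return rename($s_tmp, $s_target);
}

// Usage from cron (placeholder URL and path):
// build_static_page('http://my.website.com/index.php', '/path/to/htdocs/index.html');
```

Returning false on an empty or failed fetch means a temporary outage leaves the last good static page in place rather than publishing a blank one.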

You know, you don't have to run that script every 5 minutes. Follow me here:
you've got that index page that gets a lot of hits, and it has some dynamic content that comes from a DB, right?
If writes to the DB are not that frequent (say, articles), you can run the script only after some data is inserted into the DB.
Say that in some admin page you add an article that will show in index.php; then, right after the INSERT query, you can call the converting script!

So if you update your database 10 times per day, the page will be regenerated 10 times per day and will always have the new data. Meanwhile, you'll reduce server load greatly!
Hope you get what I mean.
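A minimal sketch of that regenerate-on-write idea. $fn_save stands in for your INSERT code and $fn_render for whatever echoes index.php's HTML; both, and the function name save_then_rebuild(), are hypothetical:

```php
<?php
// Hypothetical helper: rebuild the static page right after a DB write,
// so the page changes only when the data does.
function save_then_rebuild(callable $fn_save, callable $fn_render, string $s_target): void
{
    // 1. Write to the database as usual (e.g. the article INSERT).
    $fn_save();

    // 2. Render the page from the fresh data, capturing what it echoes.
    //    If rendering throws, the buffer is still cleaned up and the
    //    exception propagates, so the old static page is left alone.
    ob_start();
    try {
        $fn_render();
    } finally {
        $s_html = ob_get_clean();
    }

    // 3. Replace the static copy via temp file + rename(), so readers
    //    never see a partially written page.
    $s_tmp = $s_target . '.' . getmypid() . '.tmp';
    file_put_contents($s_tmp, $s_html);
    rename($s_tmp, $s_target);
}
```

In an admin page this might be called as save_then_rebuild(fn() => $o_db->exec($s_insert_sql), fn() => include __DIR__ . '/index_body.php', __DIR__ . '/index.html'), where $o_db, $s_insert_sql and index_body.php are again placeholders.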
Just rereading the line ...

if (file_exists($s_filename) && (filemtime($s_filename) < (time() - $i_delay)))

I'm not sure I've got it right.

filemtime has to be within the last 300 seconds.

So, filemtime should be >= time - 300.

Doh!

if (file_exists($s_filename) && (filemtime($s_filename) >= (time() - $i_delay)))

Say filemtime is 10:30 and the current time is 10:31:

10:30 >= (10:31 - 0:05) = 10:26
 so serve the cached file.

Sorry about that!
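The corrected test, pulled out into a tiny function so the 10:30/10:31 arithmetic can be checked directly. The name is_cache_fresh() is just for illustration:

```php
<?php
// Hypothetical helper: is a cached file with modification time $i_mtime
// still fresh at time $i_now, given a refresh delay of $i_delay seconds?
function is_cache_fresh(int $i_mtime, int $i_now, int $i_delay = 300): bool
{
    // Fresh means modified within the last $i_delay seconds.
    return $i_mtime >= $i_now - $i_delay;
}

// In the page: is_cache_fresh(filemtime($s_filename), time(), $i_delay)
```

With a file written at 10:30, the cutoff at 10:31 is 10:26, so the file is served; it stays fresh up to and including 10:35:00 and is rebuilt from 10:35:01 on.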
Good point, dr_dedo.

If you are not inserting the data yourself (or it is inserted by something you have no control over), is there a simple query you can run to determine whether the data has changed? For example, the most recent id in a specific table, or a date column on a table.

Say there is, and the column is called last_entry.

<?php
// If the last_entry file exists then include it.
if (file_exists(__DIR__ . '/last_entry.inc'))
 {
 require_once(__DIR__ . '/last_entry.inc');
 }

// If there is no last entry variable, define it.
if (!isset($last_entry))
 {
 $last_entry = '';
 }

// Get the last entry indicator from the DB.
// PDO is used here so the example runs on any database; $dsn, $username,
// $password and some_table are placeholders for your own settings.
// MAX() works everywhere, unlike TOP 1 (SQL Server) or LIMIT 1 (MySQL).
$s_sql = 'SELECT MAX(last_entry) AS last_entry FROM some_table';
$o_db = new PDO($dsn, $username, $password);
$a_row = $o_db->query($s_sql)->fetch(PDO::FETCH_ASSOC);

// Is the DB's last_entry the same as our $last_entry?
// Compare as strings: drivers usually return strings, so a strict !==
// against an integer would always report a change.
if ((string) $last_entry !== (string) $a_row['last_entry'])
 {
 // Run your code to generate the page and use the output buffering mechanism I described earlier.

 // Save the last entry. var_export() writes a valid PHP literal, even for dates.
 file_put_contents(__DIR__ . '/last_entry.inc', '<?php $last_entry = ' . var_export($a_row['last_entry'], true) . '; ?>');
 }
?>
Oh, and if the last entry is the same as the DB's, then output the cached file instead.
So, 4 good ways.

1 - Use cron
2 - Use code on page to supply a cached copy based upon a delay period.
3 - Use code on page to detect a "stale" condition and use a cached copy.
4 - Create a cached copy when the data is updated.

I really have no idea when the most recent change would be: this table is a list of all users and some public data about them, data which not only can be changed at any time, but often is. Sometimes there are a dozen changes by a single user in a single session.
Ok.

I would combine 2 of the above ways.

Create the static file when the data is changed, but only if the existing static file is more than 5 minutes old.

You can do all of that as part of the update.
You HAVE to service the updates, so not a lot of choice there.

You only have to generate the output file once every five minutes at WORST.

You may want to add a small message saying "This page is based on changes up to xxxx", where xxxx is the time of the last change.
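A minimal sketch of that combination, called from the code that handles a user's update. rebuild_if_stale(), the 300-second window and the footer wording are assumptions for illustration; $fn_render is a placeholder for the code that echoes the player list:

```php
<?php
// Hypothetical: call this after every update. The static file is rebuilt at
// most once per $i_delay seconds, however many updates arrive.
function rebuild_if_stale(string $s_target, callable $fn_render, int $i_delay = 300): bool
{
    clearstatcache(true, $s_target);
    if (is_file($s_target) && filemtime($s_target) >= time() - $i_delay) {
        return false; // rebuilt recently enough; skip this time
    }

    ob_start();
    try {
        $fn_render();
        // Stamp the page with the time this snapshot was taken.
        echo '<p>This page is based on changes up to ' . date('Y-m-d H:i') . '.</p>';
    } finally {
        $s_html = ob_get_clean();
    }

    // Temp file + rename() so readers never see a half-written page.
    $s_tmp = $s_target . '.' . getmypid() . '.tmp';
    file_put_contents($s_tmp, $s_html);
    return rename($s_tmp, $s_target);
}
```

One trade-off: if a burst of updates ends less than 5 minutes after a rebuild, those last changes stay unpublished until the next update arrives, so pairing this with the cron job or the on-page delay check (ways 1 or 2) gives a safety net.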
I will likely use option 2: use code on the page to supply a cached copy based on a delay period.

My code would likely look like this:
<?php
// This file: players.php
$s_filename = './static_players.txt'; // What shall we serve if we are not old enough?
$i_delay = 300; // How long between refreshes?
if (file_exists($s_filename) && (filemtime($s_filename) >= (time() - $i_delay)))
 {
 readfile($s_filename);
 }
else
 {
 ob_start();
 // The original contents of the players.php

 // Grab the output buffer as we want to display it as well as save it.
 $s_output = ob_get_clean();

 // Output the buffer.
 echo $s_output;

 // Save the output buffer for later.
 file_put_contents($s_filename, $s_output);
 }
?>

If that looks correct, I'll close the question.

The users can always see updated versions of their own data; this page is basically for competitive purposes only. So, based on the above code, I think I will go with serving a static page that is updated every five minutes, period. I'll include a note that the data is updated every five minutes. Or maybe even the time of the next update.
There needs to be some code following

// The original contents of the players.php

This is where you would issue your queries and generate the HTML as normal.

Otherwise this would work.
RQ: That last suggestion is spot on.

And yes, where I put "// The original contents of the players.php" is where, quite literally, I would put the full code of my current (non-cached) version of that file, correct?
More or less.

Remove leading <?php and trailing ?>

And that should be it.

One downside is that any errors will sit in the cached file for 5 minutes. If you have errors, I would handle them properly in code rather than letting PHP just bail out with them.
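A minimal sketch of handling errors so they never get cached, assuming the original page code is moved into its own file (players_live.php here, a hypothetical name) and pulled in with include, which also avoids having to strip its <?php and ?> tags. serve_cached() is likewise an illustrative name. If generation throws, the old copy (if any) keeps being served and is not overwritten:

```php
<?php
// Hypothetical: serve the cached page, regenerating it when older than
// $i_delay seconds, but never caching output from a failed run.
function serve_cached(string $s_filename, int $i_delay, callable $fn_generate): string
{
    clearstatcache(true, $s_filename);
    $b_exists = is_file($s_filename);
    if ($b_exists && filemtime($s_filename) >= time() - $i_delay) {
        return (string) file_get_contents($s_filename);
    }

    ob_start();
    try {
        $fn_generate();
        $s_output = ob_get_clean();
    } catch (Throwable $e) {
        ob_end_clean(); // throw away the half-built page
        error_log('players.php: ' . $e->getMessage());
        // Fall back to the stale copy rather than caching an error page.
        return $b_exists ? (string) file_get_contents($s_filename)
                         : 'The player list is temporarily unavailable.';
    }

    file_put_contents($s_filename, $s_output, LOCK_EX);
    return $s_output;
}

// players.php would then be roughly:
// echo serve_cached(__DIR__ . '/static_players.txt', 300,
//     function () { include __DIR__ . '/players_live.php'; });
```

Plain warnings and notices are not exceptions, so they would still end up in the cache; converting them with set_error_handler() (throwing an ErrorException) routes them through the same fallback. Note also that the included file runs inside the closure's scope, so any globals it relies on need a global statement.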
Oh damn! I just thought of something. The table on the page has options for sorting: currently, selecting a new sort reloads the page with the chosen sort added to the queries. Clearly, dynamic sorting would not work on a cached page . . .

Any ideas? :/
Why not? Instead of just caching the HTML, cache the data as well, in arrays. Then you can sort the arrays however the user wants the data displayed.

The PAGE is not the issue then, but the data.

Home now. Back tomorrow to explain how to do this.

Basically, 3 files:

1 - to generate the data file.
2 - the data file.
3 - the script which the user uses to sort the data as they like.
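A minimal sketch of those 3 pieces, assuming the data file is JSON (my choice; serialize() or a PHP include file would also work) and that each row is an associative array, as you would get from a fetch-all of the query results. save_data_file(), load_sorted() and the column names in the usage are illustrative:

```php
<?php
// File 1 (run by cron or after an update): write the query results to the
// data file (file 2). Temp file + rename() keeps readers from seeing a
// half-written file.
function save_data_file(string $s_file, array $a_rows): void
{
    $s_tmp = $s_file . '.' . getmypid() . '.tmp';
    file_put_contents($s_tmp, json_encode($a_rows, JSON_THROW_ON_ERROR));
    rename($s_tmp, $s_file);
}

// File 3 (the page users hit): load the cached rows and sort them the way
// the user asked, without touching the database.
function load_sorted(string $s_file, string $s_column, bool $b_desc = false): array
{
    $a_rows = json_decode((string) file_get_contents($s_file), true, 512, JSON_THROW_ON_ERROR);

    // Only sort on a column that actually exists; this doubles as a
    // whitelist when $s_column comes straight from $_GET.
    if ($a_rows === [] || !array_key_exists($s_column, $a_rows[0])) {
        return $a_rows;
    }
    usort($a_rows, function (array $a, array $b) use ($s_column, $b_desc): int {
        $i_cmp = $a[$s_column] <=> $b[$s_column];
        return $b_desc ? -$i_cmp : $i_cmp;
    });
    return $a_rows;
}
```

The sorting page might then call load_sorted(__DIR__ . '/players.json', (string) ($_GET['sort'] ?? 'name'), ($_GET['dir'] ?? '') === 'desc') and loop over the result to build the table, so every sort option is served from the same cached data.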
I'm glad you'll be back soon. In the meantime, I'll close this question (it has been clearly and completely answered) and open a new question about dynamically sorting cached data.
Thanks for the points. More comments on your new question. Interesting question. Something I've done a few times myself, but never placed in concrete.