Monitor sections of a website for changes

John9983
Is there a web service or application that can alert me when a specific section of a website changes its content?  I'm familiar with Google Alerts; however, I believe it only alerts me when it finds new web pages, news articles, blogs, etc.  I have a specific web page in mind and want to be alerted when the content of a certain section changes.

Thanks much,
John

Michel Plungjan, IT Expert
Top Expert 2009
Commented:
Seems there are a lot out there
http://www.google.com/search?q=website+change+monitor
But I seem to remember MS having something too - perhaps BING
You can monitor changes in individual pages with tools like Changedetection
- http://www.changedetection.com/
(the initial interface is not very friendly, but look at some examples and then you might decide to join)

- look also at Diffbot http://www.diffbot.com/products/feedbeater which generates an RSS feed entry every time your page changes

- you might also have a look at http://www.xfruits.com/


Otherwise you might use some JavaScript on the page to display when the page was last changed... but strange things might happen with file caches
Most Valuable Expert 2011
Top Expert 2016
Commented:
How quickly do you need the alert?  If you can settle for receiving the alert within a minute or so of the change (standard cron runs a job at most once per minute), this design pattern might work for you.

Set up a database table with columns for the URL, the datetime and the digest string
Set up a CRON job that reads the stored digest string from the database, reads the content of the URL and computes its digest
When the digest string in the database does not match the digest string of the URL content, a change has occurred
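
The three steps above could be sketched roughly like this. It is only an illustration: the table name "monitor", the column names and the function names are my own assumptions, and it uses PDO with an in-memory SQLite database so it can run without a server. In a real CRON job the content would come from something like file_get_contents($url) and the database would be persistent.

```php
<?php
// A SKETCH OF THE DIGEST-COMPARE STEP - TABLE, COLUMN AND FUNCTION NAMES ARE ASSUMPTIONS

// CREATE THE TABLE DESCRIBED ABOVE: URL, DATETIME, DIGEST STRING
function create_monitor_table(PDO $db)
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS monitor
         ( url       TEXT PRIMARY KEY
         , stored_at TEXT NOT NULL
         , digest    TEXT NOT NULL
         )'
    );
}

// RETURN TRUE WHEN $content DIFFERS FROM WHAT WAS STORED FOR $url
// A URL SEEN FOR THE FIRST TIME IS STORED AND REPORTED AS UNCHANGED
function check_for_change(PDO $db, $url, $content)
{
    $new = md5($content);

    $sel = $db->prepare('SELECT digest FROM monitor WHERE url = ?');
    $sel->execute(array($url));
    $old = $sel->fetchColumn(); // FALSE WHEN THERE IS NO ROW YET

    if ($old === $new) return FALSE;

    // REPLACE INTO WORKS IN SQLITE AND MYSQL; OTHER DATABASES NEED THEIR OWN UPSERT
    $upd = $db->prepare('REPLACE INTO monitor (url, stored_at, digest) VALUES (?, ?, ?)');
    $upd->execute(array($url, date('c'), $new));

    return ($old !== FALSE);
}

// AN IN-MEMORY DATABASE FOR DEMONSTRATION ONLY
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
create_monitor_table($db);
```

Calling check_for_change() once per URL on every CRON run is enough; when it returns TRUE, that is the moment to send the alert.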

A potential issue with any monitoring strategy, this one included, is auto-generated timestamps in the page.  If every page load produces a new timestamp, no two fetches will ever yield the same digest.  That would require you to extend the design a little bit, for example by stripping the volatile parts before computing the digest.
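
One way to deal with the timestamp problem might be to remove anything that looks like a timestamp before hashing. The sketch below is only a starting point: the function name stable_digest() is made up, and the two patterns are my guesses at common timestamp formats, so they would need to be fitted to the page you actually monitor.

```php
<?php
// A SKETCH ONLY - THE PATTERNS ARE ASSUMPTIONS ABOUT WHAT A VOLATILE TIMESTAMP LOOKS LIKE
function stable_digest($html)
{
    $patterns = array
    ( '/\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?([+-]\d{2}:?\d{2}|Z)?/' // 2011-05-01 12:30:00 OR ISO 8601
    , '/\b\d{1,2}:\d{2}(:\d{2})?\s?([AP]M)?\b/i'                         // 12:30 OR 12:30:59 PM
    );

    // REMOVE THE VOLATILE PARTS, THEN DIGEST WHAT IS LEFT
    $stable = preg_replace($patterns, '', $html);
    return md5($stable);
}
```

Two fetches that differ only in a matched timestamp then produce the same digest, while a change anywhere else still shows up. Note that the second pattern would also remove things like "3:15" in ordinary text, so it trades some sensitivity for fewer false alarms.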

You can build on top of this design.  Perhaps you only want to monitor part of the foreign page -- you can run a parsing script in the CRON job to extract only the <DIV> you want to look at, etc.
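
For the parsing step, PHP's DOM extension is one option. This is a minimal sketch, assuming the section you care about is an element with a known id attribute; the function name extract_section() and the id "prices" in the usage note are hypothetical.

```php
<?php
// A SKETCH, ASSUMING THE SECTION OF INTEREST HAS A KNOWN id ATTRIBUTE
function extract_section($html, $id)
{
    $doc = new DOMDocument();

    // REAL-WORLD HTML IS RARELY VALID - COLLECT PARSE WARNINGS INSTEAD OF PRINTING THEM
    $previous = libxml_use_internal_errors(TRUE);
    $loaded   = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) return FALSE;

    // THE id COMES FROM OUR OWN CONFIGURATION AND IS ASSUMED TO CONTAIN NO QUOTE CHARACTERS
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === FALSE || $nodes->length == 0) return FALSE;

    // SERIALIZE ONLY THAT ELEMENT SO CHANGES ELSEWHERE ON THE PAGE ARE IGNORED
    return $doc->saveHTML($nodes->item(0));
}
```

In the CRON job you would compute md5(extract_section($page, 'prices')) instead of md5($page). A FALSE return means the section was not found, which is probably worth an alert of its own because the page layout may have changed.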

In PHP, you can use md5() to get the digest string and you can use mail() to send yourself a message.   You can also write an RSS feed.  
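
A rough sketch of the notification step follows. The function name, the subject line and the address in the usage note are placeholders of mine. The last parameter defaults to PHP's mail(), but accepts any callable with the same first three arguments, which makes it possible to try the function without actually sending anything.

```php
<?php
// A SKETCH - FUNCTION NAME, SUBJECT LINE AND MESSAGE LAYOUT ARE PLACEHOLDERS
// $send DEFAULTS TO mail(); PASS ANOTHER CALLABLE TO TEST WITHOUT SENDING
function notify_change($to, $url, $old_digest, $new_digest, $send = 'mail')
{
    $subject = "MONITORED PAGE CHANGED: $url";
    $message = "URL: $url" . PHP_EOL
             . "OLD DIGEST: $old_digest" . PHP_EOL
             . "NEW DIGEST: $new_digest" . PHP_EOL
             . "CHECKED AT: " . date('c') . PHP_EOL;

    // mail() RETURNS TRUE WHEN THE MESSAGE WAS ACCEPTED FOR DELIVERY
    // THAT DOES NOT GUARANTEE IT REACHES THE MAILBOX
    return call_user_func($send, $to, $subject, $message);
}
```

A call might look like notify_change('you@example.com', $url, $old_md5, md5($content)), where $old_md5 is the digest read back from the database.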

HTH, ~Ray
@John,

Which solution(s) did you finally select?
Most Valuable Expert 2011
Top Expert 2016

Commented:
@fibo: Looks like nobody in "clean-up-land" watches these zones ;-)

After reading this question, I wrote a little monitor of my own.
<?php // RAY_monitor_directory.php
error_reporting(E_ALL);
date_default_timezone_set('America/Chicago');


// A CUSTOMIZABLE MONITOR TO NOTIFY YOU OF CHANGES TO A DIRECTORY
// RUN THIS MODULE AS A CRON JOB
// IT EXAMINES THE CONTENTS OF ITS DIRECTORY AND CHECKS FOR CHANGES
// IF THE DIRECTORY HAS CHANGED, IT SENDS YOU AN EMAIL MESSAGE
// YOU MIGHT NAME THIS SCRIPT STARTING WITH A VALUE EQUAL TO WHAT YOU PUT IN $my_ignor


// CUSTOMIZE WITH YOUR EMAIL ADDRESS
$my_email = 'Ray.Paseur@GMail.com';

// FILE NAME PREFIX TO IGNORE
$my_ignor = 'monitor_';

// FILE NAME TO CHECK FOR CHANGES
$my_check = $my_ignor . 'checkfile.txt';

// WHAT DIRECTORY TO MONITOR
$cwd = getcwd();


// CAPTURE THE BROWSER OUTPUT BUFFER
ob_start();

// A SCRIPT TIMER
$alpha = microtime(TRUE);

// GET FILE INFORMATION FOR THE CURRENT WORKING DIRECTORY
$my_files = my_dir_info( $cwd );
if (!$my_files)
{
    // IF FAILURE, ADD THIS MESSAGE TO THE BROWSER OUTPUT BUFFER
    echo PHP_EOL . "UNABLE TO GET DIRECTORY LIST FOR $cwd" . PHP_EOL;

    // NO DIGEST IS AVAILABLE, SO DEFINE AN EMPTY ONE FOR THE WRITE BELOW
    $my_new_md5 = '';
}
else
{
    // EXCLUDE THIS SCRIPT AND THE CHECKSUM FILE (ANYTHING STARTING WITH $my_ignor VALUE)
    foreach ($my_files as $key => $my_file)
    {
        if (substr($my_file["name"], 0, strlen($my_ignor)) == $my_ignor) { unset($my_files["$key"]); }
    }

    // RESET THE INDEXES TO MAKE serialize() IDENTICAL
    $my_files = array_values($my_files);

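    // IF MD5 STRINGS MATCH, THERE HAVE BEEN NO CHANGES SINCE LAST CHECK
    $my_new_md5 = md5(serialize($my_files));

    // ON THE FIRST RUN THE CHECK FILE DOES NOT EXIST YET
    $my_old_md5 = file_exists($my_check) ? file_get_contents($my_check) : '';

    // STRICT COMPARISON KEEPS PHP FROM TREATING "0e..." DIGESTS AS EQUAL NUMBERS
    if ($my_new_md5 === $my_old_md5)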
    {
        // NO CHANGES
        die();
    }
}

// WRITE THE NEW MD5 FILE
$fpc = file_put_contents($my_check, $my_new_md5);

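// IDENTIFY THE DIRECTORY
// THE $_SERVER VALUES ARE ONLY SET FOR WEB REQUESTS, NOT WHEN CRON RUNS THE SCRIPT FROM THE COMMAND LINE
$url = $cwd . DIRECTORY_SEPARATOR;
if (isset($_SERVER["REQUEST_URI"], $_SERVER["HTTP_HOST"]))
{
    $uri = $_SERVER["REQUEST_URI"];
    $dir = substr($uri,0,strrpos($uri,'/'));
    $url = $_SERVER["HTTP_HOST"] . $dir . '/';
}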

// A SCRIPT TIMER
$omega = microtime(TRUE);
$timex = number_format(($omega - $alpha) * 1000.0, 1);
echo "SCRIPT TIME $timex MILLISECONDS" . PHP_EOL;

// GET ANY MESSAGES FROM THIS SCRIPT
$obc = ob_get_clean();

// SEND SUCCESS OR FAILURE MESSAGE
$msg = "SUCCESS WRITING NEW $my_check" . PHP_EOL . $obc . PHP_EOL;

if (!$fpc)
{
    $msg = "FAILURE WRITING NEW $my_check" . PHP_EOL . $obc . PHP_EOL;
}
mail($my_email, "MONITORED DIRECTORY CHANGED: $url", $msg);
die();



// A FUNCTION TO GET SALIENT DATA ABOUT THE FILES IN THIS DIRECTORY - NOT SUB-DIRECTORIES
function my_dir_info($dir)
{
    // APPEND A SLASH IF NEEDED
    if ($dir[strlen($dir)-1] != DIRECTORY_SEPARATOR) $dir .= DIRECTORY_SEPARATOR;

    // IF NOT A DIRECTORY, SIGNAL ERROR
    if (!is_dir($dir))
    {
        return FALSE;
    }

    // IF WE CANNOT READ THE DIRECTORY, SIGNAL ERROR
    if (!$dh = opendir($dir)) return FALSE;

    // AGGREGATE THE DIRECTORY INFORMATION
    $dir_datas = array();
    // COMPARE AGAINST FALSE SO A FILE NAMED '0' DOES NOT END THE LOOP EARLY
    while (FALSE !== ($file_name = readdir($dh)))
    {
        // SKIP DIRECTORY POINTERS
        if (!in_array( $file_name, array( '.', '..' ) ) )
        {
            $my_name = $dir . $file_name;
            $my_data = array
            ( 'name' => $file_name
            , 'size' => filesize($my_name)
            , 'type' => filetype($my_name)
            , 'time' => date('c', filemtime($my_name))
            )
            ;
            $dir_datas[] = $my_data;
        }
    }

    // RELEASE THE DIRECTORY HANDLE
    closedir($dh);
    return $dir_datas;
}


This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
