Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Really Simple Syndication (RSS) with a Single PHP Script

The Client Need Led Us to RSS
I recently had an investment company ask me how they might notify their constituents about their newsworthy publications.  You would probably think "Facebook" or "Twitter," but this is an interesting client.  Their constituents are mostly bankers and builders.  The typical constituent was male, 50 years old, using a desktop computer and occasionally a BlackBerry.  Not exactly the Twitter crowd.  We considered using broadcast email, but the client wanted something more automatic and less intrusive.  And since attachments may not make it to a BlackBerry, something other than email seemed to be needed.

When the investment company released a publication, they made a PDF for print and they were willing to use FTP to place a copy of the PDF on their web server.  But that was the extent of their interest in the process; they wanted automation to handle the rest.

So we arrived at the idea of an RSS feed.  It was automated, and just low-tech enough that everyone could understand it.

RSS Feeds
An RSS feed is simply an XML document that follows a small, specialized vocabulary.  It carries only a few bits of information, such as a title, description and link.  RSS is very lightweight and easy to use.  A competing standard, Atom, is quite similar.  We chose RSS because everyone at the meeting had heard of it, perhaps because it also forms the basis for podcasting.
More information on RSS is available here:
http://cyber.law.harvard.edu/rss/rss.html
And just for fun, the largest RSS icon I have ever seen is available here:
http://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Feed-icon.svg/500px-Feed-icon.svg.png
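
To make the "few bits of information" concrete, here is a minimal sketch of a feed and of how a PHP program might read it.  The feed contents and the example.com addresses are placeholders invented for illustration, and the sketch assumes the SimpleXML extension is available (it usually is).

```php
<?php
// A hypothetical minimal feed; titles and URLs are placeholders.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
<channel>
  <title>Example Publications</title>
  <description>Illustrative feed</description>
  <link>http://example.com/</link>
  <item>
    <title>Quarterly Report</title>
    <description>A new PDF is online</description>
    <link>http://example.com/pdf/report.pdf</link>
  </item>
</channel>
</rss>
XML;

// Parse the string and pull the title and link out of the first item.
$feed       = simplexml_load_string($xml);
$first      = $feed->channel->item[0];
$item_title = (string) $first->title;
$item_link  = (string) $first->link;

echo $item_title . ' => ' . $item_link . PHP_EOL;
```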

RSS Feed Readers
An RSS feed is usually consumed by an RSS reader program.  Many web browsers can also consume RSS feeds.  The typical design of an RSS reader is a small "system-tray" program with a timer in it.  Whenever the timer expires, the reader goes to its list of URLs that point to RSS feeds.  It reads the current XML file from each of those URLs and compares it to the older version of the file (if any).  The reader gives you a "heads-up" if anything is new or changed.  You subscribe to RSS feeds by telling the RSS reader what URL to follow.  Pretty simple and highly effective.
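
The compare step a reader performs might look something like the sketch below.  Real readers differ in what they compare (link, guid or pubDate); this version assumes the item link is enough.  The function names new_item_links() and tiny_feed() are made up for this illustration, and SimpleXML is assumed to be available.

```php
<?php
// Hypothetical helper: wrap a list of links in a bare-bones RSS 2.0 document.
function tiny_feed(array $links)
{
    $items = '';
    foreach ($links as $link) {
        $items .= '<item><title>PDF</title><link>' . htmlspecialchars($link) . '</link></item>';
    }
    return '<rss version="2.0"><channel><title>Demo</title>' . $items . '</channel></rss>';
}

// Return the item links found in $new_xml that were absent from $old_xml.
function new_item_links($old_xml, $new_xml)
{
    // Remember every link the older copy of the feed already contained.
    $seen = array();
    foreach (simplexml_load_string($old_xml)->channel->item as $item) {
        $seen[(string) $item->link] = true;
    }

    // Anything in the newer copy that was not seen before is news.
    $fresh = array();
    foreach (simplexml_load_string($new_xml)->channel->item as $item) {
        $link = (string) $item->link;
        if (!isset($seen[$link])) {
            $fresh[] = $link;
        }
    }
    return $fresh;
}

$yesterday = tiny_feed(array('http://example.com/a.pdf'));
$today     = tiny_feed(array('http://example.com/b.pdf', 'http://example.com/a.pdf'));
$fresh     = new_item_links($yesterday, $today);

echo implode(PHP_EOL, $fresh) . PHP_EOL;
```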

Our Client Application
The investment company did not want to be bothered with the technical details of trying to create and store an XML string on the server.  They wanted the process to be automatic.  So we built a PHP script to handle the RSS publishing.

The Design of the PHP Script
We set up a simple script to list the PDF files in its directory.  In practice there was a little more styling than you will see here, but all the moving parts are the same.  Since we needed a way to know when the directory had changed, we decided to collect the list of files and make an md5() string from the file information.  If the new md5() string did not match the old md5() string, we had detected a change in the directory.  That change would trigger the writing of the new XML file that contained our RSS feed.  We installed this script as the "index.php" script in the PDF directory.  And we set up a "cron" job that ran this index.php script every 15 minutes.
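
Reduced to its essentials, the change detection described above can be sketched as follows.  The function name listing_changed() and the scratch file are assumptions made for this illustration; the full script further down does the same work inline.

```php
<?php
// Returns TRUE when $listing differs from what was seen on the previous run.
function listing_changed($listing, $digest_file)
{
    // The digest left behind by the previous run, if there was one.
    $old = is_file($digest_file) ? file_get_contents($digest_file) : '';

    // The digest of the listing as it looks right now.
    $new = md5($listing);
    file_put_contents($digest_file, $new);

    return $old !== $new;
}

// A hypothetical scratch file stands in for the digest file.
$file   = tempnam(sys_get_temp_dir(), 'md5');
$first  = listing_changed("a.pdf 100\n", $file);               // nothing stored yet
$second = listing_changed("a.pdf 100\n", $file);               // same listing again
$third  = listing_changed("a.pdf 100\nb.pdf 200\n", $file);    // a PDF was added
unlink($file);

var_dump($first, $second, $third);
```

Each run overwrites the stored digest, so a change is reported exactly once, on the first run after the directory changes.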

That was it - nothing more was needed.  Whenever the client added a PDF to the directory the audience would get an RSS update within a few minutes.  Because cron jobs simply discard browser output, it seemed fine to merge the functions of listing the directory and writing the RSS feed into a single script.  Having only one script kept everything in one place and made our debugging very easy.

Notes on the Script
We have two hard-wired file names in the script.  These variables are set at lines 6-7.  The $md5 is the name of the file that contains a message digest string (more on that later).  The $rss is the name of the XML file that contains our RSS feed.  Since the script will create or re-create these files at run time, we did not need to do any initialization or special preparation of other files and scripts.

Our first important PHP task is to get a list of the PDF files that are in the directory.  We do that in lines 10-37.  Each file is represented by an associative array containing the name, the size and the time the file was last modified.  We collect these arrays into an array of arrays on line 34.  It is possible, in theory, that there are no PDF files in the directory, and we test for this condition on line 40.  More probably we have found some PDF files and we want to display them.
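
For readers who prefer a self-contained function, the same collection step can be sketched with scandir() and pathinfo() instead of opendir() and readdir().  This is an alternative illustration, not the code used in the script below; list_pdfs() and the temporary directory are invented for the demonstration.

```php
<?php
// Collect name, size and modification time for every *.pdf file in $dir.
function list_pdfs($dir)
{
    $found = array();
    foreach (scandir($dir) as $file_name) {
        $path = $dir . DIRECTORY_SEPARATOR . $file_name;

        // Skip sub-directories and anything whose extension is not PDF.
        if (!is_file($path)) continue;
        if (strtoupper(pathinfo($file_name, PATHINFO_EXTENSION)) !== 'PDF') continue;

        $found[] = array(
            'name' => $file_name,
            'size' => filesize($path),
            'time' => date('c', filemtime($path)),
        );
    }
    return $found;
}

// Demonstration in a throw-away directory (hypothetical file names).
$tmp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pdfdemo_' . uniqid();
mkdir($tmp);
file_put_contents($tmp . DIRECTORY_SEPARATOR . 'report.PDF', 'abc');
file_put_contents($tmp . DIRECTORY_SEPARATOR . 'notes.txt', 'x');

$pdfs = list_pdfs($tmp);

unlink($tmp . DIRECTORY_SEPARATOR . 'report.PDF');
unlink($tmp . DIRECTORY_SEPARATOR . 'notes.txt');
rmdir($tmp);

print_r($pdfs);
```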

It makes sense to list the most recent PDF publications first, and to do that we need to sort the array of arrays.  PHP has a function usort() that facilitates this.  It works through the main array and presents pairs of data elements to a comparison function named in the second argument to usort().  In our case these data pairs are our sub-arrays.  Our timesort() function (lines 102-107) is that comparison function: given two sub-arrays, it tells usort() which one comes first, so that the result is in descending sequence by a named key.  More documentation on usort() can be found in the php.net man pages.
http://php.net/manual/en/function.usort.php
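
A small stand-alone sketch of usort() with a comparison callback may help.  The file names and timestamps are invented, and newest_first() is a hypothetical name.  Comparing the ISO 8601 strings as text only gives the right order when every timestamp carries the same UTC offset, which is assumed here.

```php
<?php
// Comparison callback: negative when $a should come before $b.
// Swapping the arguments to strcmp() yields descending order.
function newest_first(array $a, array $b)
{
    return strcmp($b['time'], $a['time']);
}

$files = array(
    array('name' => 'march.pdf',   'time' => '2011-03-01T09:00:00+00:00'),
    array('name' => 'may.pdf',     'time' => '2011-05-15T09:00:00+00:00'),
    array('name' => 'january.pdf', 'time' => '2011-01-10T09:00:00+00:00'),
);

usort($files, 'newest_first');

// Keep only the names so the resulting order is easy to read.
$order = array_column($files, 'name');
echo implode(', ', $order) . PHP_EOL;
```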

Once our list of PDF files is extracted and sorted, we want to create a web page with links to each of the PDF files.  We need to find the path to whatever directory we are running in.  A generalized solution can be written in two statements (lines 46-47) where we create the $urp variable.  This gives us the URL path of our directory, which we prepend to each PDF file name to form the link.
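
The two statements can be wrapped in a small function to see what they produce.  This is only a sketch: url_dir() is a hypothetical name, the sample paths are invented, and it assumes the path is a URL path (as found in PHP_SELF) that uses forward slashes.

```php
<?php
// Strip the script name from a URL path, keeping the trailing slash.
function url_dir($php_self)
{
    $poz = strrpos($php_self, '/');         // position of the last slash
    return substr($php_self, 0, $poz) . '/';
}

$nested = url_dir('/news/pdf/index.php');
$root   = url_dir('/index.php');

echo $nested . PHP_EOL . $root . PHP_EOL;
```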

Our web page will contain an HTML string describing the collection of PDF files.  Each line in the list of PDF files contains three important pieces of information: the time the PDF file was last modified, the name of (and link to) the PDF file, and the size of the PDF file.  A function call on line 62 invokes the showfilesize() function (lines 110-126) to format the file size in a tidy abbreviation.  This HTML string serves two purposes.  It is useful client browser output, and it becomes the basis for determining if the collection of PDFs in the directory has changed.  We store this HTML string in the $out variable.
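
File names sometimes contain spaces or ampersands, so it can be worth encoding the name for the URL and escaping it for the HTML.  The sketch below shows one way to build a single line of the list; pdf_row() and the sample data are invented for illustration, and the size is shown in plain bytes to keep the example short.

```php
<?php
// Build one line of the listing: time, link to the file, size.
function pdf_row($url_path, array $pdf)
{
    return htmlspecialchars($pdf['time'])
        . ' <a href="' . htmlspecialchars($url_path . rawurlencode($pdf['name'])) . '">'
        . htmlspecialchars($pdf['name'])
        . '</a> '
        . number_format($pdf['size']) . ' bytes<br/>'
        . PHP_EOL;
}

$row = pdf_row('/news/', array(
    'name' => 'Q1 & Q2.pdf',
    'size' => 2048,
    'time' => '2011-05-15T09:00:00+00:00',
));

echo $row;
```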

Our next step is to make a message digest of this HTML string with md5() and compare it to the existing message digest, if any.  The md5() function suits our needs because any change in the input string, however small, will for all practical purposes result in a different md5() code.  Collisions are possible in principle, but the risk is near zero in this application.
http://php.net/manual/en/function.md5.php
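
The sketch below illustrates both points with invented listing strings: a one-character change produces a different digest, and digests are best compared with the strict operators.  The second half uses two widely published inputs whose digests consist of "0e" followed only by digits; PHP's loose == treats such strings as numbers and reports them equal.

```php
<?php
// A one-character change in the input gives a different digest.
$before  = md5("report.pdf 1024\n");
$after   = md5("report.pdf 1025\n");
$differs = ($before !== $after);

// Two well-known inputs whose digests look like numbers in exponent notation.
$x = md5('240610708');
$y = md5('QNKCDZO');
$loose_says_equal  = ($x == $y);    // compared as numbers: 0 equals 0
$strict_says_equal = ($x === $y);   // compared as strings

var_dump($differs, $loose_says_equal, $strict_says_equal);
```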

Our code will detect any difference in the md5() strings and will write a new RSS feed if there is a difference.  We read the old string (lines 69-70), create and write the new string (lines 73-74), and compare the strings.  If there is any difference we call the create_rss_feed() function on line 79.  The create_rss_feed() function is defined at line 129.  It gathers a few local variables and uses heredoc syntax to substitute the values into the RSS string.  It writes the XML file that is our RSS feed.
http://php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
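
As a small illustration of heredoc substitution, the sketch below builds a single RSS item.  The function rss_item() and its arguments are hypothetical.  Because the values land inside XML, the sketch escapes them first; an unescaped ampersand in a title or link would make the feed invalid.

```php
<?php
// Build one <item> element with heredoc variable substitution.
function rss_item($title, $link, $pubdate)
{
    // Escape the characters that have special meaning in XML.
    $title = htmlspecialchars($title);
    $link  = htmlspecialchars($link);

    $item = <<<ITEM
<item>
  <title>$title</title>
  <link>$link</link>
  <pubDate>$pubdate</pubDate>
</item>
ITEM;

    return $item;
}

$xml = rss_item('Q1 & Q2 Report', 'http://example.com/news/?id=1&lang=en', date('r'));

echo $xml . PHP_EOL;
```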

Beginning on line 82 (whether or not we wrote a new RSS feed) we create the rest of the HTML for browser display.  We create a page header (lines 84-87) and append the $out HTML string that contains our directory list (line 88).  Finally we add a link to the RSS XML file that invites page visitors to subscribe via RSS (lines 90-94).  We echo the browser output and our web page is complete.

 
<?php // RAY_EE_RSS_news_index.php
error_reporting(E_ALL);
date_default_timezone_set('America/Chicago');

// THE NAME OF THE MESSAGE DIGEST FILE AND THE RSS FEED
$md5 = 'RAY_EE_RSS_monitor.txt';
$rss = 'RAY_EE_RSS_rssfeed.xml';

// THE PATH TO OUR CURRENT WORKING DIRECTORY
$dir = getcwd();

// APPEND A SLASH IF NEEDED
if ($dir[strlen($dir)-1] != DIRECTORY_SEPARATOR) $dir .= DIRECTORY_SEPARATOR;

// AGGREGATE THE DIRECTORY INFORMATION INTO AN ARRAY OF ARRAYS
if ($dh = opendir($dir))
{
    $dir_datas = array();
    while (($file_name = readdir($dh)) !== FALSE) // A FILE NAMED "0" MUST NOT END THE LOOP
    {
        // WE ONLY WANT TO CONSIDER FILES NAMED LIKE *.PDF
        $ext = explode('.', $file_name);
        $ext = end($ext);
        $ext = strtoupper(trim($ext));
        if ($ext == 'PDF')
        {
            $my_name = $dir . $file_name;
            $my_data = array
            ( 'name' => $file_name
            , 'size' => filesize($my_name)
            , 'time' => date('c', filemtime($my_name))
            )
            ;
            $dir_datas[] = $my_data;
        }
    }
}

// IF NO PDFS
if (empty($dir_datas)) die();

// CALL THE FUNCTION TO SORT THE ARRAY BY THE filemtime()
usort($dir_datas, 'timesort');

// GET THE URL PATH
$poz = strrpos($_SERVER["PHP_SELF"], '/'); // URLS USE A FORWARD SLASH ON EVERY PLATFORM
$urp = substr($_SERVER["PHP_SELF"], 0, $poz) . '/';

// CREATE LINKS TO EACH OF THE FILES
$out = '';
foreach ($dir_datas as $pdf)
{
    $out .= $pdf["time"];
    $out .= ' ';
    $out .= '<a title="New Window for the PDF document" target="pdf" href="'
    . $urp
    . rawurlencode($pdf["name"])     // ENCODE SPACES AND THE LIKE FOR THE URL
    . '">'
    . htmlspecialchars($pdf["name"]) // ESCAPE THE NAME FOR DISPLAY IN HTML
    . '</a>'
    . ' '
    . showfilesize($pdf["size"])
    . '<br/>'
    . PHP_EOL
    ;
}

// GET THE CURRENT MESSAGE DIGEST
$md5 = $dir . $md5;
$old = @file_get_contents($md5);

// MAKE THE NEW MESSAGE DIGEST AND WRITE IT
$new = md5($out);
file_put_contents($md5, $new);

// IF THERE IS A CHANGE, CREATE A NEW RSS FEED
if ($old !== $new) // STRICT COMPARISON: DIGESTS ARE STRINGS, NOT NUMBERS
{
    create_rss_feed($rss);
}

// CREATE AND DISPLAY THE VIEW OF THE DIRECTORY
$htm
= '<h1>'
. "PUBLICATIONS: "
. date('F j, Y')
. '</h1>'
. $out
. '<br/>'
. 'SUBSCRIBE VIA '
. '<a title="New Window for the RSS Feed" target="rss" href="'
. $rss
. '">RSS</a>'
. PHP_EOL
;
echo $htm;
die();




// A USER SORT FUNCTION TO ORDER BY DATETIME DESCENDING (NEWEST ON TOP)
function timesort($a, $b, $key='time')
{
    if     ($a[$key] == $b[$key]) return 0;
    return ($a[$key] >  $b[$key]) ? -1 : 1;
}


// FUNCTION TO PRODUCE AN EASY-TO-READ DESCRIPTION OF THE SIZE OF A FILE
function showFileSize ($xb)
{
    $pb = 1024*1024*1024*1024*1024;
    $tb = 1024*1024*1024*1024;
    $gb = 1024*1024*1024;
    $mb = 1024*1024;
    $kb = 1024;
    if     ($xb >= $pb) { $text = number_format(($xb / $pb),3) . " PB"; }
    elseif ($xb >= $tb) { $text = number_format(($xb / $tb),2) . " TB"; }
    elseif ($xb >= $gb) { $text = number_format(($xb / $gb),1) . " GB"; }
    elseif ($xb >= $mb) { $text = number_format(($xb / $mb),1) . " MB"; }
    elseif ($xb >= $kb) { $text = number_format(($xb / $kb),0) . " KB"; }
    elseif ($xb >= 0)   { $text = number_format( $xb       ,0) . " bytes"; }
    else                { $text = "0 bytes"; }
    return $text;
}


// FUNCTION TO CREATE AND WRITE THE RSS FEED
function create_rss_feed($filename)
{
    // SET SOME VARIABLES FOR THE FEED (ESCAPED FOR USE INSIDE XML)
    $pubdate = date('r');
    $link    = htmlspecialchars($_SERVER["HTTP_HOST"] . $_SERVER["REQUEST_URI"]);
    $host    = htmlspecialchars($_SERVER["HTTP_HOST"]);
    $title   = 'New PDF online at ' . $host;

// THE RSS DECLARED IN HEREDOC SYNTAX
$rss = <<<RSS
<?xml version="1.0" ?>
<rss version="2.0">
<channel>
  <title>$title</title>
  <description>RSS Feed from $host</description>
  <link>http://$host</link>
  <pubDate>$pubdate</pubDate>
  <item>
    <title>$title</title>
    <description>$title</description>
    <link>http://$link</link>
    <pubDate>$pubdate</pubDate>
  </item>
</channel>
</rss>
RSS;

    // WRITE THE RSS FEED INTO THE DIRECTORY
    file_put_contents($filename, $rss);
}


Summary
This article and the code snippet demonstrate a simple way to meet a client's requirement for broadcasting important information to their constituents.  The solution is completely self-contained in a single PHP script and would work correctly even if the cron job failed to trigger the script in a timely manner.  Any visitor who came to this web page would trigger the RSS update (if needed) and thus all subscribers would be made aware of the news at essentially the same time.
