Link to home
Start Free TrialLog in
Avatar of nainil
nainilFlag for United States of America

asked on

RSS Feed to Mysql Database

Hi,

I am using this script from Daniel Iversen to save RSS feeds from the following URL's:
http://localhost/test/index_clean.php?feed_url=http://feeds.feedburner.com/thewhir_magazine&access_key=youraccesskey

and

http://localhost/test/index_clean.php?feed_url=http://rss.cnn.com/rss/cnn_topstories.rss&access_key=youraccesskey

What I am finding is that this script does not store all the feeds. So if there are 15 feed items it sometimes just stores 10. It varies. Am I doing something wrong?

Or is there a better script that will allow me to store RSS feeds to a database?

<?php

/*

==================================================================
==================================================================
RSS Ingest v 0.1
- by Daniel Iversen, daniel (a dot) iversen (the at sign) gmail (another dot) com
------------------------------------------------------------------

INTRO:
- - - - - - - - -
 - Just a tiny primitive PHP script that you can run on a cron to ingest
   an RSS feed into a flat database table.

 - You can run the PHP many times with different RSS URLs

 - Its so simple to use; 1 php file, 4 parameters to change and you are done

 - Its super simple code - you can easily fiddle with it

 - If you find it useful, please drop a mail ;)


INSTALLATION AND USAGE:
- - - - - - - - - - - - - - -
 - Well its free, no license or warranty for this tool - if your house burns or your cat gets sick don not blame  me ;)
 - Change the few variables in this PHP (db connection info and access key/password)
	The 4 variables you need to look for in this PHP (usually grouped) are;

	$db_hostname="     - ....your db hostname here - maybe 'localhost' is enough...
	$db_username="     - ....your db username here...
	$db_password="     - ....your db password here...
	mysql_select_db("  - ....your db name here...

$private_access_key="youraccesskey";

 - create table using SQL below
 - create table indexes if needed
 - upload PHP to your server
 - hit the PHP with the URL:
     http://<your server>/<RSSIngest path>/<PHP file>?feed_url=<your RSS URL - must stay the same as its a key>&access_key=<secret key>
	 e.g.
     http://nexle.dk/rssingest/index.php?feed_url=http://www.instapaper.com/starred/rss/580483/qU7TKdkHYNmcjNJQSMH1QODLc&access_key=ThisIsNotMyKey
 - watch for errors and things
 - if all is o.k., install the PHP URL in a cron job (e.g. the free setcronjob.com) and relax ;)
 - If you do find it useful, please shoot me a mail, would love to know
 - If you make bug fixes, please feel free to send back for everyones benefit



CREATE TABLE SQL:
- - - - - - - - - - - - - - -
	CREATE TABLE  `rssingest` (
	 `item_id` VARCHAR( 32 ) NOT NULL ,
	 `feed_url` VARCHAR( 512 ) NOT NULL ,
	 `item_content` VARCHAR( 4000 ) NULL ,
	 `item_title` VARCHAR( 255 ) NOT NULL ,
	 `item_date` TIMESTAMP NOT NULL ,
	 `item_url` VARCHAR( 512 ) NOT NULL ,
	 `item_status` CHAR( 2 ) NOT NULL ,
	 `item_category_id` INT NULL ,
	 `fetch_date` TIMESTAMP NOT NULL
	) ENGINE = MYISAM ;



NOTES:
- - - - -
 - Only tested on MySQL
 - Done in an hour so there could be lots of bugs
 - Only tested with my particular RSS feed that I needed to ingest (Instapaper)
 - No attempt to prevent SQL injection but should be o.k. since its password protected and for your own use only


==================================================================
==================================================================

*/
//require_once("dbCon/dbcon.php");

$db_hostname="localhost";
$db_username="root";
$db_password="";

$private_access_key="youraccesskey";

// Check a few bits and pieces

if(isset($_GET['feed_url']))
{
	$feed_url = $_GET['feed_url'];
}
else
{
	die("Need to pass the (consistent) 'feed url'");
}


if(isset($_GET['access_key']))
{

	if($_GET['access_key']==$private_access_key)
	{
		echo "Access key correct, proceeding...<br/><br/>";
	}
	else
	{
		die("wrong access key");
	}
}
else
{
	die("Need to pass the 'access_key' URL parameter");
}


try
{
	/*  query the database */
	// $db = getCon();

	$db = mysql_connect($db_hostname,$db_username,$db_password);
	if (!$db)
	{
		die("Could not connect: " . mysql_error());
	}
	mysql_select_db("test", $db); // Nainil

	echo "Starting to work with feed URL '" . $feed_url . "'";

	/* Parse XML from  http://www.instapaper.com/starred/rss/580483/qU7TKdkHYNmcjNJQSMH1QODLc */
	//$RSS_DOC = simpleXML_load_file('http://www.instapaper.com/starred/rss/580483/qU7TKdkHYNmcjNJQSMH1QODLc');

	libxml_use_internal_errors(true);
	$RSS_DOC = simpleXML_load_file($feed_url);
	if (!$RSS_DOC) {
		echo "Failed loading XML\n";
		foreach(libxml_get_errors() as $error) {
			echo "\t", $error->message;
		}
	}


	/* Get title, link, managing editor, and copyright from the document  */
	$rss_title = $RSS_DOC->channel->title;
	$rss_link = $RSS_DOC->channel->link;
	$rss_editor = $RSS_DOC->channel->managingEditor;
	$rss_copyright = $RSS_DOC->channel->copyright;
	$rss_date = $RSS_DOC->channel->pubDate;

	//Loop through each item in the RSS document

	foreach($RSS_DOC->channel->item as $RSSitem)
	{

		$item_id 	= md5($RSSitem->title);
		$fetch_date = date("Y-m-j G:i:s"); //NOTE: we don't use a DB SQL function so its database independant
		$item_title = $RSSitem->title;
		$item_content = $RSSitem->description;
		$item_date  = date("Y-m-j G:i:s", strtotime($RSSitem->pubDate));
		$item_url	= $RSSitem->link;

		echo "Processing item '" , $item_id , "' on " , $fetch_date 	, "<br/>";
		echo $item_title, " - ";
		echo $item_date, "<br/>";
		echo $item_url, "<br/>";

		// Does record already exist? Only insert if new item...

		$item_exists_sql = "SELECT item_id FROM rssingest where item_id = '" . $item_id . "'";
		$item_exists = mysql_query($item_exists_sql, $db);
		if(mysql_num_rows($item_exists)<1)
		{
			echo "<font color=green>Inserting new item..</font><br/>";
			$item_insert_sql = "INSERT INTO rssingest(item_id, feed_url, item_title, item_content, item_date, item_url, fetch_date) VALUES ('" . $item_id . "', '" . $feed_url . "', '" . $item_title . "', '" . $item_content . "', '" . $item_date . "', '" . $item_url . "', '" . $fetch_date . "')";
			$insert_item = mysql_query($item_insert_sql, $db);
		}
		else
		{
			echo "<font color=blue>Not inserting existing item..</font><br/>";
		}

		echo "<br/>";
	}

	// End of form //
} catch (Exception $e)
{
    echo 'Caught exception: ',  $e->getMessage(), "\n";
}
?>

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of hielo
hielo
Flag of Wallis and Futuna image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nainil

ASKER

@hielo : Thanks. That didn't help much. I still cannot get all feeds to populate in the DB. The strange part is that it does display all the feeds on the page but it does not insert all into the db.

@ray_paseur: Ray I agree that there are red flags. However it's always easy to use code that exists than develop something new in the time that one has. I'll put some diagnostic code to find what is going wrong.

However if you have some other suggestions or script that can help please do share. Thanks.
I would add error_reporting(E_ALL) and try some data visualization, like var_dump($RSS_Doc).  I think hielo is on the right track, and I would also add the pubDate field to the md5() message digest.  It seems possible to me that the same title and description could be published repeatedly with a text that said something like "Breaking news update..."  However if the publishers did this it seems unlikely that the updates would have the same timestamp as the original article.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nainil

ASKER

I guess I figured out the issue.

It was the mysql insert statement that was causing special characters to break the insert statement and hence the query was failing. Thanks for your help.