Parsing Typepad RSS feed using PHP

Hi,

I'm trying to pull our XML feed from Typepad and loop through it so that I can display it on our own website.

Here's the snippet of code that I'm using to simply grab the xml feed:

<?php

$xml_feed_url = 'http://blog.strategos.com/innovaro_leading_innovati/rss.xml';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);

var_dump ($xml);

?>

When I dump it, I can see that all of our posts have been pulled.  What steps do I need to take next to loop through the $xml data so that I can display the "title", "full description", "author", "pubdate", etc.?

Thanks for any help!

Kenny
Asked by: kennyu
 
Ray Paseur Commented:
Please see http://www.laprbass.com/RAY_temp_kennyu.php

<?php // RAY_temp_kennyu.php
error_reporting(E_ALL);
echo '<pre>';

// READ THE XML AND CREATE AN OBJECT
$url = 'http://blog.strategos.com/innovaro_leading_innovati/rss.xml';
$xml = file_get_contents($url);
$obj = SimpleXML_Load_String($xml);

// ACTIVATE THIS LINE TO SEE THE PROPERTIES OF THE OBJECT
// var_dump($obj);

// EXTRACT SOME DATA FROM THE OBJECT
foreach ($obj->channel as $c)
{
    echo PHP_EOL . $c->title;
    echo PHP_EOL . $c->description;
    echo PHP_EOL;

    foreach ($c->item as $i)
    {
        echo PHP_EOL . $i->pubDate;
        echo PHP_EOL . $i->title;
        echo PHP_EOL;
    }
}


Please post back with any specific questions, thanks. ~Ray
 
kennyu (Author) Commented:
Hi Ray,

Thanks for the quick response.  I actually did fool around with similar code and got the same results as you, but is there a way for the code not to truncate the [description] field?  I'd like to capture the full text of this field.  When I dumped the $obj value from your code, it was truncated as well.

Thanks!
Kenny
 
Beverley Portlock Commented:
Ray's code does not truncate the description - at least not when I ran it. Or do you mean that you want the OTHER description field included as well (see attached snippet)?

<?php // RAY_temp_kennyu.php
error_reporting(E_ALL);
echo '<pre>';

// READ THE XML AND CREATE AN OBJECT
$url = 'http://blog.strategos.com/innovaro_leading_innovati/rss.xml';
$xml = file_get_contents($url);
$obj = SimpleXML_Load_String($xml);

//var_dump($xml);
print_r( $obj );

// ACTIVATE THIS LINE TO SEE THE PROPERTIES OF THE OBJECT
// var_dump($obj);

// EXTRACT SOME DATA FROM THE OBJECT
foreach ($obj->channel as $c)
{
    echo PHP_EOL .  $c->title;
    echo PHP_EOL .  $c->description;
    echo PHP_EOL;

    foreach ($c->item as $i)
    {
        echo PHP_EOL . $i->pubDate;
        echo PHP_EOL . $i->title;
        echo PHP_EOL . $i->description;      // <--- this line added
        
        echo PHP_EOL;
    }
}

 
Ray Paseur Commented:
"...dumped the $obj value you had it truncated it..."
I relied on this XML for the RSS feed:
http://blog.strategos.com/innovaro_leading_innovati/rss.xml

PHP places no practical limit on string length (available memory aside), and the code that I posted did not truncate anything that was present in the RSS feed.  If you want truncation, substr() or wordwrap() are PHP functions that can help with that.  If something appears to be truncated, it is probably because it was truncated in the original XML string.  Publishers do that sometimes to keep the size of the XML document down to a manageable level.  After all, it's intended to be a tease that gets you to read a fuller document that you will access via the links in the RSS.
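
For anyone who does want a shortened teaser, here is a minimal sketch of the substr()/wordwrap() idea mentioned above. The helper name truncate_text() and the sample text are made up for illustration and are not part of the feed or of the code posted earlier. It counts bytes, so multibyte text would probably want the mb_* functions instead.

```php
<?php
// Hypothetical helper (not from the thread): shorten text to at most $max
// bytes, breaking at a word boundary where one is available.
function truncate_text(string $text, int $max = 80): string
{
    $text = trim(strip_tags($text));
    if (strlen($text) <= $max) {
        return $text; // already short enough, nothing to cut
    }
    // wordwrap() inserts "\n" at word boundaries; the first line is the teaser.
    // The final "true" forces a cut inside any single word longer than $max.
    $lines = explode("\n", wordwrap($text, $max, "\n", true));
    return $lines[0] . '...';
}

// Made-up sample text standing in for an <item> description.
$description = 'Innovation is not a single event but a discipline that companies practise every day.';
echo truncate_text($description, 40), PHP_EOL;
```

The ellipsis is appended after the cut, so the result can be up to three bytes longer than $max.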

More on RSS is available here.
http://cyber.law.harvard.edu/rss/rss.html

The <item> elements are title, link, description, guid, pubDate.  I did not find "author" in the document, but I didn't look very closely.  
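
A minimal sketch of collecting those item elements into plain arrays, including a check for an element that may be absent (as "author" appears to be here). The inline sample feed, its URLs, and the array keys are assumptions for illustration only; the real feed would be loaded from the URL above.

```php
<?php
// Hypothetical sample feed standing in for the real rss.xml.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Short teaser text</description>
      <guid>http://example.com/first</guid>
      <pubDate>Mon, 06 Sep 2010 16:20:00 +0000</pubDate>
    </item>
  </channel>
</rss>
XML;

$obj = simplexml_load_string($xml);
if ($obj === false) {
    exit('Could not parse the XML');
}

$posts = [];
foreach ($obj->channel->item as $item) {
    $posts[] = [
        'title'  => (string) $item->title,
        'link'   => (string) $item->link,
        'teaser' => (string) $item->description,
        // isset() is false when the element is absent, as <author> is here.
        'author' => isset($item->author) ? (string) $item->author : null,
        // strtotime() understands the RFC 822 style dates used by RSS;
        // gmdate() formats the result in UTC regardless of server timezone.
        'date'   => gmdate('Y-m-d', strtotime((string) $item->pubDate)),
    ];
}
print_r($posts);
```

Casting each element to (string) keeps SimpleXMLElement objects out of the array, which makes the data easier to pass to templates or json_encode() later.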

If you wanted to get the original document from the link, that would be no problem - just use CURL or file_get_contents() to read from the resource and use var_dump() to see what you get back!

HTH, ~Ray
 
kennyu (Author) Commented:
I actually did use CURL in the initial code (below) to see what was coming back and you'll see that it will dump the full description as well as the truncated one before that.  Is there any way to parse the full description out with SimpleXML_Load_String($xml)?

<?php

$xml_feed_url = 'http://blog.strategos.com/innovaro_leading_innovati/rss.xml';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);

var_dump ($xml);

?>


Thanks!
Kenny
 
kennyu (Author) Commented:
Forgot to mention, you're correct there is no [author] tag.  My bad.

Kenny
 
Ray Paseur Commented:
OK, I think I understand the issue a little better now.  The "description" is a term of art in RSS: it is the short summary of an item.  The data you're looking for appears to be in a tag that is not part of core RSS.  I think what you want is enclosed in the <content:encoded> tags, which are a namespaced extension to the feed.

See if this makes more sense:

<?php // RAY_temp_kennyu.php
error_reporting(E_ALL);
echo '<pre>';

// READ THE XML AND CREATE AN OBJECT
$url = 'http://blog.strategos.com/innovaro_leading_innovati/rss.xml';
$xml = file_get_contents($url);

// ACTIVATE THIS TO SHOW THE XML
// echo htmlentities($xml);

// CORRECT THE XML TO WORK WITH SIMPLEXML()
$new = mungXML($xml);

// MAKE AN OBJECT
$obj = SimpleXML_Load_String($new);

// ACTIVATE THIS LINE TO SEE THE PROPERTIES OF THE OBJECT
// var_dump($obj);

// EXTRACT SOME DATA FROM THE OBJECT
foreach ($obj->channel as $c)
{
    echo PHP_EOL . "<h2>$c->title</h2>";
    echo PHP_EOL . $c->description;
    echo PHP_EOL;

    foreach ($c->item as $i)
    {
        echo PHP_EOL . $i->pubDate;
        echo PHP_EOL . "<b>$i->title</b>";
        echo PHP_EOL . $i->description;
        echo PHP_EOL . htmlentities($i->content_encoded);
        echo PHP_EOL;
    }
}

// FUNCTION TO MUNG THE XML
function mungXML($xml)
{
    // A REGULAR EXPRESSION TO MUNG THE XML
    // IT MATCHES A COLON ONLY INSIDE A TAG NAME, SO COLONS IN TEXT
    // AND ATTRIBUTE VALUES (SUCH AS http://) ARE LEFT ALONE
    $rgx
    = '#'                 // REGEX DELIMITER
    . '('                 // GROUP PATTERN 1
    . '<'                 // LOCATE A LEFT WICKET
    . '/?'                // MAYBE FOLLOWED BY A SLASH
    . '[A-Za-z_][\w.-]*'  // THE NAMESPACE PREFIX OF THE TAG NAME
    . ')'                 // END GROUP PATTERN
    . '('                 // GROUP PATTERN 2
    . ':'                 // A COLON (EXACTLY ONE)
    . ')'                 // END GROUP PATTERN
    . '#'                 // REGEX DELIMITER
    ;
    // INSERT THE UNDERSCORE INTO THE TAG NAME
    $rep
    = '$1'                // BACKREFERENCE TO GROUP 1
    . '_'                 // LITERAL UNDERSCORE IN PLACE OF GROUP 2
    ;
    // PERFORM THE REPLACEMENT
    return preg_replace($rgx, $rep, $xml);
}
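
As an alternative to rewriting the tag names, SimpleXML can read namespaced elements directly through children(). The following is only a sketch under assumptions: the inline sample feed is made up, and it assumes the real feed declares the "content" prefix on its root element (the namespace URI shown is the one commonly used for the RSS content module, not something confirmed from this feed).

```php
<?php
// Hypothetical sample feed; the real one would come from file_get_contents($url).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Sample Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Short teaser...</description>
      <content:encoded><![CDATA[<p>The full text, with a <a href="http://example.com/">link</a>.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$obj = simplexml_load_string($xml);
if ($obj === false) {
    exit('Could not parse the XML');
}

$full = [];
foreach ($obj->channel->item as $item) {
    // children('content', true) selects child elements by namespace PREFIX;
    // pass the namespace URI with false (the default) to select by URI instead.
    $content = $item->children('content', true);
    // Casting to string returns the text inside the CDATA section.
    $full[] = (string) $content->encoded;
}

// Escape for display so the HTML markup is shown rather than rendered.
echo htmlentities($full[0]), PHP_EOL;
```

Because the XML string is never modified, URLs and other text that contain colons pass through untouched.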

 
kennyu (Author) Commented:
Sorry for the confusion but this is EXACTLY what I was looking for :)

THANK YOU!
Kenny
 
Ray Paseur Commented:
Thanks for the points - it's a great question, ~Ray