Grab the search results from a website using curl.

Hello everybody:

How would I use curl to grab the search results from a website like http://www.feed24.com/? I would like a command-line script if possible. I want just the returned results (the text), not the HTML. I would also like to get each news item one at a time, not all as one HTML page. That way, I could write individual news items to a file or post them to a blog. Also, is there a way to automatically follow the link to the next page of the search results to continue the download? Thanks.
jmcnealy1 asked:
designbai commented:
<?php
$URL = "http://www.feed24.com/";
// -s hides curl's progress meter; escapeshellarg() quotes the URL safely for the shell
$string = (string) shell_exec("/usr/bin/curl -s " . escapeshellarg($URL));
//echo $string;

// strip every tag except anchor tags
$data = strip_tags($string, "<a>");

// regular expression search for <a> tags whose attributes contain item_id=
// all matches are stored in $return; $return[1] holds the link text of each match
preg_match_all("/<a[^>]*item_id=[^>]*>(.*?)<\/a>/si", $data, $return);

// here is each item (news title only); store it in a database or flat file as needed
foreach ($return[1] as $i => $title) {
    echo $i . ". " . $title . PHP_EOL;
}

thanks.
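Since the question asks to write individual news items to a file, the echo loop above could hand the titles to something like the following. This is only a minimal sketch: the function name save_items(), the output directory, and the "item-N.txt" naming scheme are hypothetical choices, not part of the original answer.

```php
<?php
// Hypothetical helper: write each extracted news title to its own numbered text file.
// The directory and the "item-N.txt" naming scheme are illustrative assumptions.
function save_items(array $titles, string $dir): array
{
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        throw new RuntimeException("Cannot create directory $dir");
    }
    $paths = [];
    foreach ($titles as $i => $title) {
        $path = $dir . DIRECTORY_SEPARATOR . "item-" . $i . ".txt";
        // remove leftover tags and decode entities such as &amp; so the file holds plain text
        $text = html_entity_decode(strip_tags($title), ENT_QUOTES, "UTF-8");
        if (file_put_contents($path, $text . PHP_EOL) === false) {
            throw new RuntimeException("Cannot write $path");
        }
        $paths[] = $path;
    }
    return $paths;
}

// usage with the $return array filled by preg_match_all():
// save_items($return[1], "news-items");
```

Writing one file per item keeps each news entry separate, which also makes it easy to post them to a blog one at a time later.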
designbai commented:
Try using this to grab a page with curl.

curl must be installed before you can use it.

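$URL = "http://www.feed24.com/";
// note: these are backticks (`), not single quotes; on US keyboards the backtick is on the key left of 1
// the backtick operator runs the command in a shell and returns its output as a string
$string = `/usr/bin/curl -s $URL`;
echo $string;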

The $string variable then contains the whole web page.

After that, you have to write code to parse the required news items out of it.
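As an alternative to shelling out to /usr/bin/curl, PHP's own curl extension can fetch the page directly. The sketch below assumes the curl extension (ext/curl) is installed and enabled; the function name fetch_page() is hypothetical.

```php
<?php
// Hypothetical helper: fetch a page with PHP's curl extension instead of the curl binary.
// Assumes ext/curl is installed and enabled.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("curl failed: " . curl_error($ch));
    }
    // the handle is released automatically once $ch goes out of scope
    return $body;
}

// usage:
// $string = fetch_page("http://www.feed24.com/");
```

This avoids quoting problems with the shell and reports failures as an exception instead of an empty string.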

thanks.
Marcus Bointon commented:
You can get results from there using RSS (since that's what the site is all about). RSS is far more reliable and easier to handle than attempting to parse a rendered page that could change at any time. All you need to do is URL-encode your search term and append it to this URL:

http://www.feed24.com/search.xml/

so a search for 'experts exchange' would be:

http://www.feed24.com/search.xml/experts%20exchange
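Building that URL in PHP is a one-liner. A minimal sketch, with a hypothetical function name: rawurlencode() encodes a space as %20, matching the example URL, whereas urlencode() would produce a "+" instead.

```php
<?php
// Hypothetical helper: build a feed24 search-feed URL from a free-text search term.
// rawurlencode() turns spaces into %20 (urlencode() would use "+").
function feed24_search_url(string $term): string
{
    return "http://www.feed24.com/search.xml/" . rawurlencode($term);
}

echo feed24_search_url("experts exchange"), PHP_EOL;
// prints http://www.feed24.com/search.xml/experts%20exchange
```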

You don't need to use curl at all; there are plenty of RSS PHP clients that make all this trivial, in particular:

http://magpierss.sourceforge.net/
http://lastrss.webdot.cz/

Once you have the RSS feed contents, you can extract as few or as many items as you like, and you can choose to parse their content for links (note that if you simply remove HTML tags, all your links will disappear too). You might like to use an HTML parser to strip HTML formatting more carefully:

http://php-html.sourceforge.net/
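If installing one of those libraries is not an option, PHP's built-in SimpleXML can also split a feed into individual items. This is a swapped-in technique, not what MagpieRSS or lastRSS do internally, and it assumes a standard RSS 2.0 layout (rss, channel, item); the function name rss_items() is hypothetical. Each item's link is kept in its own field, so stripping tags from the description does not lose it.

```php
<?php
// Hypothetical helper: turn an RSS 2.0 document into an array of items using SimpleXML.
// Assumes the standard <rss><channel><item> structure.
function rss_items(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        throw new RuntimeException("Not well-formed XML");
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            "title" => trim((string) $item->title),
            "link"  => trim((string) $item->link),
            // SimpleXML decodes &lt;p&gt; etc.; strip_tags() then keeps only the text
            "text"  => trim(strip_tags((string) $item->description)),
        ];
    }
    return $items;
}

// usage: fetch the search feed, then handle one item at a time
// $xml = file_get_contents("http://www.feed24.com/search.xml/experts%20exchange");
// foreach (rss_items($xml) as $item) { echo $item["title"], PHP_EOL; }
```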
designbai commented:
I agree that parsing the HTML page is never fully reliable, because whenever the page changes, we have to rewrite the code.

With RSS, however, we do not need to worry: the structure is standardized, so we can work with it freely.
jmcnealy1 (author) commented:
Thanks a lot for the help. The point about just using the RSS feed directly was a good one. I'll work on that in the future.