
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 301

Grab the search results from a website using curl.

Hello everybody:

How would I use curl to grab the search results from a website like http://www.feed24.com/? I would like a command-line script if possible. I want just the returned results (the text), not the HTML. Also, I would like to get each news item one at a time, not all as one HTML page. That way, I could write individual news items to a file or post them to a blog. Also, is there a way to automatically follow the link to the next page of the search results to continue the download? Thanks.
Asked by jmcnealy1
2 Solutions
 
designbaiCommented:
Try using this to grab a page with curl.

Before using curl, it must be installed.

<?php
$URL = "http://www.feed24.com/";
$string = `/usr/bin/curl $URL`; // note: these are backticks (`), the shell-execution operator, not single quotes
echo $string;
?>

The $string variable contains the whole web page.

You then have to write code to parse out the news items you need.

thanks.
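If shelling out to the curl binary is not possible (or you want error handling), the same fetch can be done with PHP's curl extension. A minimal sketch, assuming the extension is enabled; the fetch_page function name is just an illustration:

```php
<?php
// Fetch a page with PHP's curl extension instead of the /usr/bin/curl binary.
// Returns the page body as a string, or false on failure.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // don't hang forever
    $body = curl_exec($ch);                         // false on failure
    curl_close($ch);
    return $body;
}

$string = fetch_page("http://www.feed24.com/");
```

Unlike the backtick version, this lets you check for failure ($string === false) before parsing.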
 
designbaiCommented:
<?php
$URL = "http://www.feed24.com/";
$string = `/usr/bin/curl $URL`;
//echo $string;

// strip everything except anchor tags
$data = strip_tags($string, "<a>");

// regular-expression search to match the news items; the matches are stored in $return
preg_match_all("/<a[^>]*?item_id=.*?>(.*?)<\/a>/si", $data, $return);

// here are your items; choose whether to store them in a db or a flat file. Each match is just the news title.
for ($i = 0; $i < sizeof($return[1]); $i++) {
      echo $i . ". " . $return[1][$i] . "<br>";
}
?>

thanks.
 
Marcus BointonCommented:
You can get results from there using RSS (since that's what the site is all about). RSS is far more reliable and easier to handle than attempting to parse a rendered page that could change at any time. All you need to do is URL-encode your search term and append it to this URL:

http://www.feed24.com/search.xml/

so a search for 'experts exchange' would be:

http://www.feed24.com/search.xml/experts%20exchange

You don't need to use CURL at all - there are plenty of RSS PHP clients that make all this trivial, in particular:

http://magpierss.sourceforge.net/
http://lastrss.webdot.cz/

Once you have the RSS feed contents, you can extract as few or as many as you like, and you can choose to parse their content for links (note that if you simply remove HTML tags, all your links will disappear too). You might like to use an HTML parser to strip HTML formatting more carefully:

http://php-html.sourceforge.net/
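As a minimal sketch of the RSS route, using PHP's built-in SimpleXML rather than one of the libraries above. The inline feed is a made-up sample standing in for what the search URL would return; in real use you would call simplexml_load_file() on the search URL instead:

```php
<?php
// Build the search URL as described: URL-encode the term and append it.
$term = "experts exchange";
$url  = "http://www.feed24.com/search.xml/" . rawurlencode($term);
echo $url . "\n"; // http://www.feed24.com/search.xml/experts%20exchange

// A tiny hand-written sample feed stands in here so the parsing step is visible.
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample search results</title>
    <item><title>First news item</title><link>http://example.com/1</link></item>
    <item><title>Second news item</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($sample);
foreach ($rss->channel->item as $item) {
    // Each item comes back individually, ready to write to a file or post to a blog.
    echo $item->title . " => " . $item->link . "\n";
}
```

Because each <item> is a separate node, you get the news items one at a time, which is exactly what the question asks for.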
 
designbaiCommented:
I agree that an HTML parser is never fully reliable, because if the page layout changes, we have to rewrite the code.

But in the case of RSS, we do not need to worry. The structure is standard, so we can work with it.
 
jmcnealy1Author Commented:
Thanks a lot for the help. The point about just using the RSS as it is was a good one. I'll work on that in the future.
