RSS html rip

Hi

I want to make an RSS feed. It should first log into a site, get values from a specific HTML page, and then put those values into the feed. Using PHP, Perl, Ruby or Java, does anyone know a way to fetch an HTML page from an external web site (bearing in mind that it will need to authenticate first)?

Specifically what I'm looking for is a way to fetch a page behind authentication automatically, using my username and password for the site, and to basically save that page to my server so that I can work on pulling out the values I want.
Mr_Lenehan Asked:
 
dasmaer Commented:
Using cookies has nothing to do with HTTP auth.

Try forcing wget to send the cookie you want: turn wget's own cookie handling off, then pass the cookie name and value in a request header (see: http://www.delorie.com/gnu/docs/wget/wget_9.html).

Example: wget --cookies=off --header="Cookie: LOGIN=username:passwordhash" http://www.yoursite.com

where passwordhash is either the plain-text password or the hashed password, whichever your cookie stores.
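
The same idea as a small reusable shell function, in case it helps. This is only a sketch: the function names, the cookie name LOGIN and the URL are placeholders, and it assumes a wget release that spells the option --no-cookies (older ones use --cookies=off as above, so check wget --help on your system).

```shell
#!/bin/sh
# WGET can be overridden (e.g. WGET=echo) to preview the command without running it.
WGET=${WGET:-wget}

# Build the Cookie request header from a cookie name ($1) and value ($2).
cookie_header() {
    printf 'Cookie: %s=%s' "$1" "$2"
}

# Fetch URL ($1), sending only the cookie NAME ($2) = VALUE ($3),
# and save the page to the file named in $4.
fetch_with_cookie() {
    $WGET --no-cookies --header="$(cookie_header "$2" "$3")" -O "$4" "$1"
}

# Example with placeholder values:
# fetch_with_cookie http://www.yoursite.com LOGIN 'username:passwordhash' page.html
```

The cookie name and value have to match exactly what the site sets in your browser after you log in.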
 
Mr_Lenehan (Author) Commented:
Also, it could use any Unix/Linux program such as wget, as this could be invoked from within the coded solution.
 
dasmaer Commented:
wget can do it...
   
   wget --help

You'll see it can do HTTP authentication and also cookie loading, so you'd need to find the cookie the site places at login.
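
Roughly what those two options look like in use. This is a sketch, not a tested recipe for any particular site: the function names are made up, and option spellings have changed between wget releases, so compare against your own wget --help output.

```shell
#!/bin/sh
# WGET can be overridden (e.g. WGET=echo) to preview the command without running it.
WGET=${WGET:-wget}

# HTTP authentication: wget supplies the username ($1) and password ($2)
# itself when fetching URL ($3); the page is saved to file ($4).
fetch_http_auth() {
    $WGET --http-user="$1" --http-password="$2" -O "$4" "$3"
}

# Cookie loading: read cookies from a Netscape-format cookies.txt file ($1)
# and send the matching ones when fetching URL ($2); save to file ($3).
fetch_with_cookie_file() {
    $WGET --load-cookies "$1" -O "$3" "$2"
}
```

Note that --load-cookies expects the Netscape cookies.txt text format, so a browser's own cookie store may need exporting to that format first.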
 
Mr_Lenehan (Author) Commented:
I tried that... I pointed wget at the location of the cookie and it didn't work! (The site doesn't use HTTP authentication.)

Maybe there's a different way, something programmatic?
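
One programmatic route for a site with a form-based login, sketched under assumptions: let wget submit the login form itself, save the cookies it gets back, then reuse them for the protected page. The login URL, the form field names (username, password) and the function names below are hypothetical; the real ones have to be read from the site's login form.

```shell
#!/bin/sh
# WGET can be overridden (e.g. WGET=echo) to preview the commands without running them.
WGET=${WGET:-wget}

# Step 1: POST the login form ($1 = login URL, $2 = URL-encoded form data)
# and store the cookies the site sets, session cookies included, in file $3.
login() {
    $WGET --save-cookies "$3" --keep-session-cookies --post-data "$2" -O /dev/null "$1"
}

# Step 2: fetch the protected page ($2) using the saved cookies ($1)
# and save it to file ($3).
fetch_protected() {
    $WGET --load-cookies "$1" -O "$3" "$2"
}

# Example with hypothetical URLs and field names:
# login http://www.yoursite.com/login.php 'username=me&password=secret' cookies.txt
# fetch_protected cookies.txt http://www.yoursite.com/members/page.html page.html
```

Without --keep-session-cookies, wget does not write session cookies to the file, and many login systems rely on exactly those.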
 
Mr_Lenehan (Author) Commented:
Result! Using PHP's shell_exec I can get this to fetch my page (well, not my page, but the page I want). Cheers.
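
For anyone doing the same: shell_exec returns whatever the command writes to standard output, so it can be convenient to have wget print the page rather than save it. A sketch of the shell side only, with a made-up function name and assuming the cookies are already in a cookies.txt-style file:

```shell
#!/bin/sh
# WGET can be overridden (e.g. WGET=echo) to preview the command without running it.
WGET=${WGET:-wget}

# Print the page at URL ($1) to standard output instead of a file,
# sending the cookies stored in file ($2). -q silences wget's progress
# messages so that only the HTML reaches the caller.
page_to_stdout() {
    $WGET -q --load-cookies "$2" -O - "$1"
}
```

The calling script then receives the HTML as a string and can pull the values out for the feed.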