RSS html rip

Posted on 2006-11-28
Last Modified: 2012-06-27

I want to make an RSS feed. It should first log into a site, get values from a specific HTML page, and then put these values into the feed. Using PHP, Perl, Ruby or Java, does anyone know a way that I can fetch an HTML page from an external web site (bearing in mind that it will need to authenticate first)?

Specifically what I'm looking for is a way to fetch a page behind authentication automatically, using my username and password for the site, and to basically save that page to my server so that I can work on pulling out the values I want.
Question by:Mr_Lenehan

Author Comment

ID: 18032935
Also, the solution could use any Unix/Linux program such as wget, since that could be invoked from within the code.

Expert Comment

ID: 18033040
wget can do it...
   wget --help

you'll see it can do HTTP authentication, and also cookie loading - so you'd need to find your cookie placed by the site at log in.

Author Comment

ID: 18033144
I tried that... I pointed wget at the location of the cookie and it didn't work! (the site doesn't use HTTP authentication)

Maybe there's a different way? programmatically?

Accepted Solution

dasmaer earned 500 total points
ID: 18033157
using cookies has nothing to do with HTTP auth.

Try this, forcing wget to use the cookie you want. Specify cookies off and then the cookie name and value (see:  

Example: wget --cookies=off --header="Cookie: LOGIN=username:passwordhash" <URL of the page>

where passwordhash is either the plain-text password or the hashed password, whichever is stored in your cookie.
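
Since the question also asked for a programmatic route, here is a minimal Python sketch of the same idea: send one hand-written Cookie header and save the response to disk. The cookie name LOGIN and the value format come from the wget example above; the function names and the example.com URL are made up for illustration, and the real cookie name and value have to be copied from whatever the site sets at login.

```python
import urllib.request


def build_request(url, cookie_name, cookie_value):
    """Return a Request carrying a single hand-written Cookie header."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return req


def fetch_page(url, cookie_name, cookie_value, dest_path):
    """Fetch url with the cookie attached and save the body to dest_path."""
    req = build_request(url, cookie_name, cookie_value)
    # Plain urlopen keeps no cookie jar, so only our header is sent.
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
    with open(dest_path, "wb") as fh:
        fh.write(body)
    return len(body)


# Hypothetical usage (URL and cookie value are placeholders):
# fetch_page("http://example.com/members/page.html",
#            "LOGIN", "username:passwordhash", "page.html")
```

The saved file can then be parsed for the values that go into the feed. If the site expires the cookie, the value has to be refreshed by logging in again.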

Author Comment

ID: 18033172
Result! Using PHP's shell_exec I can get this to fetch my page (well, not my page, but the page I want). Cheers.
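
For anyone doing the same from a language other than PHP, a rough Python equivalent of calling wget from code might look like the sketch below. It is only an illustration: the helper names, URL and output file are hypothetical, and it uses --no-cookies, which is how newer wget releases spell the --cookies=off option from the accepted answer (an assumption about which wget version is installed).

```python
import subprocess


def wget_command(url, cookie_name, cookie_value, dest_path):
    """Build the wget argument list: cookie handling off, one manual header."""
    return [
        "wget",
        "--no-cookies",  # newer spelling of --cookies=off (assumed wget version)
        f"--header=Cookie: {cookie_name}={cookie_value}",
        f"--output-document={dest_path}",
        url,
    ]


def fetch_with_wget(url, cookie_name, cookie_value, dest_path):
    """Run wget and raise CalledProcessError if it exits with an error."""
    cmd = wget_command(url, cookie_name, cookie_value, dest_path)
    subprocess.run(cmd, check=True)


# Hypothetical usage (placeholders, not real values):
# fetch_with_wget("http://example.com/members/page.html",
#                 "LOGIN", "username:passwordhash", "page.html")
```

Passing the arguments as a list means no shell is involved, so the colon and any odd characters in the cookie value need no quoting. With shell_exec the command is a single string, so the values should be escaped (for example with escapeshellarg) before they are interpolated.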
