Web scraping on a secure site

Posted on 2009-04-27
Last Modified: 2012-05-06
I need to get certain data from a site that is secured with a login form containing a captcha. I know it's rather pointless to try to hack the captcha, but I wonder if there are other ways around it. For starters, is it possible to use the web server's browser to browse to this site, enter the username, password, and captcha, log in, and then transfer the cookies/session created in the browser to the ASP.NET engine so that it can scrape the site?

Suggestions are welcome.
Question by: Buginator
    LVL 23

    Expert Comment

    You could have your website act as a go-between.

    Have it scrape the login page with the captcha and display the captcha on its own page, along with fields for the login details, for you to fill in. Then it posts your entries to the real website for you.

    You will have to make sure your site tracks cookies in its requests so that it establishes a session with the real website.
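To illustrate the cookie tracking mentioned above, here is a minimal sketch in Python using only the standard library. The helper name `make_session_opener` is made up for this example. The idea is that one cookie jar is shared by every request, so the remote site sees a single session. In .NET the same idea would be one `CookieContainer` instance assigned to each `HttpWebRequest`.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar). Hypothetical helper for this sketch.

    Every request sent through the opener stores cookies in the jar
    and sends the stored cookies back on later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Calling `opener.open(url)` for the login page, the captcha image, and the later POST would then keep all three in the same session, provided the same opener is reused each time.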
    LVL 5

    Author Comment

    Thanks for your reply,

    As this is a relatively new field for me, could you show me some example code or post some links that contain good info about this? I believe the most challenging part is the go-between part: posting the form information, tracking the cookies, and establishing the session.
    LVL 23

    Accepted Solution

    You need to learn about the HttpWebRequest class (System.Net) and how to use it to POST data while supporting cookies.

    Make a request to the login page. (GET, cookies supported)

    Scrape the response to find the captcha image.

    Display your own web form with the captcha image requesting login and a captcha response.

    Use the login info and the captcha response to POST a request to the real login page, ensuring you use the same cookie collection as before.

    All subsequent requests must use the same cookies.
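The scrape and POST steps above can be sketched roughly as follows, in Python with only the standard library. Everything specific here is an assumption for illustration: the form field names `username`, `password`, and `captcha`, the rule that the captcha image has "captcha" in its `src`, and the helper names. The real site's markup and field names have to be read from its login page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class CaptchaFinder(HTMLParser):
    """Remember the src of the first <img> whose src mentions 'captcha'."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        # Assumption: the captcha image URL contains the word "captcha".
        if tag == "img" and self.src is None and "captcha" in src.lower():
            self.src = src


def find_captcha_url(html, page_url):
    """Return the absolute captcha image URL, or None if none is found."""
    finder = CaptchaFinder()
    finder.feed(html)
    return urljoin(page_url, finder.src) if finder.src else None


def build_login_request(login_url, username, password, captcha_text):
    """Build the form POST; the field names are hypothetical."""
    body = urlencode({
        "username": username,
        "password": password,
        "captcha": captcha_text,
    }).encode("utf-8")
    return Request(login_url, data=body, method="POST")
```

The request returned by `build_login_request` still has to be sent through the same cookie-aware opener (for example one built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))`) that fetched the login page. The captcha image should usually be fetched through it too, since the captcha is typically tied to the session.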

    This article ends with a cookie-based POST that performs a login

