Web scraping on a secure site

Posted on 2009-04-27
Last Modified: 2012-05-06
I need to get certain data from a site that is protected by a login form containing a captcha. I know it's rather pointless to try to break the captcha, but I wonder if there are other ways around it. For starters, is it somehow possible to use a web browser on the web server to browse to the site, enter the username, password, and captcha, log in, and then transfer the cookies/session created in the browser to the ASP.NET engine so that it can scrape the site?

Suggestions are welcome.
Question by: Buginator

Expert Comment

by: Tony McCreath
ID: 24249261
You could have your website act as a go-between.

Have it scrape the login page, including the captcha image, and display it on one of its own pages along with fields for the login details. When you fill them in, it posts your entries to the real website for you.

You will have to make sure your ASP.NET code keeps track of cookies across its requests so that it establishes a session with the real website.
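To illustrate what "tracking cookies" means, here is a minimal sketch using Python's standard library (the thread has no code of its own; in ASP.NET the counterpart would be giving every `HttpWebRequest` the same `CookieContainer`). One `CookieJar` is shared by all requests: cookies the site sets are stored in it and sent back on later requests to the same host. `FakeResponse`, the `example.com` URLs and the `SESSIONID` cookie are hypothetical stand-ins so the demo runs without a network.

```python
import email.message
import http.cookiejar
import urllib.request


def make_session():
    """Return a cookie jar and an opener that reads and writes that jar.

    Every request sent through the opener stores Set-Cookie headers from the
    response in the jar and replays matching cookies on later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, so the demo runs offline."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            self._headers["Set-Cookie"] = value  # appends, does not replace

    def info(self):
        return self._headers


if __name__ == "__main__":
    jar, _ = make_session()
    # The "server" answers the login page with a session cookie.
    login_page = urllib.request.Request("https://example.com/login")
    jar.extract_cookies(FakeResponse(["SESSIONID=abc123; Path=/"]), login_page)

    # A later request to the same site picks the cookie up from the jar.
    next_page = urllib.request.Request("https://example.com/data")
    jar.add_cookie_header(next_page)
    print(next_page.get_header("Cookie"))  # prints: SESSIONID=abc123
```

In real use you would drop the fake response and call `opener.open(url)` for every request; `HTTPCookieProcessor` then performs the extract and add steps shown here automatically.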

Author Comment

by: Buginator
ID: 24252482
Thanks for your reply,

As this is a relatively new field for me, could you show me some example code or post some links that contain good info about this? I believe the most challenging part is the go-between part: posting the form information, tracking the cookies, and establishing the session.

Accepted Solution

by: Tony McCreath
ID: 24261198
You need to learn how to use HttpWebRequest objects to POST data while supporting cookies.

Make a GET request to the login page, with cookie support enabled.

Scrape the response to find the captcha image.

Display your own web form showing the captcha image and asking for the login details and the captcha response.

Use the login details to POST a request to the real login page, making sure you use the same cookie collection as before.

All subsequent requests must use the same cookies.
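The steps above can be sketched in Python with only the standard library (the thread itself has no code; in ASP.NET the same flow would use `HttpWebRequest` objects sharing one `CookieContainer`). This is a minimal sketch under several hypothetical assumptions, each also flagged in the comments: the captcha is the first `<img>` whose `src`, `id` or `alt` mentions "captcha", the form posts back to the login page URL, and its fields are named `username`, `password` and `captcha`. It also echoes hidden `<input>` fields back, since an ASP.NET WebForms login page will usually expect `__VIEWSTATE` in the postback. A person still reads and types the captcha; nothing here tries to solve it.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar


class LoginPageParser(HTMLParser):
    """Collects the captcha image URL and the hidden <input> fields of a page.

    Assumption (hypothetical): the captcha is the first <img> whose src, id
    or alt attribute mentions "captcha". Adjust the test for the real site.
    """

    def __init__(self):
        super().__init__()
        self.captcha_src = None
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        a = {name: value or "" for name, value in attrs}
        if tag == "img" and self.captcha_src is None and a.get("src"):
            text = " ".join(a.get(key, "") for key in ("src", "id", "alt"))
            if "captcha" in text.lower():
                self.captcha_src = a["src"]
        elif tag == "input" and a.get("type", "").lower() == "hidden" and a.get("name"):
            # Echo hidden fields (e.g. ASP.NET's __VIEWSTATE) back in the POST.
            self.hidden_fields[a["name"]] = a.get("value", "")


def parse_login_page(page_html, page_url):
    """Return (absolute captcha URL or None, dict of hidden form fields)."""
    parser = LoginPageParser()
    parser.feed(page_html)
    parser.close()
    src = parser.captcha_src
    return (urllib.parse.urljoin(page_url, src) if src else None), parser.hidden_fields


def build_login_request(post_url, hidden_fields, username, password, captcha_answer):
    """Build the login POST. The field names are hypothetical placeholders;
    copy the real ones from the name attributes of the site's form."""
    fields = dict(hidden_fields)
    fields.update(username=username, password=password, captcha=captcha_answer)
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        post_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def make_opener():
    """One opener per scraping session; its CookieJar is shared by all requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def log_in(opener, login_url, ask_user):
    """The answer's steps, every request going through the same cookie-aware opener.

    ask_user is a hypothetical callback standing in for "your own web form":
    it receives the captcha image bytes and returns (username, password, answer).
    Assumes the form posts back to login_url; check the form's action attribute.
    """
    # 1. GET the login page; the opener stores any session cookie it sets.
    #    (UTF-8 is an assumption about the page's encoding.)
    page = opener.open(login_url).read().decode("utf-8", errors="replace")
    # 2. Scrape the captcha image URL and hidden fields from the response.
    captcha_url, hidden = parse_login_page(page, login_url)
    if captcha_url is None:
        raise ValueError("no captcha image found on the login page")
    # Fetch the image with the same cookies so it belongs to this session.
    image = opener.open(captcha_url).read()
    # 3. A person looks at the image and supplies the login details.
    username, password, answer = ask_user(image)
    # 4. POST the login using the same cookie collection as before.
    return opener.open(build_login_request(login_url, hidden, username, password, answer))
```

In an actual go-between site, `ask_user` would be split across two requests to your own pages (one that shows the image, one that receives the filled-in form), so the opener or its cookie jar would have to be kept somewhere, such as the user's session, in between. After `log_in` returns, keep using that same opener for every further request to the site.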

This article ends with a cookie-based POST that performs a login:
http://odetocode.com/articles/162.aspx
