Screen Scraping Password Protected Sites
Posted on 2006-06-06
I need to screen scrape a password-protected site for which I have a valid login. I create a URLConnection and perform the POST for the URL in question, but the HTML I get back is the login page, to which I have been redirected. Setting up a PasswordAuthenticator using Authenticator.setDefault() did not help at all. What should my general strategy be for making this work? I'm guessing I need to actually perform the login, capture the cookies it sets, remember them, and then send them when requesting the resource in question. Does this sound right? I'm sure the details differ from site to site, so any general resource explaining how to do this would be perfect.
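
The strategy guessed at above (perform the login, keep the cookies, reuse them for the real request) can be sketched roughly as follows. This is only a minimal sketch, assuming the site uses an ordinary HTML form login backed by a session cookie; java.net.Authenticator answers HTTP authentication challenges (401 responses), which would explain why it has no effect on a form login. The class name LoginScraper, its helper methods, and any URLs or form field names passed in are hypothetical and would have to be taken from the actual login page. It relies on java.net.CookieManager (Java 6 or later) and uses Java 7 syntax.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper class; the names are illustrative, not from any library.
class LoginScraper {

    // URL-encodes one form component as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds an application/x-www-form-urlencoded body from the form fields.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    // Reads the whole response body as text (assumes the page is UTF-8).
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Posts the login form, then requests the protected page with the same cookies.
    static String fetchProtected(String loginUrl, Map<String, String> fields,
                                 String targetUrl) throws IOException {
        // Install a JVM-wide cookie store: cookies set by the login response
        // are sent back automatically on later URLConnection requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        HttpURLConnection login = (HttpURLConnection) new URL(loginUrl).openConnection();
        login.setRequestMethod("POST");
        login.setDoOutput(true);
        login.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = login.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        readBody(login); // read the response so the exchange completes

        HttpURLConnection page = (HttpURLConnection) new URL(targetUrl).openConnection();
        return readBody(page);
    }
}
```

A call would look something like LoginScraper.fetchProtected(loginUrl, fields, targetUrl), where fields maps the login form's input names to their values. The details do differ by site: some login forms also carry hidden fields or tokens that have to be read from the login page first and posted back along with the credentials.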