Logon to Web Site and Download File Programmatically using Python, urllib2 module

Posted on 2014-08-05
Last Modified: 2014-08-08
I wish to log on to a web site and download data programmatically.
Looking for assistance with Python and the urllib2 module.

At this page:
there is a Logon button that sends a GET (not a POST) request to:
which redirects to:

I'm hoping to understand how to navigate through the web pages so that I can eventually log on with my account.  
I've tried many variations of the code below without success.
I need help understanding how to use urllib2 to navigate to the page with the "I Agree" button and then to the actual page where I enter my username and password.
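import urllib   # urllib.urlencode() will be needed to build the logon POST body
import urllib2
import cookielib

# Cookie storage shared by every request made through the opener
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

#### First page
url = ''  # first page URL (not preserved in the post)

request = urllib2.Request(url)

# Use the cookie-aware opener, not urllib2.urlopen(), so cookies are kept
response = opener.open(request)

html = response.read()

# Print to screen
print html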


I'd appreciate any help. Thanks!
Question by:DoveTails
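For readers on Python 3, where `urllib2` became `urllib.request` and `cookielib` became `http.cookiejar`, here is a minimal sketch of the same first step. The helper names are hypothetical and no real URL is used, since the post's URLs were not preserved. The key point is to open every page through one opener built with `HTTPCookieProcessor`; `geturl()` then reports where any redirect (such as the Logon GET) finally landed.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(jar=None):
    """Return (opener, jar); every request made through the opener shares the jar."""
    if jar is None:
        jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_page(opener, url):
    """GET url through the cookie-aware opener; return (final_url, html_text).

    urllib follows HTTP redirects automatically, so final_url shows where a
    button such as "Logon" actually ended up.
    """
    with opener.open(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.geturl(), response.read().decode(charset, errors="replace")


# Usage (hypothetical URL):
#   opener, jar = build_cookie_opener()
#   final_url, html = fetch_page(opener, "https://example.com/")
#   print(final_url, [c.name for c in jar])
```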

    Expert Comment

    by:Walter Ritzel
    You would need 3 things:
    1) write code similar to yours that submits your credentials to the logon page;
    2) make sure that a session is being maintained;
    3) call the download URL, using the code you have shown.

    It is not a question of navigating through pages, but of generating the session you need with the logon page and then calling the direct download URL.
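The three steps above can be sketched in Python 3's `urllib.request` roughly as follows. This is a minimal sketch, not the site's actual flow: the function names and the `username`/`password` form field names are hypothetical placeholders, and the real field names must be read from the `name` attributes of the logon form's `<input>` elements. The session is simply the cookie jar attached to one opener that is reused for both requests.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """A cookie-aware opener; reusing it for every request keeps the session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)), jar


def build_login_request(login_url, username, password,
                        user_field="username", pass_field="password"):
    """Step 1: build a POST that submits credentials like the logon form would.

    user_field/pass_field are hypothetical defaults; copy the real names from the form.
    """
    body = urllib.parse.urlencode({user_field: username, pass_field: password}).encode("ascii")
    # Passing data= makes urllib send a POST; the Content-Type header defaults
    # to application/x-www-form-urlencoded.
    return urllib.request.Request(login_url, data=body)


def login_and_fetch(opener, login_url, download_url, username, password):
    """Log in, then download through the SAME opener; return the downloaded bytes."""
    # Step 1: submit credentials; any session cookie lands in the opener's jar.
    with opener.open(build_login_request(login_url, username, password)) as resp:
        resp.read()
    # Steps 2 and 3: the same opener sends the session cookie with the download request.
    with opener.open(download_url) as resp:
        return resp.read()


# Usage (hypothetical URLs):
#   opener, jar = make_session()
#   data = login_and_fetch(opener, "https://example.com/login",
#                          "https://example.com/export.csv", "me", "secret")
#   with open("export.csv", "wb") as f:
#       f.write(data)
```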

    Author Comment

    Thanks for the response Walter.
    My reason for navigating through the first few pages (the ones with the "Logon" and "I Agree" buttons) is to acquire the necessary cookies.  If I try to go directly to the Authentication page, I am sent back to the main Home page with the "Logon" button (basically back to page 1).

    I'm assuming that pressing "I Agree" in a standard browser sets a cookie that lasts for that session.

    Hopefully the code I have written to POST the logon information will work, but I cannot get to the logon page programmatically, and my guess is that this is because I do not yet have the "I Agree" cookie.

    Any thoughts?
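One way to test the "I Agree" cookie theory is to replay the pages in order with a single cookie jar and record which cookies each step adds and where each request was redirected. The sketch below assumes Python 3; `walk_pages`, `cookie_names` and the step list are hypothetical, and whether the "I Agree" button submits a form (and with which fields) is an assumption you would confirm from the page's HTML.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def cookie_names(jar):
    """Sorted names of all cookies currently held in the jar."""
    return sorted(cookie.name for cookie in jar)


def walk_pages(steps):
    """Visit steps in order with ONE cookie-aware opener, reporting what each step adds.

    steps: list of (url, form_fields) pairs; form_fields=None means a plain GET,
    a dict means a form POST (e.g. a hypothetical "I Agree" form).
    Returns a list of (requested_url, final_url_after_redirects, new_cookie_names).
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    report = []
    for url, fields in steps:
        data = None if fields is None else urllib.parse.urlencode(fields).encode("ascii")
        before = set(cookie_names(jar))
        with opener.open(urllib.request.Request(url, data=data)) as resp:
            resp.read()
            final_url = resp.geturl()
        added = sorted(set(cookie_names(jar)) - before)
        report.append((url, final_url, added))
    return report


# Usage (hypothetical URLs and form field):
#   for row in walk_pages([("https://example.com/", None),
#                          ("https://example.com/logon", None),
#                          ("https://example.com/agree", {"agree": "yes"})]):
#       print(row)
```

If the final URL of the last step still points back to the Home page, the missing piece is probably in an earlier step rather than in the credentials POST.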

    Accepted Solution

    In this case, instead of using urllib2, you may be interested in using mechanize.

    It uses urllib under the covers, and the functionality you are looking for (following links, submitting forms, keeping cookies between pages) may already be implemented there.
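A minimal sketch of what the mechanize route could look like. The page flow (Logon button, then "I Agree", then the credentials form), the form positions and the `username`/`password` field names are assumptions taken from the question rather than from the real site, and the function name is hypothetical. `mechanize` is third-party, so it is imported inside the function to let the sketch load where it is not installed.

```python
def mechanize_login_and_download(start_url, username, password, download_url, dest_path):
    """Hypothetical flow: home page -> "Logon" -> "I Agree" -> credentials -> download."""
    # Third-party (pip install mechanize); imported here so this module still
    # loads where mechanize is not installed.
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)  # assumption: you are allowed to script this site
    br.open(start_url)

    # Assumption: the "Logon" button is the submit button of the first form on the page.
    br.select_form(nr=0)
    br.submit()

    # Assumption: "I Agree" is also a form button. If it is a plain link,
    # use br.follow_link(text="I Agree") instead.
    br.select_form(nr=0)
    br.submit()

    # Placeholder field names: copy the real name attributes from the logon form.
    br.select_form(nr=0)
    br["username"] = username
    br["password"] = password
    br.submit()

    # The Browser has kept every cookie set along the way, so this request is
    # sent with the authenticated session.
    br.retrieve(download_url, dest_path)
```

The appeal over raw urllib2 is that the Browser object carries cookies, redirects and form parsing for you, so each "button press" is one `select_form`/`submit` pair.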

    Assisted Solution

    Another library that I can strongly recommend for accessing any web server that does NOT rely on JavaScript is

    For a web page that uses a lot of JavaScript, it may be necessary to automate a real web browser instead.

    You could use Selenium, which lets you automate a browser.

    Please reply if you're interested in either requests or Selenium.

    I personally gave up on using urllib in my code. I think its only advantage is that it is a standard Python module; coding with it just looks clumsy to me.
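For comparison, here is a minimal sketch of the login-then-download pattern with the requests library, where a `Session` object keeps cookies between requests without any explicit cookie jar setup. The function name, URLs and form field names are hypothetical placeholders; requests is third-party, so it is imported inside the function.

```python
def requests_login_and_download(login_url, download_url, username, password, dest_path):
    """Hypothetical helper: POST credentials, then stream the file to dest_path."""
    # Third-party (pip install requests); imported here so this sketch loads without it.
    import requests

    with requests.Session() as session:  # a Session keeps cookies between requests
        # Hypothetical field names; use the name attributes of the real logon form.
        resp = session.post(login_url,
                            data={"username": username, "password": password},
                            timeout=30)
        resp.raise_for_status()

        # Same session, so the login cookies are sent automatically.
        with session.get(download_url, stream=True, timeout=30) as download:
            download.raise_for_status()
            with open(dest_path, "wb") as out:
                for chunk in download.iter_content(chunk_size=8192):
                    out.write(chunk)
```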

    Assisted Solution

    You may well be re-implementing what curl (libcurl) already does.
    Check for more info on curl.

    It supports HTTP, HTTPS, TELNET, FTP, FTPS and more, and with its command-line interface you can script access to web servers, including logins, though without JavaScript execution (no AJAX). It can handle cookies that way too. The library version is more flexible, but still has no AJAX support unless your software has a JavaScript interpreter built in.
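If you want to mix the two approaches, Python 3's `http.cookiejar.MozillaCookieJar` reads and writes the Netscape cookie file format, which is also the format curl uses for its cookie files. A minimal sketch, with a hypothetical helper name: log in from Python, save the session cookies, and let curl reuse them.

```python
import http.cookiejar
import urllib.request


def save_cookies_for_curl(jar_path, urls):
    """Fetch urls with a Netscape-format cookie jar, then save it to jar_path."""
    jar = http.cookiejar.MozillaCookieJar(jar_path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    for url in urls:
        with opener.open(url) as resp:
            resp.read()
    # ignore_discard=True also writes session cookies, which a login usually relies on.
    jar.save(ignore_discard=True, ignore_expires=True)
    return jar
```

curl can then send the saved cookies with `-b cookies.txt`, and write its own jar for Python to load with `-c cookies.txt`.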

    Author Closing Comment

    Thank you.  More options than I expected.
    Appreciate your input!
