Logging in to Yahoo! From a Perl Script?

Posted on 2006-05-18
Last Modified: 2010-04-06
I'd like a simple Perl script that will automatically log in to a
Yahoo page and output the page to a
text file. I've tried some of the scripts online and I can't seem to
get them to work.
Question by:rhinez0rz

    Accepted Solution

    I'm not sure, but it sounds like you don't really need anything to log in exactly, just download a file using HTTP.  There's a program you can run (available for Windows and Linux) called wget which does this, though not directly from Perl.  You use it from a DOS or Shell prompt like this:

    $ wget <url>


    $ wget

    This would save the text file as "somefile.txt" in your current working directory, which is probably something like what you're after.  (Above, the $ is the prompt; you don't type it, and in DOS you'll get the usual C:\> like prompt of course).  Any shell command can be called from Perl using, if I remember correctly:

    system("command here")

    if wget is installed on your path.  Wget takes various options to let you specify things like retry behaviour, output filename and so on.  -O (dash capital O) sets the output file.  So to download the URL you suggest and save it as temp.html, you could use:

    system("wget -O temp.html")

    Then your script could open the file and read it or do whatever else it wanted with it.  This is a not-too-pretty method, but it works and is dead simple, while allowing you to use all the features wget offers (like retrying and even spider-like link-following and incremental mirroring behaviour, which is very neat).
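A minimal sketch of that approach, assuming wget is on your path. The URL is a placeholder and the helper names (wget_command, fetch_with_wget) are made up for illustration. It uses the list form of system(), which skips the shell so characters like & or ? in the URL need no quoting, and it checks the exit status before reading the file:

```perl
use strict;
use warnings;

# Build the wget argument list: -O names the output file.
sub wget_command {
    my ($url, $outfile) = @_;
    return ('wget', '-O', $outfile, $url);
}

# Run wget, then read the downloaded page back into a string.
sub fetch_with_wget {
    my ($url, $outfile) = @_;
    my @cmd = wget_command($url, $outfile);
    system(@cmd) == 0
        or die "wget failed (status $?)\n";
    open my $fh, '<', $outfile or die "Cannot open $outfile: $!\n";
    local $/;                  # slurp mode: read the whole file at once
    my $page = <$fh>;
    close $fh;
    return $page;
}

# Hypothetical usage (placeholder URL):
# my $html = fetch_with_wget('http://www.example.com/', 'temp.html');
```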

    On Unix-like systems, you probably already have wget.  For Win32 systems, probably a binary download is the easiest method.  You can get it from:

    If you actually want to log in, supplying a username and password, then the good news is that it can be done but the bad news is that it's not that easy and requires a bit of background information.

    When a browser logs in using the login form, it sends a request - either an HTTP GET request or an HTTP POST request - to the server.  The request carries name-value pairs; for a login, one will probably be username=something and another password=somethingelse (though the names, contents, and overall strategy may well differ).  If you see these pairs in the address bar when using the site, they're being sent with a GET request.  If you don't, they're being sent with a POST request, where the browser puts them in the body of the request, which goes to the URL of a server-side script or server-side application of some kind.
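To make the difference concrete, here is a rough sketch of how the same name-value pairs end up in a GET URL or a POST body. The field names, the values, and the example.com URL are placeholders, and the helpers are hypothetical; it assumes plain byte strings and encodes spaces as %20 (browsers usually send + in form data, and servers generally accept either):

```perl
use strict;
use warnings;

# Percent-encode one name or value.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Join name-value pairs as name=value&name=value, keeping their order.
sub encode_pairs {
    my @pairs = @_;
    my @out;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @out, url_encode($name) . '=' . url_encode($value);
    }
    return join '&', @out;
}

my $data = encode_pairs(username => 'something', password => 'some thing&else');

# GET: the pairs travel in the URL, so they show up in the address bar.
my $get_url = 'http://www.example.com/login?' . $data;

# POST: the same string travels in the request body instead.
my $post_body = $data;

print "$get_url\n";
```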

    When you want your program to be able to fake these post and get requests (although I suspect wget might well be able to do it for you) you probably need to do the HTTP stuff inside Perl.  This can be done with a library or module for the purpose; see the following reference for more information:
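As one possible illustration, the sketch below uses HTTP::Tiny, which ships with Perl 5.14 and later. That module is my choice for the example and not necessarily the one referenced above; LWP::UserAgent is a common alternative. The sub name, URL, and field names are hypothetical, and https URLs additionally need IO::Socket::SSL installed:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# POST the login fields and save the response body to a file.
# Returns the HTTP status code of the response.
sub post_login_and_save {
    my ($url, $fields, $outfile) = @_;
    my $http = HTTP::Tiny->new;
    my $res  = $http->post_form($url, $fields);   # form-encodes the hash for us
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    open my $fh, '>', $outfile or die "Cannot write $outfile: $!\n";
    print {$fh} $res->{content};
    close $fh;
    return $res->{status};
}

# Hypothetical usage (placeholder URL and field names):
# post_login_and_save('http://www.example.com/login',
#                     { username => 'something', password => 'somethingelse' },
#                     'page.txt');
```

On its own this only saves the response to the login request; fetching further pages as a logged-in user usually also needs the cookie handling described further down.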

    The really hard part is in knowing what is actually posted by your browser so that you can have your program post the same sort of information.  Finding this out requires a certain amount of HTML knowledge.  I can give you a quick overview of how you find out here but you'll probably need to do more reading if this is what you're after (see - great site generally -  for some good introductory HTML and other programming information if you need it).

    Basically, the method is to go to the site you want your program to interact with using your browser and then use the 'view source' command a lot and pore over the source you see.  Usually you should be able to find the login form in there - starting with a <FORM> tag.  This tag includes an "action" attribute which will tell you the URL your program needs to interact with, and it will also specify a "method" (like GET or POST).  If you don't see method, the method is GET (the default).  If you don't see action (rarer), then the target is the URL you're looking at the source of - the same URL as the form itself.

    Right, great so far.  Now look over the <INPUT> tags.  In there somewhere will be the ones that take the username and password, and you will be able to get their names from here.  Remember that if there are any hidden inputs or other unusual fields, you will probably have to have your program send these too to make sure the server-side code believes you're a browser.
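The same inspection can be roughed out in code. This is a crude regex-based sketch that only looks at the first form and its <input> tags; the sample HTML, the field names, and the helper names are invented for illustration, and a real page would be better served by a proper HTML parsing module. Note that a relative action such as /login still has to be resolved against the page URL before you use it:

```perl
use strict;
use warnings;

# Pull one attribute value out of a single tag (quoted or unquoted).
sub attr {
    my ($tag, $name) = @_;
    return $1 if $tag =~ /\s\Q$name\E\s*=\s*"([^"]*)"/i;
    return $1 if $tag =~ /\s\Q$name\E\s*=\s*'([^']*)'/i;
    return $1 if $tag =~ /\s\Q$name\E\s*=\s*([^\s>]+)/i;
    return;
}

# Describe the first <form> in $html: where it submits, how, and which
# named inputs (including hidden ones) it would send.
sub describe_form {
    my ($html, $page_url) = @_;
    my ($form_tag, $body) = $html =~ m{(<form\b[^>]*>)(.*?)</form>}is
        or return;
    my $action = attr($form_tag, 'action');
    my $method = attr($form_tag, 'method');
    my %fields;
    for my $input ($body =~ /<input\b[^>]*>/gi) {
        my $name = attr($input, 'name');
        next unless defined $name;
        my $value = attr($input, 'value');
        $fields{$name} = defined($value) ? $value : '';
    }
    return {
        action => defined($action) ? $action     : $page_url,  # no action: same URL
        method => defined($method) ? uc($method) : 'GET',      # no method: GET
        fields => \%fields,
    };
}

my $html = <<'HTML';
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="passwd">
  <input type="hidden" name="token" value="abc123">
</form>
HTML

my $form = describe_form($html, 'http://www.example.com/');
print "$form->{method} $form->{action}\n";
print "  $_ = '$form->{fields}{$_}'\n" for sort keys %{ $form->{fields} };
```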

    There are just a couple of other complications: first, the server-side code will probably send you cookies and your program will often need to handle these in order to log in and use features of a site that require it.  The library you're using to access HTTP information will probably provide a cookie mechanism of some kind - you just have to accept these cookies and send them back to the server with any future requests you send and the server will be happy.
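To show what "accept these cookies and send them back" amounts to, here is a deliberately simplified sketch that keeps only the name=value part of each Set-Cookie header and rebuilds a Cookie header from it. The helper names and cookie values are invented, and it ignores Path, Domain, Expires and the like, which a real cookie jar must honour. In practice you would normally let the HTTP library do this (for example the cookie_jar option of LWP::UserAgent, or HTTP::Tiny together with HTTP::CookieJar):

```perl
use strict;
use warnings;

# Remember the name=value part of each Set-Cookie header in %$jar.
sub store_cookies {
    my ($jar, @set_cookie_headers) = @_;
    for my $header (@set_cookie_headers) {
        my ($pair) = split /;/, $header;          # drop the attributes
        next unless defined $pair;
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $name =~ s/^\s+|\s+$//g;                  # trim stray whitespace
        $jar->{$name} = $value;
    }
    return $jar;
}

# Build the value of the Cookie header to send with later requests.
sub cookie_header {
    my ($jar) = @_;
    return join '; ', map { "$_=$jar->{$_}" } sort keys %$jar;
}

my %jar;
store_cookies(\%jar, 'session=abc123; Path=/; HttpOnly', 'lang=en; Path=/');
print cookie_header(\%jar), "\n";
```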

    Second, the server might just not like the fact that you're not a real browser.  To fake it, you need to find out the user agent string that you want to impersonate and send it along in the HTTP headers.  Wget allows you to set the user agent string if you want to (see the wget documentation for details), and most HTTP libraries will also allow you to set arbitrary headers.  The user agent strings of well-known browsers, which the server will probably accept if it cares at all, are available online.  Most services will be happy to deal with you anyway, and most of those that distinguish between browsers are happy if you send a string that contains "Mozilla/4.0 Compatible" or contains "MSIE" or both.  If in doubt, take a look for some sites that list valid user agent strings.  There are plenty such sites out there and you can probably make your program perfectly impersonate any browser you like with these.
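A small sketch of setting the user agent both ways. The agent string and URL are placeholders (copy a current string from a real browser if the site is picky), and HTTP::Tiny is again just the module picked for illustration:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Placeholder browser-like string; substitute one from a real browser.
my $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

# HTTP::Tiny sends this as the User-Agent header on every request.
my $http = HTTP::Tiny->new(agent => $agent);

# The wget equivalent, as an argument list suitable for system(@wget).
my @wget = ('wget', "--user-agent=$agent", '-O', 'temp.html',
            'http://www.example.com/');

print $http->agent, "\n";
```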

    Most web services aren't that picky about the browser, though, as long as you get the arguments, method and usually cookie handling right.

    Hope this helps!
