Solved

Need a script to automate a file download from a site which requires a login

Posted on 2008-06-11
Last Modified: 2008-07-03
I am trying to automate the download of a weekly .csv file from a site.  The URL is "https://www.sitename.com/directory/filename.csv".  Following the link brings up a Windows prompt for a username and password.  Any ideas?
Question by:speede1
 
LVL 39

Expert Comment

by:Adam314
ID: 21763175
If it is basic authentication, you could use the credentials method from LWP::UserAgent:
http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/UserAgent.pm
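
A minimal sketch of that approach, assuming the server really does use HTTP Basic authentication. The four-argument credentials() call needs the host:port and the realm exactly as the server reports it in its WWW-Authenticate header. The realm, the credentials and the helper names netloc_for and fetch_with_credentials below are placeholders, not anything taken from the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# credentials() wants "host:port", so derive it from the URL.
# (Hypothetical helper; handles plain http/https URLs only.)
sub netloc_for {
    my ($url) = @_;
    my ($scheme, $host, $port) =
        $url =~ m{^(https?)://(?:[^/@]*@)?([^/:]+)(?::(\d+))?}i
        or die "Unsupported URL: $url\n";
    $port ||= lc($scheme) eq 'https' ? 443 : 80;
    return "$host:$port";
}

# Fetch a URL, answering a Basic authentication challenge for the given realm.
sub fetch_with_credentials {
    my ($url, $realm, $user, $password) = @_;
    require LWP::UserAgent;    # loaded here; needs libwww-perl installed
    my $ua = LWP::UserAgent->new;
    $ua->credentials( netloc_for($url), $realm, $user, $password );
    return $ua->get($url);     # returns an HTTP::Response object
}

# Example call (placeholder realm and credentials):
# my $res = fetch_with_credentials(
#     'https://www.sitename.com/directory/filename.csv',
#     'Some Realm', 'your_username', 'your_password' );
# print $res->status_line, "\n";
```

If the realm or host:port does not match what the server sends, LWP will not offer the credentials and the request will still come back as 401.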
 

Author Comment

by:speede1
ID: 21764185
I am a newbie, Adam.  I am trying to download a file from a site via a hyperlink.  When I click on the hyperlink, the site prompts me to enter my username and password via a Windows popup.
 
LVL 39

Expert Comment

by:Adam314
ID: 21764313
Try this code.  You will have to install WWW::Mechanize, if it isn't already installed.
On Windows: at a prompt, type: ppm install WWW-Mechanize
On anything else: at a prompt as root: perl -MCPAN -e 'install WWW::Mechanize'

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 lets the script report the status itself
# instead of dying inside get() on newer WWW::Mechanize versions
my $mech = WWW::Mechanize->new( autocheck => 0 );

#NOTE: put your username and password here
my $username = 'your_username';
my $password = 'your_password';
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
die "Unsuccessful: status=" . $mech->status . "\n" unless $mech->success;

# three-argument open; binmode keeps the downloaded bytes unchanged on Windows
open(my $out, '>', 'filename.txt') or die "Output file: $!\n";
binmode($out);
print $out $mech->content;
close($out);
 

Author Comment

by:speede1
ID: 21764569
Getting unsuccessful - 401
 
LVL 39

Expert Comment

by:Adam314
ID: 21766108
Status 401 means not authorized.  Did you enter the correct username and password?

Try this, it will display the content no matter what the status.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 keeps get() from dying on an error status
my $mech = WWW::Mechanize->new( autocheck => 0 );

#NOTE: put your username and password here
my $username = 'your_username';
my $password = 'your_password';
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
 

Author Comment

by:speede1
ID: 21770621
Tried that and the following is what has been received:

status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/
2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>

I have tested the user name and password by logging in through the browser.
 
LVL 39

Expert Comment

by:Adam314
ID: 21771629
I'm guessing the method of authentication used by the server is different than what the credentials method provides.

Could you use Firefox with LiveHTTPHeaders when you go to the website, to see what headers are provided?
 

Author Comment

by:speede1
ID: 21771899
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Thu, 12 Jun 2008 17:45:29 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
 
LVL 39

Expert Comment

by:Adam314
ID: 21804109
Are you able to get the file with firefox?  Those headers look like you got an error message.  Can you post the headers for when it is successful?
 

Author Comment

by:speede1
ID: 21804182
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:06 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
----------------------------------------------------------
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1
Authorization: Basic xxxxxxxxxxxxxxxx

HTTP/1.x 200 OK
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:19 GMT
Content-Type: text/csv
Content-Length: 507135096
Last-Modified: Tue, 17 Jun 2008 15:29:42 GMT
Connection: close
Accept-Ranges: bytes
 
LVL 39

Accepted Solution

by:
Adam314 earned 350 total points
ID: 21804400

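#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 keeps get() from dying on an error status
my $mech = WWW::Mechanize->new( autocheck => 0 );

# replace the x's with the value shown after "Basic" in the browser's request headers
$mech->add_header("Authorization" => "Basic xxxxxxxxxxxxxxxx");
$mech->get('https://www.sitename.com/directory/filename.csv');

print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";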
 

Author Comment

by:speede1
ID: 21805228
status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/
2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>
 
LVL 39

Expert Comment

by:Adam314
ID: 21805666
Did you replace the xxxxx in the script with the password?  Was the "Basic xxxxxxx" in the headers actually what was posted, or was there a password there?
 

Author Comment

by:speede1
ID: 21805796
It was a string of letters, which was probably an encrypted version of the username and password.
 
LVL 39

Expert Comment

by:Adam314
ID: 21806073
Did you try using that same string in the perl script?
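
For what it's worth, the string after "Basic" in that header is normally not encrypted at all: with Basic authentication it is just the Base64 encoding of username:password, so the script can build it itself instead of copying it from the browser. A minimal sketch using the core MIME::Base64 module; the credentials and the helper names basic_auth_header and decode_basic_auth are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Build the value of an HTTP Basic "Authorization" header.
# The second argument to encode_base64 ('') suppresses the trailing newline.
sub basic_auth_header {
    my ($user, $password) = @_;
    return 'Basic ' . encode_base64("$user:$password", '');
}

# Reverse the process: recover "user:password" from a captured header value.
sub decode_basic_auth {
    my ($header_value) = @_;
    my ($token) = $header_value =~ /^Basic\s+(\S+)$/
        or die "Not a Basic authorization value\n";
    return decode_base64($token);
}

# Placeholder credentials, not taken from the thread.
my $header = basic_auth_header('user', 'password');
print "Authorization: $header\n";
print "Decoded: ", decode_basic_auth($header), "\n";
```

The value returned by basic_auth_header could then be passed to $mech->add_header("Authorization" => $header). Because the encoding is trivially reversible, a captured header should be treated like the password itself.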
 

Author Comment

by:speede1
ID: 21806517
Yes, which is when I got the 401 message
 
LVL 39

Expert Comment

by:Adam314
ID: 21806602
I'm not sure then..... maybe you could use wget or curl
http://www.gnu.org/software/wget/
http://curl.haxx.se/

I'm not sure if either will work with authentication though.
 

Author Comment

by:speede1
ID: 21816079
Is there any code for wget?
 
LVL 39

Expert Comment

by:Adam314
ID: 21816193
The manual for wget is here:
    http://www.gnu.org/software/wget/manual/

There are many examples, with one having a username/password towards the bottom of this page:
    http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html#Advanced-Usage


To call it from perl, you would use:
    my $returncode=system("wget ....");  #then check $returncode for success/failure
Or:
    my $output=`wget .....`;     #then check $output and $? for success/failure
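
A slightly fuller sketch of the system() approach, assuming wget is installed and on the PATH. The --http-user and --http-password options and -O are described in the wget manual; the URL, credentials, output name and the helper names run_command and wget_command are placeholders. The list form of system() bypasses the shell, so special characters in the password need no quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run an external command without going through the shell.
# Returns the command's exit code, or -1 if it could not be
# started or was killed by a signal.
sub run_command {
    my @cmd = @_;
    my $rc = system(@cmd);
    return -1 if $rc == -1;    # command could not be started
    return -1 if $? & 127;     # child was killed by a signal
    return $? >> 8;            # exit code of the child process
}

# Build the wget argument list for a Basic-authenticated download.
sub wget_command {
    my ($url, $user, $password, $outfile) = @_;
    return ('wget', "--http-user=$user", "--http-password=$password",
            '-O', $outfile, $url);
}

# Example call (placeholder values):
# my $exit = run_command( wget_command(
#     'https://www.sitename.com/directory/filename.csv',
#     'your_username', 'your_password', 'filename.csv' ) );
# die "wget failed with exit code $exit\n" if $exit != 0;
```

wget exits with 0 on success, so any other value returned by run_command can be treated as a failed download.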
 

Author Comment

by:speede1
ID: 21816326
I used the following and I think it is working:


#!/usr/bin/perl
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
$mech->get('https://user:password@www.sitename.com/directory/filename.csv');
 
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";



I added the user:password string to the address.  How can I now direct the download to a specific directory on my machine, and also write a log file?
 

Author Comment

by:speede1
ID: 21816405
I need to redirect this to a file, because I ran it from a command prompt and the data appeared in the DOS window.
 
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 350 total points
ID: 21816461
You could have the script write it to a file.  There are several methods:
1)
    #save the csv file to $filename
    $mech->save_content($filename);

2)
    open(my $out, '>', $filename) or die "Could not create output: $!\n";
    binmode($out);    #keep the downloaded bytes unchanged on Windows
    print $out $mech->content;
    close($out);

Or you could do the redirect from the command prompt:
    perl yourscript.pl > some/file.csv
    yourscript.pl > some/file.csv
 

Author Comment

by:speede1
ID: 21817514
The final code is:

#!/usr/bin/perl
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
$mech->get('https://xxxx:password@www.sitename/file.csv');

#save the csv file to $filename
$mech->save_content('test.csv');

print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
 

Author Comment

by:speede1
ID: 21832294
Adam,

The get command actually opens the file in the DOS window.  Is there a command which just downloads the file without opening it?  I tried using the script with a 400MB file and it doesn't work.  I would like to modify the script to just do a straight download.
 
LVL 39

Expert Comment

by:Adam314
ID: 21832400
It's not the get command that opens the dos window, it is perl itself.  If you don't want the window, you can use wperl instead of perl.  Several ways to do this:

In the command for the shortcut you are clicking, change it to:
    c:\perl\bin\wperl.exe "c:\path\to\your\script.pl"
You will not need the double quotes (as shown) if the path does not contain spaces.

Or you could associate the extension .wpl with wperl.exe, and rename the script from something.pl to something.wpl.
 

Author Comment

by:speede1
ID: 21833150
OK, Adam, I have changed the script to use the wperl command.  Now, is there any way to implement a status window or log file to see what's going on with the download?  I am trying to download a 400MB file.  It takes an hour if I do it manually, but when I ran the script yesterday the DOS window was open for about 7 hours, and when it finally closed there wasn't any file.
 
LVL 39

Expert Comment

by:Adam314
ID: 21833344
The saved file will be in the current directory of the script.  If you want it in a particular directory, add this to the script:
    chdir('/path/to/where/you/want/to/save') or die "Could not change directory: $!\n";

You might be able to use the LWP::Simple module to save the file
    use LWP::Simple;
    getstore('https://xxxx:password@www.sitename.com/file.csv', '/path/to/file.csv') ;
Then check the size of /path/to/file.csv as the script is running.  I'm not sure if this function will write data as it is downloaded, or write it all when it's finished.
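
If getstore() gives no visible feedback, one option is LWP::UserAgent with a :content_cb callback, which hands over each chunk as it arrives so it can be written to disk and logged straight away rather than held in memory. A rough sketch, assuming libwww-perl with https support is installed; the paths, the 10 MB reporting interval and the helper names progress_line and download_with_log are my own choices, and it has not been run against this site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # for autoflush on the log handle

# Format one progress line; $total is undef when no Content-Length was sent.
sub progress_line {
    my ($received, $total) = @_;
    return sprintf('%d bytes received', $received) unless $total;
    return sprintf('%d of %d bytes (%.1f%%)',
                   $received, $total, 100 * $received / $total);
}

# Stream $url to $outfile, appending progress lines to $logfile.
# Returns true if the server answered with a success status.
sub download_with_log {
    my ($url, $user, $password, $outfile, $logfile) = @_;
    require LWP::UserAgent;    # loaded here; needs libwww-perl installed

    open(my $out, '>', $outfile)  or die "Cannot write $outfile: $!\n";
    binmode($out);
    open(my $log, '>>', $logfile) or die "Cannot write $logfile: $!\n";
    $log->autoflush(1);        # make progress visible while the script runs

    my $ua = LWP::UserAgent->new;
    # Send the Basic Authorization header with every request.
    $ua->default_headers->authorization_basic($user, $password);

    my ($received, $next_report) = (0, 0);
    my $res = $ua->get(
        $url,
        ':read_size_hint' => 64 * 1024,
        ':content_cb'     => sub {
            my ($chunk, $response) = @_;
            print {$out} $chunk or die "Write to $outfile failed: $!\n";
            $received += length $chunk;
            return if $received < $next_report;
            print {$log} scalar(localtime), ' ',
                progress_line($received, $response->content_length), "\n";
            $next_report = $received + 10 * 1024 * 1024;    # roughly every 10 MB
        },
    );

    close($out) or die "Cannot close $outfile: $!\n";
    print {$log} scalar(localtime), ' finished: ', $res->status_line, "\n";
    close($log);
    return $res->is_success;
}

# Example call (placeholder values):
# download_with_log('https://www.sitename.com/file.csv', 'xxxx', 'password',
#                   '/path/to/file.csv', '/path/to/download.log')
#     or warn "Download failed, see the log file\n";
```

The log file can be opened in any editor while the script runs to see how far the download has got, and the last line records the final HTTP status.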