Need a script to automate a file download from a site which requires a login

I am trying to automate downloading a weekly .csv file from a site.  The URL is "https://www.sitename.com/directory/filename.csv".  Following the link brings up a Windows prompt for a username and password.  Any ideas?
Asked by: speede1
 
Adam314 commented:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
# Send the same Authorization header the browser sends.
# Replace the x's with the Base64 encoding of "username:password".
$mech->add_header("Authorization" => "Basic xxxxxxxxxxxxxxxx");
$mech->get('https://www.sitename.com/directory/filename.csv');
 
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";

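The value after "Basic" in the Authorization header above is the Base64 encoding of "username:password"; it is encoded, not encrypted. A minimal sketch of building it with the core MIME::Base64 module follows. The helper name basic_auth_header and the credentials are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build the value of an HTTP Basic "Authorization" header.
# basic_auth_header is a hypothetical helper, not part of any library.
sub basic_auth_header {
    my ($username, $password) = @_;
    # The second argument '' suppresses the trailing newline
    # that encode_base64 appends by default.
    return 'Basic ' . encode_base64("$username:$password", '');
}

# Made-up credentials, for illustration only.
print basic_auth_header('Aladdin', 'open sesame'), "\n";
```

The returned string could then be passed as the second argument to $mech->add_header("Authorization" => ...) in the script above.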
 
Adam314 commented:
If it is basic authentication, you could use the credentials method from LWP::UserAgent:
http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/UserAgent.pm
 
speede1 (author) commented:
I am a newbie, Adam.  I am trying to download a file from a site via a hyperlink.  When I click the hyperlink, the site prompts me for my username and password in a Windows popup.
 
Adam314 commented:
Try this code.  You will have to install WWW::Mechanize if it isn't already installed.
On Windows: at a prompt, type: ppm install WWW-Mechanize
On anything else: at a prompt as root: perl -MCPAN -e 'install WWW::Mechanize'

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
#NOTE: put your username and password here
my $username = 'your-username';
my $password = 'your-password';
$mech->credentials( $username, $password );
 
$mech->get('https://www.sitename.com/directory/filename.csv');
die "Unsuccessful: status=" . $mech->status . "\n" unless $mech->success;
 
open(my $out, '>', 'filename.txt') or die "Output file: $!\n";
binmode($out);    # keep the bytes unchanged on Windows
print $out $mech->content;
close($out);
    

 
speede1 (author) commented:
I'm getting "Unsuccessful: status=401".
 
Adam314 commented:
Status 401 means not authorized.  Did you enter the correct username and password?

Try this; it will display the content no matter what the status.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
#NOTE: put your username and password here
my $username = 'your-username';
my $password = 'your-password';
$mech->credentials( $username, $password );
 
$mech->get('https://www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";

 
speede1 (author) commented:
I tried that and received the following:

status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>

I have tested the username and password by logging in through the browser.
 
Adam314 commented:
I'm guessing the method of authentication used by the server is different from what the credentials method provides.

Could you use Firefox with LiveHTTPHeaders when you go to the website, to see what headers are sent?
 
speede1 (author) commented:
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Thu, 12 Jun 2008 17:45:29 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
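The WWW-Authenticate line in the 401 response above names the authentication scheme (Basic) and the realm. As a rough sketch, assuming only the simple quoted-realm form seen here, the two can be pulled out as follows. parse_www_authenticate is a hypothetical helper, not part of LWP or WWW::Mechanize.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a WWW-Authenticate header value into its scheme and realm.
# Handles only the simple form: Scheme realm="..."
# Returns an empty list if the value does not match that form.
sub parse_www_authenticate {
    my ($value) = @_;
    my ($scheme, $realm) = $value =~ /^\s*(\w+)\s+realm="([^"]*)"/i
        or return;
    return (lc $scheme, $realm);
}

# Header value copied from the 401 response in this thread.
my ($scheme, $realm) = parse_www_authenticate('Basic realm="WorldCheck auth"');
print "scheme=$scheme realm=$realm\n";
```

The realm is one of the values that the four-argument form of the LWP::UserAgent credentials method takes, alongside a host:port string, the username, and the password.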
 
Adam314 commented:
Are you able to get the file with Firefox?  Those headers look like you got an error message.  Can you post the headers for when it is successful?
 
speede1 (author) commented:
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:06 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
----------------------------------------------------------
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1
Authorization: Basic xxxxxxxxxxxxxxxx

HTTP/1.x 200 OK
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:19 GMT
Content-Type: text/csv
Content-Length: 507135096
Last-Modified: Tue, 17 Jun 2008 15:29:42 GMT
Connection: close
Accept-Ranges: bytes
 
speede1 (author) commented:
status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>
 
Adam314 commented:
Did you replace the xxxxx in the script with the password?  Was the "Basic xxxxxxx" in the headers actually what was posted, or was there a password there?
 
speede1 (author) commented:
It was a string of letters, which was probably an encoded version of the username and password.
 
Adam314 commented:
Did you try using that same string in the Perl script?
 
speede1 (author) commented:
Yes, which is when I got the 401 message.
 
Adam314 commented:
I'm not sure then.  Maybe you could use wget or curl:
http://www.gnu.org/software/wget/
http://curl.haxx.se/

Both support HTTP basic authentication (wget through its --user and --password options, curl through -u).
 
speede1 (author) commented:
Is there any example code for wget?
 
Adam314 commented:
The manual for wget is here:
    http://www.gnu.org/software/wget/manual/

There are many examples, with one having a username/password towards the bottom of this page:
    http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html#Advanced-Usage


To call it from perl, you would use:
    my $returncode=system("wget ....");  #then check $returncode for success/failure
Or:
    my $output=`wget .....`;     #then check $output and $? for success/failure
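The system() call above can also be given a list, which avoids shell quoting problems with passwords and paths. A minimal sketch under that approach, using wget's --user, --password and -O options, is shown below. build_wget_command and run_command are hypothetical helper names, and the demo line runs Perl itself rather than wget so that it works without a network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assemble a wget command as a list so that no shell quoting is needed.
sub build_wget_command {
    my ($url, $user, $password, $outfile) = @_;
    return ('wget', "--user=$user", "--password=$password", '-O', $outfile, $url);
}

# Run a command and return its exit code, or -1 if it could not be
# started or was terminated by a signal.
sub run_command {
    my @cmd = @_;
    my $status = system(@cmd);
    return -1 if $status == -1 || ($status & 127);
    return $status >> 8;
}

# Demo with a harmless command: run this same perl and exit with code 3.
my $rc = run_command($^X, '-e', 'exit 3');
print "exit code: $rc\n";

# With wget installed, the real call would look like this
# (placeholder URL and credentials):
#   my $rc = run_command(build_wget_command(
#       'https://www.sitename.com/directory/filename.csv',
#       'user', 'password', 'filename.csv'));
#   die "wget failed with exit code $rc\n" if $rc != 0;
```

Note that a password passed on the command line may be visible to other users of the machine in the process list.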
 
speede1 (author) commented:
I used the following and I think it is working:


#!/usr/bin/perl
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
$mech->get('https://user:password@www.sitename.com/directory/filename.csv');
 
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";



I added the user:password string to the address.  How can I now direct this to a specific directory on my machine, along with a log file?
 
speede1 (author) commented:
I need to redirect this to a file, because I ran it from a command prompt and the data appeared in the DOS window.
 
Adam314 commented:
You could have the script write it to a file.  There are several methods:
1)
    #save the csv file to $filename
    $mech->save_content($filename);

2)
    open(my $out, '>', $filename) or die "Could not create output: $!\n";
    binmode($out);
    print $out $mech->content;
    close($out);

Or you could do the redirect from the command prompt:
    perl yourscript.pl > some/file.csv
    yourscript.pl > some/file.csv
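For the earlier request to save into a specific directory and keep a log file, the second method above can be wrapped in two small helpers. This is only a sketch: save_bytes and log_line are hypothetical names, and the directory and log paths in the usage comment are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Append one timestamped line to a log file.
sub log_line {
    my ($logfile, $message) = @_;
    open(my $log, '>>', $logfile) or die "Cannot open log $logfile: $!\n";
    print {$log} scalar(localtime), " $message\n";
    close($log) or die "Cannot close log $logfile: $!\n";
}

# Write raw bytes to $dir/$name and return the full path.
sub save_bytes {
    my ($dir, $name, $bytes) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open(my $out, '>', $path) or die "Cannot create $path: $!\n";
    binmode($out);    # avoid newline translation on Windows
    print {$out} $bytes;
    close($out) or die "Cannot close $path: $!\n";
    return $path;
}

# Usage with the $mech object from the scripts in this thread
# (the paths are placeholders):
#   my $path = save_bytes('C:/downloads', 'filename.csv', $mech->content);
#   log_line('C:/downloads/download.log',
#            "saved $path, status=" . $mech->status);
```

Because the whole file is held in memory before it is written, this suits small files better than very large ones.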
 
speede1 (author) commented:
The final code is:

#!/usr/bin/perl
use WWW::Mechanize;
 
my $mech = WWW::Mechanize->new();
 
$mech->get('https://xxxx:password@www.sitename/file.csv');

#save the csv file to $filename
$mech->save_content('test.csv');

print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
 
speede1 (author) commented:
Adam,

The get command actually opens the file in the DOS window.  Is there a command that just downloads the file without opening it?  I tried using the script with a 400MB file and it doesn't work.  I would like to modify the script to do a straight download.
 
Adam314 commented:
It's not the get command that opens the DOS window; it is perl itself.  If you don't want the window, you can use wperl instead of perl.  There are several ways to do this:

In the command for the shortcut you are clicking, change it to:
    c:\perl\bin\wperl.exe "c:\path\to\your\script.pl"
You will not need the double quotes (as shown) if the path does not contain spaces.

Or you could associate the extension .wpl with wperl.exe, and rename the script from something.pl to something.wpl.
 
speede1 (author) commented:
OK, Adam, I have changed the script to use wperl.  Is there any way to implement a status window or log file to see what's going on with the download?  I am trying to download a 400MB file.  It takes an hour if I do it manually, but when I ran the script yesterday the DOS window was open for about 7 hours, and when it finally closed there wasn't any file.
 
Adam314 commented:
The saved file will be in the current directory of the script.  If you want it in a particular directory, add this to the script:
    chdir('/path/to/where/you/want/to/save');

You might be able to use the LWP::Simple module to save the file
    use LWP::Simple;
    getstore('https://xxxx:password@www.sitename.com/file.csv', '/path/to/file.csv') ;
Then check the size of /path/to/file.csv as the script is running.  I'm not sure if this function will write data as it is downloaded, or write it all when it's finished.
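For a file of this size, one option is to write each chunk to disk as it arrives and log progress along the way, using the ':content_cb' option of LWP::UserAgent's get method. The following is a sketch under that assumption, not a tested replacement for the scripts above. make_chunk_writer and download_to_file are hypothetical names, and the 10MB reporting step is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a callback that writes each chunk to $fh and calls $report
# with the running byte count each time another $step bytes have arrived.
sub make_chunk_writer {
    my ($fh, $step, $report) = @_;
    my ($written, $next_report) = (0, $step);
    return sub {
        my ($chunk) = @_;
        print {$fh} $chunk or die "Write failed: $!\n";
        $written += length $chunk;
        if ($written >= $next_report) {
            $report->($written);
            $next_report += $step while $next_report <= $written;
        }
        return $written;
    };
}

# Stream $url into $path, printing progress lines to STDERR.
# LWP::UserAgent is loaded only when this sub is called.
sub download_to_file {
    my ($url, $path) = @_;
    require LWP::UserAgent;
    open(my $fh, '>', $path) or die "Cannot create $path: $!\n";
    binmode($fh);
    my $writer = make_chunk_writer($fh, 10 * 1024 * 1024,
        sub { print STDERR scalar(localtime), " $_[0] bytes received\n" });
    my $response = LWP::UserAgent->new->get($url, ':content_cb' => $writer);
    close($fh) or die "Cannot close $path: $!\n";
    die 'Download failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $path;
}

# Usage (placeholder URL and path, as elsewhere in this thread):
#   download_to_file('https://xxxx:password@www.sitename.com/file.csv',
#                    '/path/to/file.csv');
```

Redirecting STDERR to a file (for example, 2> download.log) would turn the progress lines into a log.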
Question has a verified solution.
