# Need a script to automate a file download from a site which requires a login

Posted on 2008-06-11
I am trying to automate downloading a weekly .csv file from a site. The URL is "https://www.sitename.com/directory/filename.csv". When I follow the link, a Windows prompt asks for a username and password. Any ideas?
Question by:speede1

Expert Comment

ID: 21763175
If it is basic authentication, you could use the credentials method from LWP::UserAgent:
http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/UserAgent.pm
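A minimal sketch of the `credentials` approach, assuming the server really uses HTTP Basic authentication. `LWP::UserAgent::credentials` takes a `"host:port"` string, the realm the server names in its `WWW-Authenticate` header, a username and a password. The helper `netloc_for`, the sub `fetch_with_basic_auth` and the command-line argument order are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: derive the "host:port" string that
# LWP::UserAgent::credentials expects from an http(s) URL.
sub netloc_for {
    my ($url) = @_;
    my ($scheme, $host, $port) =
        $url =~ m{^(https?)://(?:[^/\@]*\@)?([^/:]+)(?::(\d+))?}i
        or die "Not an http(s) URL: $url\n";
    # Default the port from the scheme when the URL does not give one.
    $port ||= lc($scheme) eq 'https' ? 443 : 80;
    return "$host:$port";
}

# Hypothetical sub: fetch $url with Basic credentials for $realm and
# stream the response body to $outfile.
sub fetch_with_basic_auth {
    my ($url, $realm, $user, $pass, $outfile) = @_;
    require LWP::UserAgent;    # loaded at run time so netloc_for works without LWP
    my $ua = LWP::UserAgent->new;
    $ua->credentials( netloc_for($url), $realm, $user, $pass );
    my $res = $ua->get( $url, ':content_file' => $outfile );
    die 'Unsuccessful: ' . $res->status_line . "\n" unless $res->is_success;
    return $res;
}

# Usage: perl fetch.pl URL REALM USER PASS OUTFILE
fetch_with_basic_auth(@ARGV) if @ARGV == 5;
```

The realm must match the server's exactly; later in this thread the server reports `WWW-Authenticate: Basic realm="WorldCheck auth"`, which is the kind of string that would go in the REALM argument.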

Author Comment

ID: 21764185
I am a newbie, Adam. I am trying to download a file from a site via a hyperlink. When I click on the hyperlink, the site prompts me to enter my username and password in a Windows popup.

Expert Comment

ID: 21764313
Try this code. You will have to install WWW::Mechanize if it isn't already installed.
On Windows, at a prompt type: ppm install WWW-Mechanize
On anything else, at a prompt as root type: perl -MCPAN -e 'install WWW::Mechanize'

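```perl
#!/usr/bin/perl
use WWW::Mechanize;

# autocheck => 0 keeps newer WWW::Mechanize versions from dying inside get(),
# so the explicit success check below is the one that reports the status
my $mech = WWW::Mechanize->new( autocheck => 0 );

#NOTE: put your username and password here
my ($username, $password) = ('your_username', 'your_password');
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
die "Unsuccessful: status=" . $mech->status . "\n" unless $mech->success;

open(my $out, ">", "filename.txt") or die "Output file: $!\n";
print $out $mech->content;
close($out);
```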



Author Comment

ID: 21764569
I'm getting "Unsuccessful: status=401".

Expert Comment

ID: 21766108
Status 401 means not authorized.  Did you enter the correct username and password?

Try this, it will display the content no matter what the status.

```perl
#!/usr/bin/perl
use WWW::Mechanize;

# autocheck => 0 stops newer WWW::Mechanize versions from dying on an error
# status, so the status and content below are always printed
my $mech = WWW::Mechanize->new( autocheck => 0 );

#NOTE: put your username and password here
my ($username, $password) = ('your_username', 'your_password');
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
```

Author Comment

ID: 21770621
Tried that, and this is what I received:

```
status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you are authorized to access the document requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>
```

I have tested the username and password by logging in through the browser.

Expert Comment

ID: 21771629
I'm guessing the method of authentication used by the server is different from what the credentials method provides. Could you use Firefox with LiveHTTPHeaders when you go to the website, to see what headers are provided?

Author Comment

ID: 21771899

```
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Thu, 12 Jun 2008 17:45:29 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
```

Expert Comment

ID: 21804109
Are you able to get the file with Firefox? Those headers look like you got an error message. Can you post the headers for when it is successful?
Author Comment

ID: 21804182

```
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:06 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
----------------------------------------------------------
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1
Authorization: Basic xxxxxxxxxxxxxxxx

HTTP/1.x 200 OK
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:19 GMT
Content-Type: text/csv
Content-Length: 507135096
Last-Modified: Tue, 17 Jun 2008 15:29:42 GMT
Connection: close
Accept-Ranges: bytes
```

Accepted Solution (Adam314)

ID: 21804400

```perl
#!/usr/bin/perl
use WWW::Mechanize;

# autocheck => 0 so the status is printed instead of get() dying on an error
my $mech = WWW::Mechanize->new( autocheck => 0 );

# xxxxxxxxxxxxxxxx stands for the value from the browser's Authorization header
$mech->add_header("Authorization" => "Basic xxxxxxxxxxxxxxxx");
$mech->get('https://www.sitename.com/directory/filename.csv');

print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
```


Author Comment

ID: 21805228
```
status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>
```

Expert Comment

ID: 21805666
Did you replace the xxxxx in the script with the password?  Was the "Basic xxxxxxx" in the headers actually what was posted, or was there a password there?

Author Comment

ID: 21805796
It was a string of letters, probably an encrypted version of the username and password.
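For reference, the string after `Basic` is not encrypted: HTTP Basic authentication sends `username:password` Base64-encoded, so the header can also be built directly instead of copied from the browser. A minimal sketch using the core MIME::Base64 module; the sub name `basic_auth_header` is hypothetical, and the credentials are the placeholder pair from the HTTP authentication RFC example, not real ones. (LWP's HTTP::Headers also has an `authorization_basic` method that performs the same encoding.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: value of an HTTP Basic "Authorization" header,
# i.e. "Basic " plus the Base64 encoding of "user:password".
sub basic_auth_header {
    my ($user, $pass) = @_;
    # The empty second argument stops encode_base64 from appending a newline.
    return 'Basic ' . encode_base64("$user:$pass", '');
}

my $header = basic_auth_header('Aladdin', 'open sesame');
print "$header\n";    # prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

# Decoding recovers the original pair, so treat the header like the password itself.
my ($encoded) = $header =~ /^Basic (.+)$/;
print decode_base64($encoded), "\n";    # prints "Aladdin:open sesame"
```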

Expert Comment

ID: 21806073
Did you try using that same string in the perl script?

Author Comment

ID: 21806517
Yes, that is when I got the 401 message.

Expert Comment

ID: 21806602
I'm not sure, then. Maybe you could use wget or curl:
http://www.gnu.org/software/wget/
http://curl.haxx.se/

I'm not sure if either will work with authentication though.

Author Comment

ID: 21816079
Is there any code for wget?

Expert Comment

ID: 21816193
The manual for wget is here:
http://www.gnu.org/software/wget/manual/

There are many examples, with one having a username/password towards the bottom of this page:

To call it from perl, you would use:

```perl
my $returncode = system("wget ...."); # then check $returncode for success/failure
```

Or:

```perl
my $output = `wget .....`; # then check $output and $? for success/failure
```

Author Comment

ID: 21816326
I used the following and I think it is working:

```perl
#!/usr/bin/perl
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->get('https://user:password@www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
```

I added the user:password string to the address. How can I now direct this to a specific directory on my machine, also with a log file?

Author Comment

ID: 21816405
I need to pipe this to a file, because when I ran it from a command prompt the data appeared in the DOS window.

Assisted Solution (Adam314)

ID: 21816461
You could have the script write it to a file. There are several methods:

1)

```perl
# save the csv file to $filename
$mech->save_content($filename);
```

2)

```perl
open(my $out, ">", $filename) or die "Could not create output: $!\n";
print $out $mech->content;
close($out);
```

Or you could do the redirect from the command prompt:
perl yourscript.pl > some/file.csv
yourscript.pl > some/file.csv
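The wget route suggested earlier in the thread can be driven from Perl with the list form of `system`, which avoids shell-quoting problems with passwords. A sketch, assuming a wget build with HTTPS support is on the PATH; `--user`, `--password` and `-O` are standard wget options, while the sub names and argument order are hypothetical. A password given on the command line may be visible to other users of the machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the wget argument list for an authenticated download.
sub wget_command {
    my ($url, $user, $pass, $outfile) = @_;
    return ('wget', "--user=$user", "--password=$pass", '-O', $outfile, $url);
}

# Run wget without a shell and return its exit code (0 means success).
sub run_wget {
    my @cmd = wget_command(@_);
    system(@cmd);
    die "Could not start wget: $!\n" if $? == -1;
    die 'wget killed by signal ' . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;
}

# Usage: perl getfile.pl URL USER PASS OUTFILE
if (@ARGV == 4) {
    my $status = run_wget(@ARGV);
    die "wget exited with status $status\n" if $status != 0;
    print "Download complete\n";
}
```

With curl, the equivalent options would be `-u user:password -o OUTFILE`.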

Author Comment

ID: 21817514
The final code is:

```perl
#!/usr/bin/perl
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://xxxx:password@www.sitename/file.csv');

# save the csv file to test.csv
$mech->save_content('test.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";
```

Author Comment

ID: 21832294

The get command actually opens the file in the DOS window. Is there a command that just downloads the file without opening it? I tried using the script with a 400 MB file and it doesn't work. I would like to modify the script to do a straight download.

Expert Comment

ID: 21832400
It's not the get command that opens the DOS window; it is perl itself. If you don't want the window, you can use wperl instead of perl. There are several ways to do this:

In the command for the shortcut you are clicking, change it to:
c:\perl\bin\wperl.exe "c:\path\to\your\script.pl"
You will not need the double quotes (as shown) if the path does not contain spaces.

Or you could associate the extension .wpl with wperl.exe, and rename the script from something.pl to something.wpl.

Author Comment

ID: 21833150
OK, Adam, I have changed the script to use wperl. Is there now any way to add a status window or log file to see what's going on with the download? I am trying to download a 400 MB file. It takes an hour if I do it manually, but when I ran the script yesterday the DOS window was open for about 7 hours, and when it finally closed there was no file.

Expert Comment

ID: 21833344
The saved file will be in the script's current working directory. If you want it in a particular directory, add this to the script:
chdir('/path/to/where/you/want/to/save');
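As an alternative to `chdir`, the target path can be built explicitly and the start and end of the download appended to a log file, which covers the directory-plus-log request made earlier. A minimal sketch, assuming WWW::Mechanize is installed and that its `get` passes the `:content_file` option through to LWP::UserAgent so the body is written straight to disk; the sub names, argument order and log format are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use POSIX qw(strftime);

# Append one timestamped line to an open log filehandle.
sub log_msg {
    my ($log, $msg) = @_;
    print {$log} strftime('%Y-%m-%d %H:%M:%S', localtime), " $msg\n";
}

# Hypothetical sub: download $url into $dir/$name, logging to $logfile.
sub download_to_dir {
    my ($url, $dir, $name, $logfile) = @_;
    my $target = File::Spec->catfile($dir, $name);
    open(my $log, '>>', $logfile) or die "Log file $logfile: $!\n";
    # The URL is not logged because it may contain user:password.
    log_msg($log, "Starting download to $target");

    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    # ':content_file' writes the body to $target instead of keeping it in memory
    $mech->get( $url, ':content_file' => $target );

    log_msg($log, 'Finished with status ' . $mech->status);
    close($log);
    return $mech->success;
}

# Usage: perl getfile.pl URL DIRECTORY FILENAME LOGFILE
if (@ARGV == 4) {
    download_to_dir(@ARGV) or die "Download failed, see $ARGV[3]\n";
}
```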

You might be able to use the LWP::Simple module to save the file
use LWP::Simple;
Then check the size of /path/to/file.csv as the script is running.  I'm not sure if this function will write data as it is downloaded, or write it all when it's finished.
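In LWP::Simple, the function that saves a URL to a file is `getstore($url, $file)`, which returns the HTTP status code. For a running progress record on a 400 MB download, LWP::UserAgent's `:content_cb` option hands each chunk to a callback as it arrives, so the script can write the file incrementally and log a byte count along the way. A sketch under these assumptions: the sub names, the 10 MB reporting interval and the argument order are hypothetical; the URL can carry `user:password@` as in the author's working version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical factory: returns a callback for LWP::UserAgent's ':content_cb'.
# LWP calls it with ($chunk, $response, $protocol) as data arrives.
sub make_progress_writer {
    my ($out, $log, $report_every) = @_;    # $report_every must be > 0
    my ($total, $next_report) = (0, $report_every);
    return sub {
        my ($chunk) = @_;
        print {$out} $chunk or die "Write failed: $!\n";
        $total += length $chunk;
        if ($total >= $next_report) {
            printf {$log} "%d bytes received\n", $total;
            $next_report += $report_every while $next_report <= $total;
        }
        return $total;
    };
}

# Hypothetical sub: stream $url to $outfile, logging progress to $logfile.
sub download_with_progress {
    my ($url, $outfile, $logfile) = @_;
    require LWP::UserAgent;
    open(my $out, '>', $outfile) or die "Output file $outfile: $!\n";
    binmode($out);
    open(my $log, '>>', $logfile) or die "Log file $logfile: $!\n";
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get( $url,
        ':content_cb' => make_progress_writer($out, $log, 10 * 1024 * 1024) );
    close($out);
    print {$log} 'Finished: ', $res->status_line, "\n";
    close($log);
    return $res->is_success;
}

# Usage: perl getfile.pl URL OUTFILE LOGFILE
if (@ARGV == 3) {
    download_with_progress(@ARGV) or die "Download failed, see $ARGV[2]\n";
}
```

Watching the log (or the output file's size) while the script runs then shows whether data is still arriving.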