Solved

Posted on 2008-06-11
I am trying to automate my process of downloading a weekly .csv file from a site.  The URL is "https://www.sitename.com/directory/filename.csv".  Following the link brings up a Windows prompt for a username and password.  Any ideas?
Question by: speede1

Expert Comment

ID: 21763175
If it is basic authentication, you could use the credentials method from LWP::UserAgent:
http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/UserAgent.pm
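As a rough sketch of what that could look like: the four-argument form of credentials registers a username and password for one host:port and one authentication realm, and LWP then answers a matching 401 challenge by itself. The host, port, realm, and credentials below are placeholders chosen for illustration, not values from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder values (assumptions for illustration only).
my $netloc   = 'www.sitename.com:443';  # host:port; 443 is the default for https
my $realm    = 'Example Realm';         # must match realm="..." in the WWW-Authenticate header
my $username = 'your_username';
my $password = 'your_password';

my $ua = LWP::UserAgent->new;

# Register the credentials for this host and realm only.
$ua->credentials($netloc, $realm, $username, $password);

# Fetch a URL and return its content, or die with the HTTP status line.
# Not called here, so the sketch has no network side effects.
sub fetch_csv {
    my ($url) = @_;
    my $response = $ua->get($url);
    die "Unsuccessful: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Example call (hypothetical URL):
# print fetch_csv('https://www.sitename.com/directory/filename.csv');
```

The realm string has to match what the server sends exactly, so it is worth reading it from the 401 response headers before filling it in.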

Author Comment

ID: 21764185

Expert Comment

ID: 21764313
Try this code.  You will have to install WWW::Mechanize, if it isn't already installed.
On Windows: at a prompt, type: ppm install WWW-Mechanize
On anything else: at a prompt as root: perl -MCPAN -e 'install WWW::Mechanize'

#!/usr/bin/perl

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

#NOTE: put your username and password here
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
die "Unsuccessful: status=" . $mech->status . "\n" unless $mech->success;

open(my $out, ">filename.txt") or die "Output file: $!\n";
print $out $mech->content;
close($out);


Author Comment

ID: 21764569
Getting unsuccessful - 401

Expert Comment

ID: 21766108
Status 401 means not authorized.  Did you enter the correct username and password?

Try this, it will display the content no matter what the status.

#!/usr/bin/perl

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

#NOTE: put your username and password here
$mech->credentials( $username, $password );

$mech->get('https://www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";

print "content=" . $mech->content . "\n";

Author Comment

ID: 21770621
Tried that and the following is what has been received:

status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>

I have tested the user name and password by logging in through the browser.

Expert Comment

ID: 21771629
I'm guessing the method of authentication used by the server is different than what the credentials method provides.  Could you use Firefox with LiveHTTPHeaders when you go to the website to see what headers are provided?

Author Comment

ID: 21771899
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Thu, 12 Jun 2008 17:45:29 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556

Expert Comment

ID: 21804109
Are you able to get the file with Firefox?  Those headers look like you got an error message.  Can you post the headers for when it is successful?

Author Comment

ID: 21804182
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1

HTTP/1.x 401 Authorization Required
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:06 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
WWW-Authenticate: Basic realm="WorldCheck auth"
Content-Length: 556
----------------------------------------------------------
https://www.world-check.com/portal/Downloads/world-check.csv

GET /portal/Downloads/world-check.csv HTTP/1.1
Host: www.world-check.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: test_cookie=1
Authorization: Basic xxxxxxxxxxxxxxxx

HTTP/1.x 200 OK
Server: nginx/0.5.35
Date: Tue, 17 Jun 2008 15:30:19 GMT
Content-Type: text/csv
Content-Length: 507135096
Last-Modified: Tue, 17 Jun 2008 15:29:42 GMT
Connection: close
Accept-Ranges: bytes

Accepted Solution

Adam314 earned 350 total points
ID: 21804400
#!/usr/bin/perl

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

$mech->add_header("Authorization" => "Basic xxxxxxxxxxxxxxxx");
$mech->get('https://www.sitename.com/directory/filename.csv');

print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";

Author Comment

ID: 21805228
status=401
content=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache/2.2.6 (Debian) mod_ssl/2.2.6 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.3 Perl/v5.8.8 Server at perlhttp Port 80</address>
</body></html>

Expert Comment

ID: 21805666
Did you replace the xxxxx in the script with the password?  Was the "Basic xxxxxxx" in the headers actually what was posted, or was there a password there?

Author Comment

ID: 21805796
It was a string of letters, which was probably an encrypted version of the username and password.
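Strictly speaking, the string after "Basic" is not encrypted: under HTTP Basic authentication it is the base64 encoding of username:password, which anyone can reverse. A small sketch of building and decoding such a header value, using placeholder credentials and hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Build the value of an HTTP Basic "Authorization" header.
sub basic_auth_header {
    my ($username, $password) = @_;
    # The empty second argument stops encode_base64 from appending a newline.
    return 'Basic ' . encode_base64("$username:$password", '');
}

# Recover "username:password" from such a header value.
sub decode_basic_auth {
    my ($header) = @_;
    my ($token) = $header =~ /^Basic\s+(\S+)$/
        or die "Not a Basic authorization header\n";
    return decode_base64($token);
}

# Placeholder credentials, for illustration only.
my $header = basic_auth_header('user', 'password');
print "$header\n";                        # Basic dXNlcjpwYXNzd29yZA==
print decode_basic_auth($header), "\n";   # user:password
```

With WWW::Mechanize, the result could be passed as $mech->add_header(Authorization => basic_auth_header($user, $pass)) instead of pasting a token copied from the browser. Because the token is trivially decodable, it should be treated like the password itself.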

Expert Comment

ID: 21806073
Did you try using that same string in the perl script?

Author Comment

ID: 21806517
Yes, which is when I got the 401 message

Expert Comment

ID: 21806602
I'm not sure then..... maybe you could use wget or curl
http://www.gnu.org/software/wget/
http://curl.haxx.se/

I'm not sure if either will work with authentication though.
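For what it is worth, both tools can send HTTP Basic credentials: wget has --user and --password options, and curl has -u user:password. A minimal sketch of driving wget from Perl, assuming wget is installed and on the PATH; the helper names, URL, credentials, and output file are placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the wget argument list. Passing a list to system() bypasses the
# shell, so special characters in the password need no quoting.
sub wget_command {
    my ($url, $username, $password, $outfile) = @_;
    return ('wget', "--user=$username", "--password=$password",
            '-O', $outfile, $url);
}

# Run wget and die unless it exits with status 0.
sub run_wget {
    my @command = wget_command(@_);
    my $status  = system(@command);
    die "wget could not be started: $!\n" if $status == -1;
    die "wget failed with exit code " . ($status >> 8) . "\n" if $status != 0;
    return 1;
}

# Example call (placeholder values):
# run_wget('https://www.sitename.com/directory/filename.csv',
#          'your_username', 'your_password', 'filename.csv');
```

One trade-off: a password given on the command line is visible to other users in the process list for as long as wget runs.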

Author Comment

ID: 21816079
Is there any code for wget?

Expert Comment

ID: 21816193
The manual for wget is here:
http://www.gnu.org/software/wget/manual/

To call it from perl, you would use:
my $returncode = system("wget ....");
#then check $returncode for success/failure
Or:
my $output = `wget .....`;
#then check $output and $? for success/failure

Author Comment

ID: 21816326
I used the following and I think it is working:

#!/usr/bin/perl

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

$mech->get('https://user:password@www.sitename.com/directory/filename.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";

I added the user:password string to the address.  How can I direct this now to a specific directory on my machine, also with a log file?

Author Comment

ID: 21816405
I need to pipe this to a file, because I ran it from a command prompt and the data appeared in the DOS window.

Assisted Solution

Adam314 earned 350 total points
ID: 21816461
You could have the script write it to a file.  There are several methods:

1)
#save the csv file to $filename
$mech->save_content($filename);

2)
open(my $out, ">$filename") or die "Could not create output: $!\n";
print $out $mech->content;
close($out);

Or you could do the redirect from the command prompt:
perl yourscript.pl > some/file.csv
yourscript.pl > some/file.csv
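Since the question also asked about a specific directory and a log file, here is a small sketch that combines method 2 with one log line per save. The helper name, directory, and file names are made up for illustration; binmode is used so that Windows does not translate line endings in the saved data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Write $content to $dir/$name and append one line to $logfile.
# Returns the full path of the saved file.
sub save_with_log {
    my ($content, $dir, $name, $logfile) = @_;
    my $path = File::Spec->catfile($dir, $name);

    open(my $out, '>', $path) or die "Could not create $path: $!\n";
    binmode($out);                       # write the data without newline translation
    print {$out} $content;
    close($out) or die "Could not close $path: $!\n";

    my $bytes = (stat $path)[7];         # size on disk after the write
    open(my $log, '>>', $logfile) or die "Could not open $logfile: $!\n";
    print {$log} scalar(localtime), " saved $path ($bytes bytes)\n";
    close($log);
    return $path;
}

# Example call (hypothetical paths), after a successful $mech->get(...):
# save_with_log($mech->content, 'C:/downloads', 'filename.csv',
#               'C:/downloads/download.log');
```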

Author Comment

ID: 21817514
The final code is:

#!/usr/bin/perl
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://xxxx:password@www.sitename/file.csv');

#save the csv file to test.csv
$mech->save_content('test.csv');
print "status=" . $mech->status . "\n";
print "content=" . $mech->content . "\n";

Author Comment

ID: 21832294

The get command actually opens the file in the DOS window.  Is there a command which just downloads the file without opening it?  I tried using the script with a 400MB file and it doesn't work.  I would like to modify the script to just do a straight download.

Expert Comment

ID: 21832400
It's not the get command that opens the DOS window; it is perl itself.  If you don't want the window, you can use wperl instead of perl.  Several ways to do this:

In the command for the shortcut you are clicking, change it to:
c:\perl\bin\wperl.exe "c:\path\to\your\script.pl"
You will not need double-quotes (like shown) if the path does not contain spaces.

Or you could associate the extension .wpl with wperl.exe, and rename the script from something.pl to something.wpl.

Author Comment

ID: 21833150
OK, Adam, I have changed the script to use the wperl command.  Now, is there any way to implement a status window or log file to see what's going on with the download?  I am trying to download a 400MB file.  It takes an hour if I do it manually, but when I ran the script yesterday the DOS window was open for about 7 hours, and when it finally closed there wasn't any file.

Expert Comment

ID: 21833344
The saved file will be in the current directory of the script.  If you want it in a particular directory, add this to the script:
chdir('/path/to/where/you/want/to/save');

You might be able to use the LWP::Simple module to save the file
use LWP::Simple;
Then check the size of /path/to/file.csv as the script is running.  I'm not sure if this function will write data as it is downloaded, or write it all when it's finished.
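For a file of this size, holding the whole response in memory (as $mech->content does) is a likely source of trouble. One possible alternative, sketched here on the assumption that LWP::UserAgent is installed: pass a callback to request, so each chunk is written to disk as it arrives and progress is appended to a log file. The sub name, the 10 MB logging step, the paths, and the credentials are all placeholders. If no progress log is needed, getstore($url, $file) from LWP::Simple is a shorter route.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;        # for autoflush on the log handle
use HTTP::Request;
use LWP::UserAgent;

# Download $url to $outfile chunk by chunk, appending progress lines to
# $logfile. $auth is an optional [username, password] pair for HTTP Basic
# authentication. Returns the number of bytes received.
sub download_with_log {
    my ($url, $outfile, $logfile, $auth) = @_;

    open(my $out, '>', $outfile) or die "Could not create $outfile: $!\n";
    binmode($out);
    open(my $log, '>>', $logfile) or die "Could not open $logfile: $!\n";
    $log->autoflush(1);            # make each log line visible immediately

    my $request = HTTP::Request->new(GET => $url);
    $request->authorization_basic(@$auth) if $auth;

    my $received    = 0;
    my $next_report = 0;
    my $step        = 10 * 1024 * 1024;   # log roughly every 10 MB

    my $ua = LWP::UserAgent->new;
    # LWP calls the callback for each chunk of a successful response.
    my $response = $ua->request($request, sub {
        my ($chunk) = @_;
        print {$out} $chunk or die "Write to $outfile failed: $!\n";
        $received += length $chunk;
        if ($received >= $next_report) {
            print {$log} scalar(localtime), " received $received bytes\n";
            $next_report = $received + $step;
        }
    });

    close($out) or die "Could not close $outfile: $!\n";
    print {$log} scalar(localtime), " finished: ", $response->status_line,
                 ", $received bytes\n";
    close($log);

    die "Download failed: " . $response->status_line . "\n"
        unless $response->is_success;
    # A die inside the callback is caught by LWP and reported in X-Died.
    die "Download aborted: " . $response->header('X-Died') . "\n"
        if $response->header('X-Died');
    return $received;
}

# Example call (placeholder values):
# download_with_log('https://www.sitename.com/directory/filename.csv',
#                   'filename.csv', 'download.log',
#                   ['your_username', 'your_password']);
```

While the script runs, the log file can be opened in any text editor to see how far the download has progressed.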