BerkeleyJeff
asked on
LWP::Parallel::UserAgent chunk size
I want to use LWP::Parallel::UserAgent to issue HTTP requests and have the responses processed by a callback function. I want to process each response's data all at once, so I set the chunk size very large (100 MB), thinking that would make each chunk contain the whole page. However, that didn't work: each chunk comes out at about 1460 bytes anyway.
What could I be doing wrong? Is this a common problem?
Thanks,
Jeff
--------------------------
use strict;
use LWP::Parallel::UserAgent;

my $ua = LWP::Parallel::UserAgent->new();
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req(5);   # sets maximum number of parallel requests per host

while (<>) {       # read in URLs
    chomp;
    my $request = HTTP::Request->new(GET => $_);
    print $request;
    $ua->register($request, \&uaCallback, 100000000);
}
$ua->wait();

sub uaCallback {
    my ($data, $response, $protocol) = @_;
    print "BASE: ",   $response->base(), "\n";
    print "LENGTH: ", length($data), "\n";
}
--------------------------
Output:
echo "http://arctic.fws.gov/permglos.htm" | perl oneDeepCrawl.pl
HTTP::Request=HASH(0x8279c64)BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1159
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 2896
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 4380
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1200
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 432
--------------------------
Thanks again.
ASKER CERTIFIED SOLUTION
Yep, parallel fetching helps you get through a long list of pages faster; you still only get one connection working per URL.
If you can get what you need from a response without downloading it all the way to the end, you can also speed things up by returning C_ENDCON from your callback to close that connection early.
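As for the chunk size: ~1460 bytes is a typical TCP segment payload. The chunk-size argument is only an upper bound on each read; the callback still fires once per read off the socket. To process a whole page at once, accumulate the chunks in the callback and work on the complete body after $ua->wait() returns. A minimal sketch of that buffering (the %content hash is illustrative, not part of the module's API):

```perl
use strict;
use warnings;

# Buffer for partial bodies, keyed by base URL; the callback can fire
# many times per request, once for each chunk read off the socket.
my %content;

sub uaCallback {
    my ($data, $response, $protocol) = @_;
    $content{ $response->base() } .= $data;   # append this chunk
    return;   # returning nothing keeps the connection open
}

# After $ua->wait() returns, each buffer holds one complete page:
# for my $url (keys %content) {
#     printf "%s: %d bytes\n", $url, length $content{$url};
# }
```

If enough of the page has already arrived, the callback can instead return the C_ENDCON constant (which I believe LWP::Parallel::UserAgent exports) to close that one connection early, as noted above.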
ASKER
Thanks for your response.
Perhaps I'm misunderstanding the purpose of ParallelUA. My goal was to download a large number of different pages simultaneously. Is that not what ParallelUA is for? Or is ParallelUA for downloading a single page over multiple connections?