BerkeleyJeff asked:
LWP::Parallel::UserAgent chunk size

I want to use LWP::Parallel::UserAgent to issue HTTP requests and have the responses processed by a callback function. I want to process each response's data all at once, so I set the chunk size very large (100 MB), expecting each chunk to be the whole page. However, that didn't work: each chunk comes out at about 1460 bytes anyway.

What could I be doing wrong? Is this a common problem?

Thanks,
Jeff

-------------------------------

use strict;
use warnings;
use LWP::Parallel::UserAgent;

my $ua = LWP::Parallel::UserAgent->new();
$ua->max_hosts(5); # maximum number of hosts accessed in parallel
$ua->max_req(5);   # maximum number of parallel requests per host

while (<>) # read in URLs, one per line
{
        chomp;
        my $request = HTTP::Request->new(GET => $_);
        print $request; # note: this stringifies the object ref, not the URL
        $ua->register($request, \&uaCallback, 100_000_000); # 100 MB chunk size
}

$ua->wait ();

sub uaCallback
{
        my($data, $response, $protocol) = @_;
        print "BASE: ", $response->base(), "\n";
        print "LENGTH: ", length($data), "\n";
}

----------------------------------

Output:

echo "http://arctic.fws.gov/permglos.htm" | perl oneDeepCrawl.pl
HTTP::Request=HASH(0x8279c64)BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1159
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 2896
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 4380
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1200
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 432



-------------------------------

Thanks again.
ASKER CERTIFIED SOLUTION by jmcg (United States) — full text available to Experts Exchange members only.

BerkeleyJeff (asker):
JMCG,

Thanks for your response.

Perhaps I'm misunderstanding the purpose of ParallelUA. My goal was to download a large number of different pages simultaneously. Is this not what ParallelUA is for? Is ParallelUA for downloading a single page using multiple connections?
jmcg:

Yep, parallel will help in getting through a longer list of pages. You only get one thread working per URL.
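The ~1460-byte chunks are roughly TCP-segment sized, which suggests the chunk size passed to register() only caps each read rather than forcing whole-page delivery. A common workaround, assuming the callback fires once per network read, is to accumulate chunks yourself and process each full page after wait() returns — a minimal sketch (keying the buffer by URL is an illustrative choice, not part of the original code):

```perl
use strict;
use warnings;
use LWP::Parallel::UserAgent;

my %buffer; # accumulated page data, keyed by URL (illustrative)

my $ua = LWP::Parallel::UserAgent->new();
$ua->max_hosts(5);
$ua->max_req(5);

while (<>) {
    chomp;
    $ua->register(HTTP::Request->new(GET => $_), \&uaCallback, 8192);
}

$ua->wait();

# Each page is only complete once wait() has returned.
for my $url (keys %buffer) {
    print "URL: $url LENGTH: ", length($buffer{$url}), "\n";
}

sub uaCallback {
    my ($data, $response, $protocol) = @_;
    # Chunks arrive at roughly TCP-segment size regardless of the
    # requested chunk size, so append them until the request finishes.
    $buffer{ $response->request->url } .= $data;
}
```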

If you can get what you need from a response without downloading it all the way to the end, returning C_ENDCON from your callback can also speed things up.
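A sketch of that early-exit callback, assuming the module's :CALLBACK export tag provides the C_ENDCON constant (check the POD for your version; the %seen buffer and the </head> test are illustrative):

```perl
use strict;
use warnings;
use LWP::Parallel::UserAgent qw(:CALLBACK); # exports C_ENDCON / C_ENDALL

my %seen; # data received so far, keyed by URL (illustrative)

sub uaCallback {
    my ($data, $response, $protocol) = @_;
    my $url = $response->request->url;
    $seen{$url} .= $data;

    # Once we have what we need (here, the end of the <head> section),
    # tell the UserAgent to drop this connection; other parallel
    # transfers keep going.
    return C_ENDCON if $seen{$url} =~ m{</head>}i;

    return; # otherwise continue receiving data
}
```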