Solved

LWP::Parallel::UserAgent chunk size

Posted on 2006-07-05
3
778 Views
Last Modified: 2012-06-21
I want to use LWP::Parallel::UserAgent to issue HTTP requests, and have the responses processed by a callback function. I want to process the whole response data at the same time, so I've set the chunk size to be very large (100 MB). I was thinking that this would make each chunk to be the whole page. However, that didn't work. Each chunk becomes about 1460 bytes anyway.

What could I be doing wrong? Is this a common problem?

Thanks,
Jeff

-------------------------------

use strict;
use LWP::Parallel::UserAgent;


my $ua = LWP::Parallel::UserAgent->new();
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req(5); # sets maximum number of parallel requests per host

while(<>) #read in urls
{
        chomp;
        my $request = HTTP::Request->new(GET => $_);
        print $request;
        $ua->register($request, \&uaCallback, 100000000);
}

$ua->wait ();

sub uaCallback
{
        my($data, $response, $protocol) = @_;
        print "BASE: ", $response->base(), "\n";
        print "LENGTH: ", length($data), "\n";
}

----------------------------------

Output:

echo "http://arctic.fws.gov/permglos.htm" | perl oneDeepCrawl.pl
HTTP::Request=HASH(0x8279c64)BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1159
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 2896
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 4380
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1200
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 432



-------------------------------

Thanks again.
0
Comment
Question by:BerkeleyJeff
  • 2
3 Comments
 
LVL 20

Accepted Solution

by:
jmcg earned 500 total points
Comment Utility
Why parallel if you're only interested in handling completed responses?

Alter your callback to accumulate the incoming data in the response object:

   if( length $data) { $response->add_content( $data); }

You will have to figure out the best way to determine when you've seen the last of the data so you can process it as a single chunk. Perhaps check whether

    $response->length > length( $response->content->as_string)



===============

See this article by Randy Schwartz -- while it's old, it should not need much updating to re-use his methods:

http://www.stonehenge.com/merlyn/WebTechniques/col27.html

0
 

Author Comment

by:BerkeleyJeff
Comment Utility
JMCG,

Thanks for your response.

Perhaps I'm misunderstanding the purpose of ParallelUA. My goal was to download a large number different of pages simutaneously. Is this not what ParallelUA is for? Is ParallelUA for downloading a single page using multiple connections?
0
 
LVL 20

Expert Comment

by:jmcg
Comment Utility
Yep, parallel will help on getting through a longer list of pages. You only get one thread working per URL.

If you can get what you need from the responses without downloading them all the way to the end, it also can speed things up (using the C_ENDCON response from your callback).
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
perl to mysql 5 128
SIMPLE Perl Regex 1 144
cpan issue 1 55
PHP equivalent of opening a com object 5 64
I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green h…
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now