[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

LWP::Parallel::UserAgent chunk size

Posted on 2006-07-05
3
Medium Priority
?
792 Views
Last Modified: 2012-06-21
I want to use LWP::Parallel::UserAgent to issue HTTP requests, and have the responses processed by a callback function. I want to process the whole response data at the same time, so I've set the chunk size to be very large (100 MB). I was thinking that this would make each chunk to be the whole page. However, that didn't work. Each chunk becomes about 1460 bytes anyway.

What could I be doing wrong? Is this a common problem?

Thanks,
Jeff

-------------------------------

use strict;
use LWP::Parallel::UserAgent;


my $ua = LWP::Parallel::UserAgent->new();
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req(5); # sets maximum number of parallel requests per host

while(<>) #read in urls
{
        chomp;
        my $request = HTTP::Request->new(GET => $_);
        print $request;
        $ua->register($request, \&uaCallback, 100000000);
}

$ua->wait ();

sub uaCallback
{
        my($data, $response, $protocol) = @_;
        print "BASE: ", $response->base(), "\n";
        print "LENGTH: ", length($data), "\n";
}

----------------------------------

Output:

echo "http://arctic.fws.gov/permglos.htm" | perl oneDeepCrawl.pl
HTTP::Request=HASH(0x8279c64)BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1159
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1448
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1484
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 2896
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 4380
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 1200
BASE: http://arctic.fws.gov/permglos.htm
LENGTH: 432



-------------------------------

Thanks again.
0
Comment
Question by:BerkeleyJeff
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 20

Accepted Solution

by:
jmcg earned 2000 total points
ID: 17055688
Why parallel if you're only interested in handling completed responses?

Alter your callback to accumulate the incoming data in the response object:

   if( length $data) { $response->add_content( $data); }

You will have to figure out the best way to determine when you've seen the last of the data so you can process it as a single chunk. Perhaps check whether

    $response->length > length( $response->content->as_string)



===============

See this article by Randy Schwartz -- while it's old, it should not need much updating to re-use his methods:

http://www.stonehenge.com/merlyn/WebTechniques/col27.html

0
 

Author Comment

by:BerkeleyJeff
ID: 17063227
JMCG,

Thanks for your response.

Perhaps I'm misunderstanding the purpose of ParallelUA. My goal was to download a large number different of pages simutaneously. Is this not what ParallelUA is for? Is ParallelUA for downloading a single page using multiple connections?
0
 
LVL 20

Expert Comment

by:jmcg
ID: 17063362
Yep, parallel will help on getting through a longer list of pages. You only get one thread working per URL.

If you can get what you need from the responses without downloading them all the way to the end, it also can speed things up (using the C_ENDCON response from your callback).
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question