Solved

Downloading 1,000,000 webpages with perl

Posted on 2001-08-06
Last Modified: 2010-03-05
That is the task at hand. We have a very large list of pages to download and several servers at our disposal to share the work: four dual-processor Linux boxes.

Currently, we're using LWP::UserAgent to fetch the pages, and Sys::AlarmCall (a wrapper module around SIGALRM) to monitor each fetch in case it fails to time out properly.

However, this doesn't always seem to work. Sometimes the SIGALRM fails and the page request continues for a very long time. Sometimes it even seems to cause the process to halt.

Does anyone have experience with such a project?  What tools/strategies did you employ?  How did you handle requests that timed out?
Question by:brgordon
7 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6357095
Is this a duplicate question? Please delete it.
 
LVL 16

Expert Comment

by:maneshr
ID: 6628577
brgordon,

Did you get the solution you were looking for?

What solution, if any, did you use?

Your response in finalizing this question is appreciated.

Thanks,
 

Author Comment

by:brgordon
ID: 6630529
maneshr,

I did receive a solution in Perl; however, my final solution was to switch to Java (better thread handling).
Perl's SIGALRM handling is not reliable enough, and only one alarm can be pending per process.

Any other questions, let me know.

cheers,
Brett
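
For anyone reading later, here is a minimal sketch of what the thread-based approach might look like in Java. It is not Brett's actual code, which was never posted. The class name `BoundedFetcher`, the methods `httpGet` and `fetchAll`, the pool size, and all timeout values are assumptions made up for illustration. Each request is bounded by the socket-level connect and read timeouts, and `Future.get` with a timeout acts as a backstop: the caller stops waiting and cancels the task if a result does not arrive in time.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical class and method names; nothing here comes from the thread itself.
class BoundedFetcher {

    // Fetch one page. The connect/read timeouts bound each socket operation.
    static String httpGet(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(5_000);   // assumed value, in milliseconds
            conn.setReadTimeout(10_000);     // assumed value, in milliseconds
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            throw new RuntimeException("fetch failed: " + url, e);
        }
    }

    // Run fetcher over all urls on a fixed-size pool. A url maps to null if its
    // fetch failed or if its result did not arrive within waitMillis of the
    // moment the caller started waiting on it.
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, String> results = new LinkedHashMap<>();
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            for (int i = 0; i < urls.size(); i++) {
                Future<String> f = futures.get(i);
                try {
                    results.put(urls.get(i), f.get(waitMillis, TimeUnit.MILLISECONDS));
                } catch (TimeoutException | ExecutionException e) {
                    f.cancel(true);          // interrupts the worker if still running
                    results.put(urls.get(i), null);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.put(urls.get(i), null);
                }
            }
        } finally {
            pool.shutdownNow();              // do not leave worker threads behind
        }
        return results;
    }

    public static void main(String[] args) {
        // URLs come from the command line; with no arguments nothing is fetched.
        Map<String, String> pages =
            fetchAll(List.of(args), BoundedFetcher::httpGet, 8, 15_000);
        pages.forEach((url, body) -> System.out.println(
            url + " -> " + (body == null ? "FAILED" : body.length() + " chars")));
    }
}
```

Note that `cancel(true)` only interrupts the worker thread, and a blocked socket read may not react to an interrupt, so the socket timeouts remain the main safeguard. For a list of a million URLs you would probably feed this in batches rather than submit everything at once, and give each of the four boxes its own slice of the list.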

Author Comment

by:brgordon
ID: 6631626
maneshr,

Sorry, somehow this question got posted twice. ahoffmann has already answered it.

Brett
 

Author Comment

by:brgordon
ID: 6631629
This question was already posted, but somehow got posted twice. I have already accepted an answer from ahoffmann concerning the question. Please delete this copy.

Thanks,
Brett
 
LVL 16

Expert Comment

by:maneshr
ID: 6631985
brgordon,

"..Please delete this copy...."

You can delete the question yourself. If you do not know how to delete it, then please post your request, with the URL of this question to "Community Support" (http://www.experts-exchange.com/jsp/qList.jsp?ta=commspt)
 
LVL 1

Accepted Solution

by:
Moondancer earned 0 total points
ID: 6958022
I refunded 300 points to you for this question and closed it today.  Sorry for the delay.
Moondancer - EE Moderator