brgordon
asked on
Downloading 1,000,000 webpages with perl
That is the task at hand. We have a VERY large list of pages to download, and several servers at our disposal to help download them all. We have access to four dual-processor linux boxes.
Currently, we're using LWP::UserAgent to fetch the pages, and Sys::AlarmCall (a wrapper module around SIGALRM) to abort any fetch that fails to time out on its own.
However, this doesn't always work: sometimes the SIGALRM never fires and the page request continues for a very long time. Sometimes it even seems to cause the process to halt.
Does anyone have experience with such a project? What tools/strategies did you employ? How did you handle requests that timed out?
Is this a duplicate question? If so, please delete it.
brgordon,
Did you get the solution you were looking for?
What solution, if any, did you use?
Your response in finalizing this question is appreciated.
Thanks,
ASKER
maneshr,
I did receive a solution in Perl; however, my final solution was to switch to Java (better thread handling).
Perl's SIGALRM is not reliable enough, and only one alarm can be pending per process.
If you have any other questions, let me know.
cheers,
Brett
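
For anyone reading later, here is a rough sketch of the kind of thread-based approach meant by "better thread handling". It is not the code that was actually used. The class name `TimedFetcher`, the `Outcome` type, and the `fetchAll` method are all made up for illustration, and each task is assumed to be a `Callable<String>` that returns the page body.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs fetch tasks on a fixed thread pool and gives up
// on any task that has not finished within the timeout.
public class TimedFetcher {

    // Result of one task: either a body, or null plus an error description.
    public static final class Outcome {
        public final String body;
        public final String error;

        Outcome(String body, String error) {
            this.body = body;
            this.error = error;
        }
    }

    public static Map<String, Outcome> fetchAll(
            Map<String, Callable<String>> tasks, int threads, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Outcome> results = new LinkedHashMap<>();
        try {
            // Submit everything first so the tasks run concurrently.
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : tasks.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                Future<String> f = e.getValue();
                try {
                    // The timeout is measured from the start of this wait,
                    // not from the moment the task began running.
                    String body = f.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    results.put(e.getKey(), new Outcome(body, null));
                } catch (TimeoutException ex) {
                    f.cancel(true); // interrupt the worker thread
                    results.put(e.getKey(), new Outcome(null, "timeout"));
                } catch (ExecutionException ex) {
                    results.put(e.getKey(), new Outcome(null, String.valueOf(ex.getCause())));
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    results.put(e.getKey(), new Outcome(null, "interrupted"));
                }
            }
        } finally {
            pool.shutdownNow(); // stop any workers that are still running
        }
        return results;
    }
}
```

In real use each `Callable` would wrap the actual HTTP request, and it would probably be wise to also set the connection's own connect and read timeouts (for example `setConnectTimeout` and `setReadTimeout` on `java.net.HttpURLConnection`), because interrupting a thread does not necessarily unblock a socket read. Unlike the single per-process alarm, every task here gets its own deadline.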
ASKER
maneshr,
Sorry, somehow this question got posted twice. ahoffman has already answered it.
Brett
ASKER
This question was already posted, but somehow it got posted twice. I have already accepted an answer from ahoffman concerning the question. Please delete this copy.
Thanks,
Brett
brgordon,
"..Please delete this copy...."
You can delete the question yourself. If you do not know how to delete it, then please post your request, with the URL of this question to "Community Support" (https://www.experts-exchange.com/jsp/qList.jsp?ta=commspt)