06.30.2008 at 08:30AM PDT, ID: 23527220

Download pages according to given criteria

Asked by catalini in Perl Programming Language, Python Scripting Language, Scripting Languages

Tags: perl, python

This question is a follow-up to http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_23476422.html

I need to create a script that downloads a given page
1) http://www.domainname.net/search/

extracts from the page all the links that follow this pattern (where **** can be anything), and downloads them
2) http://www.domainname.net/***********/?sort=alphab

from this second round of pages extracts all the links with this second pattern
3) http://www.domainname.net/pages/**********/

and downloads them too.

The Perl script below, provided by adam341, works perfectly for the first step: it downloads the http://www.domainname.net/search/?start=$start pages, incrementing $start on each pass.

The problem is that when it saves the first set of files, it changes all relative links to "file://....", so the two patterns

m|http://www.domainname.net/.*/\?sort=alphab|;

and

m|http://www.domainname.net/pages/.*/|;

do not match anything.

Is there any way to save the pages without changing their content or the relative links? (I downloaded the same page with Firefox, and there all the links are saved correctly with http://www.domainname.net/.)
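One thing worth checking, sketched here in Python since the question is also tagged python: a pattern that begins with http://www.domainname.net/ can only match a link that is already in absolute form. If the page writes its links as relative hrefs (an assumption, since the page source is not shown), each href has to be resolved against the page URL before matching. The href values and the helper name `matches_absolute` below are hypothetical.

```python
import re
from urllib.parse import urljoin

# URL of the page the links were extracted from (taken from the question).
BASE_URL = "http://www.domainname.net/search/"

# Python equivalents of the two Perl patterns in the question, with dots escaped.
SORT_PATTERN = re.compile(r"http://www\.domainname\.net/.*/\?sort=alphab")
PAGES_PATTERN = re.compile(r"http://www\.domainname\.net/pages/.*/")


def matches_absolute(href, pattern, base_url=BASE_URL):
    """Resolve href against base_url, then test the absolute URL."""
    absolute = urljoin(base_url, href)
    return pattern.search(absolute) is not None


# Hypothetical href as it might appear in the page source.
relative_href = "/category-a/?sort=alphab"

# The raw relative href has no scheme or host, so the pattern cannot match it.
print(SORT_PATTERN.search(relative_href) is not None)
# After resolving against the page URL, the same pattern can match.
print(matches_absolute(relative_href, SORT_PATTERN))
```

If this is what is happening, the saved files are not the cause: the match fails because the pattern is tested against the unresolved href.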


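use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

my $count = 1;
for my $start10 (0..900) {
    my $start = $start10 * 10;
    # First level: the paginated search results.
    $mech->get("http://www.domainname.net/search/?start=$start");
    $mech->save_content(sprintf("file%04d.txt", $count++));

    # Note: ->url returns the href as written in the page. If the site uses
    # relative hrefs, these absolute patterns will not match; ->url_abs
    # returns the href resolved against the page URL.
    foreach my $link1 ($mech->links) {
        next unless $link1->url =~ m|http://www\.domainname\.net/.*/\?sort=alphab|;
        $mech->get($link1->url);
        $mech->save_content(sprintf("file%04d.txt", $count++));
        foreach my $link2 ($mech->links) {
            # Match against the link's URL, not the link object itself.
            next unless $link2->url =~ m|http://www\.domainname\.net/pages/.*/|;
            $mech->get($link2->url);
            $mech->save_content(sprintf("file%04d.txt", $count++));
        }
    }
}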
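
Since the question is also tagged python, the link-filtering part of the script could look roughly like the sketch below. It uses only the standard library and does no downloading. `LinkCollector`, `matching_links`, the sample HTML, and the `[^/]+` tightening of the two patterns are illustrative assumptions, not part of adam341's code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs so absolute patterns can match.
                self.links.append(urljoin(self.base_url, value))


def matching_links(html, base_url, pattern):
    """Return absolute links in html that match pattern, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if re.search(pattern, url)]


# Hypothetical page fragment; the real markup is not shown in the question.
SAMPLE_HTML = """
<a href="/category-a/?sort=alphab">A</a>
<a href="http://www.domainname.net/category-b/?sort=alphab">B</a>
<a href="/pages/item-1/">Item 1</a>
<a href="/about/">About</a>
"""

# Assumed reading of "**** can be anything": one path segment without slashes.
SORT = r"^http://www\.domainname\.net/[^/]+/\?sort=alphab$"
PAGES = r"^http://www\.domainname\.net/pages/[^/]+/$"

if __name__ == "__main__":
    base = "http://www.domainname.net/search/"
    print(matching_links(SAMPLE_HTML, base, SORT))
    print(matching_links(SAMPLE_HTML, base, PAGES))
```

Resolving every href before filtering means the same two patterns work whether the site writes its links in relative or absolute form.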
07.03.2008 at 09:59PM PDT, ID: 21931068
 

About this solution

Zones: Perl Programming Language, Python Scripting Language, Scripting Languages
Tags: perl, python
Solution Provided By: mish33
Participating Experts: 1
Solution Grade: A
 
 