07.16.2008 at 07:27AM PDT, ID: 23569782

avoid downloading the same page twice

Asked by catalini in Python Scripting Language, Perl Programming Language

I'm using this script to download data from a website. I would like to avoid downloading the same page twice. How should I change the code?
(I.e., whenever link1 or link2 has already been downloaded, the script should skip it.)

see http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_23527220.html

import re
from urllib import urlopen
 
def links(html):
    for a in re.findall(r"href=([^ >]+)", html, re.I):
        if a.startswith('"'): a = a[1:-1]
        yield a
 
count = 1        
for start in xrange(0, 9000, 10):
    m = urlopen("http://www.domain.com/search/?offset=%s" % start).read()
    open("search_%04d.htm" % count, "w").write(m)
    count += 1
    
    for link1 in links(m):
        if re.match(r".*&sort=progress", link1):
            link1 = link1.replace('&sort=alphab', '')
            link1 = 'http://www.domain.com' + link1
            m = urlopen(link1).read()
            open("index_%04d.htm" % count, "w").write(m)
            count += 1
            for link2 in links(m):
                if re.match(r"http://www.domain.com/pages/.*", link2):
                    m = urlopen(link2).read()
                    open("pages_%04d.htm" % count, "w").write(m)
                    count += 1
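
One common way to skip pages that were already fetched is to keep every downloaded URL in a set and check it before each download. The sketch below is not the solution posted in this thread (which is not visible here); it assumes Python 3, and the names `fetch_once`, `seen`, and the `fetch` parameter are hypothetical, introduced only for illustration.

```python
def fetch_once(url, seen, fetch):
    """Return the page body for url, or None if url was already downloaded.

    seen  -- a set of URLs fetched so far (mutated in place)
    fetch -- any callable taking a URL and returning the page content;
             a hypothetical stand-in for urlopen(url).read()
    """
    if url in seen:
        return None      # already downloaded: skip it
    seen.add(url)        # remember the URL so later repeats are skipped
    return fetch(url)


# Demonstration with a fake fetcher so the sketch runs without network access.
calls = []

def fake_fetch(url):
    calls.append(url)
    return "<html>%s</html>" % url

seen = set()
urls = [
    "http://www.domain.com/pages/a",
    "http://www.domain.com/pages/b",
    "http://www.domain.com/pages/a",   # duplicate of the first entry
]
results = [fetch_once(u, seen, fake_fetch) for u in urls]
```

In the script above, each `urlopen(...).read()` call inside the `link1` and `link2` loops could go through a helper like this, with a `continue` whenever it returns `None`. Checking the URL after it has been normalized (the `replace` and the domain prefix) keeps equivalent links from being treated as different pages.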
07.16.2008 at 08:29AM PDT, ID: 22016901

About this solution

Zones: Python Scripting Language, Perl Programming Language
Tags: python, perl
Solution Provided By: ramrom
Participating Experts: 1
Solution Grade: A
 
 
07.16.2008 at 08:51AM PDT, ID: 22017146
Author Comment
 
07.16.2008 at 09:04AM PDT, ID: 22017301
Assisted Solution
 
 