Solved

Help with scraping a web page with Python and BeautifulSoup

Posted on 2014-09-16
Last Modified: 2014-09-18
Hello everybody!

I need to gather data from this web page:

http://www.fantagazzetta.com/probabili-formazioni-serie-a

It lists Italian soccer teams, their players, and a vote for each player.

I've worked out how to scrape some of it:
import os
import requests
from bs4 import BeautifulSoup

def clear():
    os.system(['clear','cls'][os.name == 'nt'])

root_url = 'http://www.fantagazzetta.com'
index_url = root_url + '/probabili-formazioni-serie-a'

pagina_html = requests.get(index_url)
dati = BeautifulSoup(pagina_html.text)

clear()

print "Team di sinistra:"
print
for squadra in dati.find_all("div", { "class" : "team-in-p" }):
    print
    print squadra.text
    print
    for giocatore in dati.find_all("div", { "class" : "in" }):
        for dato in giocatore.find_all("div", { "class" : "name"}):
            for nome in dato.find("a"):
                print nome

But I can't work out how to "bind" the data (associate each team name with all its players, and each player with their vote).

I think I need to nest the for loops, but I don't know how.

Any suggestions?
Question by:ltpitt
Accepted Solution

by:
clockwatcher earned 500 total points
ID: 40327393
Here you go.

import urllib2
import re
from BeautifulSoup import BeautifulSoup

class Page(object):
    def __init__(self, url="http://www.fantagazzetta.com/probabili-formazioni-serie-a"):
        self.html = urllib2.urlopen(url)
        self.soup = BeautifulSoup(self.html)
        self.games = list()
        # Each div with class "playerall" is treated as one game.
        for game in self.soup.findAll("div", "playerall"):
            self.games.append(Game(game))

class Game(object):
    def __init__(self, fragment):
        self.fragment = fragment
        self.teams = list()
        # Search inside this game's fragment only, so each team stays tied to its game.
        self.teams.append(Team(fragment.find("div", "in")))
        self.teams.append(Team(fragment.find("div", "out")))
        
class Team(object):
    def __init__(self, fragment):
        self.fragment = fragment
        # The team name is the second-to-last path segment of the first link's href.
        self.name = re.sub(r".*/(.*?)/.*", r"\1", fragment.find("a")["href"])
        self.players = list()
        for p in self.fragment.findAll("div", "name"):
            self.players.append(Player(p))

class Player(object):
    def __init__(self, fragment):
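        # The name comes from the link's title attribute; the percentage is
        # read from the element that immediately follows the "name" div.
        self.name = fragment.a["title"]
        self.percent = fragment.nextSibling.find("div","percent").string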

def main():
    page = Page()
    for game in page.games:
        for team in game.teams:
            print team.name
            for player in team.players:
                print "\t{0}: {1}".format(player.name, player.percent)

if __name__ == '__main__':
    main()
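
The least obvious line above is probably the re.sub call in Team.__init__, so here is a small standalone sketch of what that pattern does. The helper name team_name_from_href and the sample hrefs are made up for illustration; I'm assuming the real links end in something like /<team>/<page>, which the page itself would need to confirm.

```python
import re

# Same pattern as the re.sub call in Team.__init__.
# ".*/" is greedy and backtracks until "(.*?)/" can still match after it,
# so the group ends up holding the second-to-last path segment.
PATTERN = r".*/(.*?)/.*"

def team_name_from_href(href):
    """Hypothetical helper: return the second-to-last path segment of href."""
    return re.sub(PATTERN, r"\1", href)

# Made-up hrefs; the real links on the page may be shaped differently.
print(team_name_from_href("http://www.example.com/squadre/atalanta/rosa"))  # atalanta
print(team_name_from_href("/squadre/atalanta/"))                            # atalanta
print(team_name_from_href("/squadre/atalanta"))                             # squadre
print(team_name_from_href("no-slashes-here"))                               # unchanged
```

Note the third case: with no trailing segment after the team, the pattern picks up the segment before it instead. If there are fewer than two slashes the pattern doesn't match at all and the href comes back unchanged, so it's worth checking the result if the site changes its link format.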


Author Closing Comment

by:ltpitt
ID: 40330105
Masterpiece.