Solved

Help with scraping a web page with Python and BeautifulSoup

Posted on 2014-09-16
Last Modified: 2014-09-18
Hello everybody!

I need to gather data from this web page:

http://www.fantagazzetta.com/probabili-formazioni-serie-a

It lists Italian soccer teams, their players, and a vote for each player.

I've worked out how to scrape part of it:
import os
import requests
from bs4 import BeautifulSoup

def clear():
    os.system(['clear','cls'][os.name == 'nt'])

root_url = 'http://www.fantagazzetta.com'
index_url = root_url + '/probabili-formazioni-serie-a'

pagina_html = requests.get(index_url)
dati = BeautifulSoup(pagina_html.text)

clear()

print "Team di sinistra:"
print
for squadra in dati.find_all("div", { "class" : "team-in-p" }):
    print
    print squadra.text
    print
    for giocatore in dati.find_all("div", { "class" : "in" }):
        for dato in giocatore.find_all("div", { "class" : "name"}):
            for nome in dato.find("a"):
                print nome


But I can't work out how to "bind" the data (associate each team name with all its players, and each player with their vote).

I think I need nested for loops, but I don't know how to write them.

Any suggestions?
Question by:ltpitt
Accepted Solution

by:
clockwatcher earned 500 total points
ID: 40327393
Here you go.

import urllib2
import re
from BeautifulSoup import BeautifulSoup

class Page(object):
    def __init__(self, url="http://www.fantagazzetta.com/probabili-formazioni-serie-a"):
        self.html = urllib2.urlopen(url)
        self.soup = BeautifulSoup(self.html)
        self.games = list()
        for game in self.soup.findAll("div", "playerall"):
            self.games.append(Game(game))

class Game(object):
    def __init__(self, fragment):
        self.fragment = fragment
        self.teams = list()
        self.teams.append(Team(fragment.find("div", "in")))
        self.teams.append(Team(fragment.find("div", "out")))
        
class Team(object):
    def __init__(self, fragment):
        self.fragment = fragment
        self.name = re.sub(r".*/(.*?)/.*", r"\1", fragment.find("a")["href"])
        self.players = list()
        for p in self.fragment.findAll("div", "name"):
            self.players.append(Player(p))

class Player(object):
    def __init__(self, fragment):
        self.name = fragment.a["title"]
        self.percent = fragment.nextSibling.find("div","percent").string

def main():
    page = Page()
    for game in page.games:
        for team in game.teams:
            print team.name
            for player in team.players:
                print "\t{0}: {1}".format(player.name, player.percent)

if __name__ == '__main__':
    main()
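
For anyone on Python 3 with the bs4 package (the one the question imports), the same nesting idea can be sketched roughly as below. The key point is to search inside each parent element rather than the whole document, so that players end up attached to their own team. The markup in SAMPLE_HTML is a made-up, simplified stand-in based on the class names used in the answer above (playerall, in, out, name, percent). The team slugs, player names, percentages, the "info" wrapper class and the function name parse_games are all hypothetical, and the live page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; only the class names playerall, in, out,
# name and percent come from the thread. Everything else is invented.
SAMPLE_HTML = """
<div class="playerall">
  <div class="in">
    <a href="/squadre/team-a/rosa">Team A</a>
    <div class="name"><a title="Player One">Player One</a></div>
    <div class="info"><div class="percent">95%</div></div>
    <div class="name"><a title="Player Two">Player Two</a></div>
    <div class="info"><div class="percent">80%</div></div>
  </div>
  <div class="out">
    <a href="/squadre/team-b/rosa">Team B</a>
    <div class="name"><a title="Player Three">Player Three</a></div>
    <div class="info"><div class="percent">70%</div></div>
  </div>
</div>
"""


def parse_games(html):
    """Return a list of games, each a dict of {team_name: {player: percent}}."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for game in soup.find_all("div", class_="playerall"):
        teams = {}
        for side in ("in", "out"):
            # Search inside this game only, not the whole document.
            block = game.find("div", class_=side)
            if block is None:
                continue
            # Same regex as the answer: keeps the second-to-last path segment.
            team_name = re.sub(r".*/(.*?)/.*", r"\1", block.find("a")["href"])
            players = {}
            # Search inside this team's block only.
            for name_div in block.find_all("div", class_="name"):
                # find_next_sibling skips whitespace text nodes between tags.
                sibling = name_div.find_next_sibling("div")
                percent = sibling.find("div", class_="percent") if sibling else None
                players[name_div.a["title"]] = (
                    percent.get_text(strip=True) if percent else None
                )
            teams[team_name] = players
        games.append(teams)
    return games


if __name__ == "__main__":
    for game in parse_games(SAMPLE_HTML):
        for team_name, players in game.items():
            print(team_name)
            for player, percent in players.items():
                print(f"\t{player}: {percent}")
```

Against the real page you would pass requests.get(index_url).text to parse_games instead of SAMPLE_HTML, assuming the markup still matches these class names.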


Author Closing Comment

by:ltpitt
ID: 40330105
Masterpiece.