Solved

Help with scraping a web page with Python and BeautifulSoup

Posted on 2014-09-16
Last Modified: 2014-09-18
Hello everybody!

I need to gather data from this web page:

http://www.fantagazzetta.com/probabili-formazioni-serie-a

It lists Italian soccer team names, player names, and a vote for each player.

I've worked out how to scrape part of it:
import os
import requests
from bs4 import BeautifulSoup

def clear():
    os.system(['clear','cls'][os.name == 'nt'])

root_url = 'http://www.fantagazzetta.com'
index_url = root_url + '/probabili-formazioni-serie-a'

pagina_html = requests.get(index_url)
dati = BeautifulSoup(pagina_html.text)

clear()

print "Team di sinistra:"
print
for squadra in dati.find_all("div", { "class" : "team-in-p" }):
    print
    print squadra.text
    print
    for giocatore in dati.find_all("div", { "class" : "in" }):
        for dato in giocatore.find_all("div", { "class" : "name"}):
            for nome in dato.find("a"):
                print nome

But I can't figure out how to "bind" the data (associate each team name with all of its players, and each player with its vote).

I think I need to nest the for loops, but I don't know how.

Any suggestions?
Question by:ltpitt

Accepted Solution

by: clockwatcher (earned 500 total points)
ID: 40327393
Here you go.

import urllib2
import re
from BeautifulSoup import BeautifulSoup

class Page(object):
    def __init__(self, url="http://www.fantagazzetta.com/probabili-formazioni-serie-a"):
        self.html = urllib2.urlopen(url)
        self.soup = BeautifulSoup(self.html)
        self.games = list()
        for game in self.soup.findAll("div", "playerall"):
            self.games.append(Game(game))

class Game(object):
    def __init__(self, fragment):
        self.fragment = fragment
        self.teams = list()
        self.teams.append(Team(fragment.find("div", "in")))
        self.teams.append(Team(fragment.find("div", "out")))
        
class Team(object):
    def __init__(self, fragment):
        self.fragment = fragment
        self.name = re.sub(r".*/(.*?)/.*", r"\1", fragment.find("a")["href"])
        self.players = list()
        for p in self.fragment.findAll("div", "name"):
            self.players.append(Player(p))

class Player(object):
    def __init__(self, fragment):
        self.name = fragment.a["title"]
        self.percent = fragment.nextSibling.find("div","percent").string

def main():
    page = Page()
    for game in page.games:
        for team in game.teams:
            print team.name
            for player in team.players:
                print "\t{0}: {1}".format(player.name, player.percent)

if __name__ == '__main__':
    main()
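
For readers on Python 3 with the bs4 package, the same nesting idea can be sketched as follows. The key point is that each search is scoped to its parent element (game, then team, then player), whereas the inner loop in the question searches the whole document (`dati`) on every pass. Only the class names (`playerall`, `in`, `out`, `name`, `percent`) come from the answer above; the sample markup, the `stats` wrapper class, the href format, and the `parse_games` helper are assumptions made for illustration, and the live page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the nesting, the "stats" class and the href format
# are assumed; only the other class names appear in the answer above.
SAMPLE_HTML = """
<div class="playerall">
  <div class="in">
    <a href="/squadre/juventus/2014">Juventus</a>
    <div class="name"><a title="Buffon" href="#">Buffon</a></div>
    <div class="stats"><div class="percent">95%</div></div>
    <div class="name"><a title="Tevez" href="#">Tevez</a></div>
    <div class="stats"><div class="percent">80%</div></div>
  </div>
  <div class="out">
    <a href="/squadre/milan/2014">Milan</a>
    <div class="name"><a title="Abbiati" href="#">Abbiati</a></div>
    <div class="stats"><div class="percent">90%</div></div>
  </div>
</div>
"""


def parse_games(html):
    """Return one dict per game, mapping team name -> [(player, percent), ...]."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for game_div in soup.find_all("div", class_="playerall"):
        teams = {}
        for side in ("in", "out"):
            # Search inside this game only, not the whole document.
            team_div = game_div.find("div", class_=side)
            if team_div is None:
                continue
            # Same regex as the answer: take the second-to-last path segment.
            href = team_div.find("a")["href"]
            team_name = re.sub(r".*/(.*?)/.*", r"\1", href)
            players = []
            # Search inside this team only, so players stay bound to it.
            for name_div in team_div.find_all("div", class_="name"):
                # First "percent" div that follows this player's name div.
                percent_div = name_div.find_next("div", class_="percent")
                percent = percent_div.get_text(strip=True) if percent_div else None
                players.append((name_div.a["title"], percent))
            teams[team_name] = players
        games.append(teams)
    return games


if __name__ == "__main__":
    for game in parse_games(SAMPLE_HTML):
        for team_name, players in game.items():
            print(team_name)
            for player, percent in players:
                print("\t{0}: {1}".format(player, percent))
```

To run this against the real page, the HTML text could be fetched with `requests.get(url).text` and passed to `parse_games`, provided the page still uses these class names.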

Author Closing Comment

by:ltpitt
ID: 40330105
Masterpiece.