Solved

Help with scraping a web page with Python and BeautifulSoup

Posted on 2014-09-16
Last Modified: 2014-09-18
Hello everybody!

I need to gather data from this web page:

http://www.fantagazzetta.com/probabili-formazioni-serie-a

It lists Italian soccer teams, their players, and a vote for each player.

I've worked out how to scrape part of it:
import os
import requests
from bs4 import BeautifulSoup

def clear():
    os.system(['clear','cls'][os.name == 'nt'])

root_url = 'http://www.fantagazzetta.com'
index_url = root_url + '/probabili-formazioni-serie-a'

pagina_html = requests.get(index_url)
dati = BeautifulSoup(pagina_html.text)

clear()

print "Team di sinistra:"
print
for squadra in dati.find_all("div", { "class" : "team-in-p" }):
    print
    print squadra.text
    print
    for giocatore in dati.find_all("div", { "class" : "in" }):
        for dato in giocatore.find_all("div", { "class" : "name"}):
            for nome in dato.find("a"):
                print nome

But I can't figure out how to bind the data together, that is, associate each team name with all of its players and each player with their vote.

I think I need to nest the for loops, but I don't know how.

Any suggestions?
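
A minimal sketch of the nesting asked about here, assuming a simplified, hypothetical markup: the `SAMPLE_HTML` string, its class names, and the team names, player names and vote values are all made up for illustration, and the real page may be laid out differently. The idea it shows is to call `find_all` on each team's element rather than on the whole document, so every player found belongs to that team. It assumes the `bs4` package.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one <div class="team"> per team,
# each holding its own players. The real page may be structured differently.
SAMPLE_HTML = """
<div class="team"><h2>Alpha</h2>
  <div class="player"><span class="name">Rossi</span><span class="vote">80%</span></div>
  <div class="player"><span class="name">Bianchi</span><span class="vote">65%</span></div>
</div>
<div class="team"><h2>Beta</h2>
  <div class="player"><span class="name">Verdi</span><span class="vote">90%</span></div>
</div>
"""

def teams_with_players(html):
    """Return {team name: [(player name, vote), ...]}."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for team in soup.find_all("div", class_="team"):
        team_name = team.find("h2").get_text(strip=True)
        players = []
        # Search inside `team`, not `soup`, so only this team's players match.
        for player in team.find_all("div", class_="player"):
            name = player.find("span", class_="name").get_text(strip=True)
            vote = player.find("span", class_="vote").get_text(strip=True)
            players.append((name, vote))
        result[team_name] = players
    return result

if __name__ == "__main__":
    for team_name, players in teams_with_players(SAMPLE_HTML).items():
        print(team_name)
        for name, vote in players:
            print("\t{0}: {1}".format(name, vote))
```

The original attempt calls `dati.find_all` inside the team loop, which searches the whole page again on every pass; searching from the loop variable is what ties players to their team.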
Question by:ltpitt
2 Comments
Accepted Solution

by: clockwatcher (earned 500 total points)
ID: 40327393
Here you go.

# Python 2 with BeautifulSoup 3 (the "BeautifulSoup" package, not "bs4").
import urllib2
import re
from BeautifulSoup import BeautifulSoup

class Page(object):
    """Fetches the page and builds one Game per div with class "playerall"."""
    def __init__(self, url="http://www.fantagazzetta.com/probabili-formazioni-serie-a"):
        self.html = urllib2.urlopen(url)
        self.soup = BeautifulSoup(self.html)
        self.games = list()
        for game in self.soup.findAll("div", "playerall"):
            self.games.append(Game(game))

class Game(object):
    """Holds two teams: one from the "in" div and one from the "out" div."""
    def __init__(self, fragment):
        self.fragment = fragment
        self.teams = list()
        self.teams.append(Team(fragment.find("div", "in")))
        self.teams.append(Team(fragment.find("div", "out")))

class Team(object):
    def __init__(self, fragment):
        self.fragment = fragment
        # The team name is the text between the last two slashes
        # of the first link's href inside this team's div.
        self.name = re.sub(r".*/(.*?)/.*", r"\1", fragment.find("a")["href"])
        self.players = list()
        # Searching from self.fragment keeps players bound to this team.
        for p in self.fragment.findAll("div", "name"):
            self.players.append(Player(p))

class Player(object):
    def __init__(self, fragment):
        # The player name comes from the link's title attribute.
        self.name = fragment.a["title"]
        # The vote is in a "percent" div inside the name div's next sibling.
        self.percent = fragment.nextSibling.find("div","percent").string

def main():
    page = Page()
    for game in page.games:
        for team in game.teams:
            print team.name
            for player in team.players:
                print "\t{0}: {1}".format(player.name, player.percent)

if __name__ == '__main__':
    main()
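
A note on how `Team.name` is derived in the code above: the `re.sub` call replaces the whole `href` with whatever sits between its last two slashes. Below is a small standard-library sketch of just that step. The helper name `team_slug` is hypothetical, and the link passed to it is made up for illustration; the real links on the page may look different.

```python
import re

def team_slug(href):
    """Return the text between the last two slashes of href.

    Hypothetical helper wrapping the pattern used in Team.__init__.
    """
    # The leading greedy .* swallows as much as it can while still leaving
    # "/<segment>/" to match; the lazy (.*?) captures that segment.
    return re.sub(r".*/(.*?)/.*", r"\1", href)

# Made-up link, for illustration only.
print(team_slug("http://example.com/squadre/atalanta/2014"))
# With fewer than two slashes the pattern does not match,
# so the string comes back unchanged.
print(team_slug("atalanta"))
```

With the made-up link, the captured segment is `atalanta`. Because a non-matching `href` is returned as-is, a caller that needs to detect unexpected link formats would have to check for that separately.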

Author Closing Comment

by:ltpitt
ID: 40330105
Masterpiece.