
Help with scraping a web page with Python and BeautifulSoup

Posted on 2014-09-16
Last Modified: 2014-09-18
Hello everybody!

I need to gather data from this web page:

http://www.fantagazzetta.com/probabili-formazioni-serie-a

It's populated with Italian soccer teams' names, players' names, and votes for each player.

I've worked out the basics of scraping it:
import os
import requests
from bs4 import BeautifulSoup

def clear():
    os.system(['clear','cls'][os.name == 'nt'])

root_url = 'http://www.fantagazzetta.com'
index_url = root_url + '/probabili-formazioni-serie-a'

# Download the page and parse it ("pagina_html" = page HTML, "dati" = data)
pagina_html = requests.get(index_url)
dati = BeautifulSoup(pagina_html.text, "html.parser")

clear()

print "Team di sinistra:"   # "Left-hand team:"
print
# squadra = team
for squadra in dati.find_all("div", { "class" : "team-in-p" }):
    print
    print squadra.text
    print
    # giocatore = player, dato = data item, nome = name
    for giocatore in dati.find_all("div", { "class" : "in" }):
        for dato in giocatore.find_all("div", { "class" : "name"}):
            for nome in dato.find("a"):
                print nome


But I can't work out how to "bind" the data together (associate each team name with all of its players, and each player with his vote).

I think I need nested for loops, but I don't know how to structure them.

Any suggestion?
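
The nesting asked about here can be sketched like this: each inner `find_all` is called on the element from the loop around it (`game`, then `block`) rather than on the whole document, so a team only ever sees its own players. This is a minimal Python 3 sketch with bs4, run against a small hypothetical HTML snippet. The `playerall`, `in`, `out`, `name` and `percent` classes are borrowed from the accepted answer below; the `team` and `player` wrapper divs, the team and player names, and the percentages are placeholders, not the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: placeholder names and numbers, structure assumed.
HTML = """
<div class="playerall">
  <div class="in">
    <div class="team">Team A</div>
    <div class="player">
      <div class="name"><a title="Player 1">Player 1</a></div>
      <div class="percent">90%</div>
    </div>
    <div class="player">
      <div class="name"><a title="Player 2">Player 2</a></div>
      <div class="percent">60%</div>
    </div>
  </div>
  <div class="out">
    <div class="team">Team B</div>
    <div class="player">
      <div class="name"><a title="Player 3">Player 3</a></div>
      <div class="percent">75%</div>
    </div>
  </div>
</div>
"""


def parse_lineups(html):
    """Return {team name: [(player name, percent), ...]}."""
    soup = BeautifulSoup(html, "html.parser")
    lineups = {}
    # Outer loop: one <div class="playerall"> per match.
    for game in soup.find_all("div", class_="playerall"):
        # Middle loop: the two team blocks inside *this* match only.
        for side in ("in", "out"):
            block = game.find("div", class_=side)
            team = block.find("div", class_="team").get_text(strip=True)
            players = []
            # Inner loop: searches inside block, not the whole soup,
            # so only this team's players are collected.
            for row in block.find_all("div", class_="player"):
                name = row.find("div", class_="name").a["title"]
                percent = row.find("div", class_="percent").get_text(strip=True)
                players.append((name, percent))
            lineups[team] = players
    return lineups


if __name__ == "__main__":
    for team, players in parse_lineups(HTML).items():
        print(team)
        for name, percent in players:
            print("\t{0}: {1}".format(name, percent))
```

Returning a dictionary keyed by team name keeps the team-to-players association explicit, which is the "binding" the question is after.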
Question by: ltpitt

Accepted Solution

by: clockwatcher
ID: 40327393
Here you go.

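import urllib2
import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

class Page(object):
    """Downloads the page and builds one Game per match."""
    def __init__(self, url="http://www.fantagazzetta.com/probabili-formazioni-serie-a"):
        self.html = urllib2.urlopen(url)
        self.soup = BeautifulSoup(self.html)
        self.games = list()
        # In BeautifulSoup 3 a plain string as the second argument matches
        # the CSS class: every match sits in a <div class="playerall">
        for game in self.soup.findAll("div", "playerall"):
            self.games.append(Game(game))

class Game(object):
    """One match: the "in" block holds one team, the "out" block the other."""
    def __init__(self, fragment):
        self.fragment = fragment
        self.teams = list()
        self.teams.append(Team(fragment.find("div", "in")))
        self.teams.append(Team(fragment.find("div", "out")))

class Team(object):
    """A team name plus the players found inside that team's block only."""
    def __init__(self, fragment):
        self.fragment = fragment
        # Team name = path segment between the last two slashes
        # of the first link's URL
        self.name = re.sub(r".*/(.*?)/.*", r"\1", fragment.find("a")["href"])
        self.players = list()
        # Search self.fragment, not the whole soup, so each team
        # only collects its own players
        for p in self.fragment.findAll("div", "name"):
            self.players.append(Player(p))

class Player(object):
    """A player's name and the percentage shown next to it."""
    def __init__(self, fragment):
        self.name = fragment.a["title"]
        # The element right after the name div holds <div class="percent">;
        # this assumes no whitespace text node sits between them
        self.percent = fragment.nextSibling.find("div","percent").string

def main():
    page = Page()
    # game -> team -> player: each player is printed under its own team
    for game in page.games:
        for team in game.teams:
            print team.name
            for player in team.players:
                print "\t{0}: {1}".format(player.name, player.percent)

if __name__ == '__main__':
    main()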

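The `Team` class above takes the team name from the first link's `href` instead of the visible text. The pattern `.*/(.*?)/.*` works because the greedy `.*` backtracks to the second-to-last slash, the lazy group stops at the last one, and the trailing `.*` consumes the rest, so `re.sub` replaces the whole string with just the captured segment. A quick Python 3 check follows; the URL shape and the `team_from_href` helper name are made up for illustration, and the real site's links may look different.

```python
import re


def team_from_href(href):
    """Return the path segment between the last two slashes of href.

    Same pattern as the accepted answer. If href contains fewer than two
    slashes the pattern does not match and href is returned unchanged.
    """
    return re.sub(r".*/(.*?)/.*", r"\1", href)


print(team_from_href("http://www.example.com/squadra/napoli/123"))  # napoli
print(team_from_href("http://www.example.com/squadra/napoli/"))     # napoli
```

Because it depends on the URL layout, this approach would silently return the wrong segment if the site changed its link format.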

Author Closing Comment

by: ltpitt
ID: 40330105
Masterpiece.