Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1563
  • Last Modified:

Rss Sports Scores to Download

Experts,
I need to make an application that looks at each MLB (Major League Baseball) Team and tells you how many runs each team scores each week.  Currently, I just add up the scores manually  between Monday and Sunday and enter those into a spreadsheet.

I am interested in either:  Taking Scores from an RSS feed, parsing them and Adding them to my mysql database.  OR finding a database that already exists avaialbe for daily updates where I can pull this information.

I have no idea how to parse through an RSS feed to add to a database.  Has anyone done anything like this before?

Making this one 500 pts if anyone can help me out.

Thanks,

-Dan
0
dsg138
Asked:
dsg138
  • 11
  • 11
1 Solution
 
Ray PaseurCommented:
Parsing an RSS feed is pretty easy - there are literally hundreds of RSS feed readers and aggregators.  For examples and background on RSS, this reference is pretty much accurate:
http://cyber.law.harvard.edu/rss/rss.html

But for gathering MLB statistics, an RSS feed might not be the best place to look -- There may be specialized XML for baseball stats, but it won't be the RSS subset.  Google can find you seventeen million alternatives with this search:
http://www.google.com/search?hl=en&q=major+league+baseball+statistics

This one from ESPN looks somewhat promising:
http://sports.espn.go.com/mlb/stats/batting?league=mlb

And this has some promise, too:
http://erikberg.com/xmlstats/

Are you planning to write the MySQL code yourself?  If so, you may want to refer to some other online data bases for guidance:
http://www.google.com/search?hl=en&q=mlb+statistics+database

HTH, ~Ray
0
 
dsg138Author Commented:
Thanks Ray,
I looked at XML pages as well.  The problem is that most of these sites I see are for baseball statistics.  There are so many types of statistics and most are related to the players.

The information I'm looking for is much simpler.  I just want the final score of each game and to transfer that to a database each night.

Even if I could find the data in txt files, or in some other format, that would make this easy.  Each night, I want to run a script that sends the scores from Somewhere to a table that has the team and the score of each game.  A small amount of data, but difficult to find in a usable format.
0
 
Ray PaseurCommented:
Yep, I completely sympathize.  There is almost always more information than what we want!  The most likely approach to get the information is to extract it from the "big pile" that is available at one of the other sources.  

In my experience, you can get away with "scraping" the data out of web pages, but it is better to have an XML schema or a formal API.  In any case, you're somewhat dependent on the format and content of the foreign data source. If they decide to change their web pages, your interface will break, so you have to be very careful about how you extract the data and what you put into your data base.

If you have found a source that contains the data you want, extracting it should not be too hard.  Is there one you feel will be dependable over the long term?
0
Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

 
dsg138Author Commented:
Thanks Ray... I found a site that has daily scores available.  There is a small fee for this (.25), but it might be worth it.

It's in SportsML format.  Here is a sample:
http://showcase.xmlteam.com/index.php/documents/fetch/doc-ids/xt.6262574-daysked/stylesheet

Sample in HTML:
http://showcase.xmlteam.com/index.php/documents/fetch/doc-ids/xt.6262574-daysked/stylesheet/sportsmlt

Once the games end, the game times will become scores.

Now that I have a file that I can use, I want to populate the Teams and Scores into a table in my mysql database.  That's the part that I'm unsure how to do.

Thanks in advance!

-dan
0
 
Ray PaseurCommented:
Hi, Dan.  I looked at it... All the scores are zeros.  Do you have a sample of the XML that shows scores?  I may be able to help parse those out along with the team names.

Getting it into a MySQL data base is a whole different issue.  On a scale of 1-10 where are your MySQL skills?
0
 
dsg138Author Commented:
Thanks Ray.
I talked to the company that generates these files.  They don't have any samples available that have the completed scores.  I'm guessing that we may not see that until opening day!

However, they told me that this:
team-stats score="0"
will be updated with the scores as the games start.
Since I'm only interested in the final scores, I would not download the file until all of the games are finished (probably the middle of the night) so for the sake of this document, we can assume that those zeros will be actual scores.

It looks like as Long as we grab:
team-metadata team-key
team-stats score
And the Date from somewhere, this could be enough useful information for me.

I would say I'm about a 6 in MySQL proficiency.  I have multiple dbs configured, with numerous tables and have php pages that both display my data and allow it to be added, edited, or deleted.
0
 
Ray PaseurCommented:
OK, thanks for that info.  I'll see what we can do with those fields.
0
 
Ray PaseurCommented:
Not exactly parsing an RSS feed, but here is an object iterator that will extract the salient information.  HTH, ~Ray
<?php // RAY_temp_baseball.php
 
error_reporting(E_ALL);
echo "<pre>";
 
// TEST DATA
$xml = 'http://showcase.xmlteam.com/index.php/documents/fetch/doc-ids/xt.6262574-daysked/stylesheet';
$obj = SimpleXML_Load_File($xml);
 
// CREATE AN OBJECT ITERATOR
foreach ($obj->{"sports-content"}->schedule->{"sports-event"} as $game_data)
{
   $team_name = $game_data->team->{"team-metadata"}->name["first"] . ' ' . $game_data->team->{"team-metadata"}->name["last"];
   $team_score = $game_data->team->{"team-stats"}["score"];
   $team_decided = $game_data->team->{"team-stats"}["event-outcome"];
 
// VISUALIZE RESULTS
   echo "$team_name $team_score $team_decided \n";
}

Open in new window

0
 
Ray PaseurCommented:
Sorry - I was in a hurry earlier - this is much more complete and it produces a CSV-type output. ~Ray
<?php // RAY_temp_baseball.php
 
error_reporting(E_ALL);
echo "<pre>";
 
// TEST DATA
$xml = 'http://showcase.xmlteam.com/index.php/documents/fetch/doc-ids/xt.6262574-daysked/stylesheet';
$obj = SimpleXML_Load_File($xml);
 
// CREATE AN OBJECT ITERATOR
foreach ($obj->{"sports-content"}->schedule->{"sports-event"} as $game_data)
{
 
// GET ISO8601-FORMAT DATE ( == MySQL DATETIME )
   $date_time = $game_data->{"event-metadata"}["start-date-time"];
 
// CLEAR ARRAYS FOR CONTEST DATA
   $team_name    = array();
   $team_score   = array();
   $team_decided = array();
 
// CREATE ANOTHER OBJECT ITERATOR
   foreach ($game_data->team as $thing)
   {
      $team_name[]    = $thing->{"team-metadata"}->name["first"] . ' ' . $thing->{"team-metadata"}->name["last"];
      $team_score[]   = $thing->{"team-stats"}["score"];
      $team_decided[] = $thing->{"team-stats"}["event-outcome"];
   }
 
// VISUALIZE RESULTS
   echo "$date_time, $team_name[0], $team_name[1], $team_score[0], $team_score[1], $team_decided[0], $team_decided[1] \n";
}

Open in new window

0
 
dsg138Author Commented:
Ray, this looks great!  I'm going to try it tonight.  I'll modify the XML to add some real scores to see how it will output!  I'll let you know how it goes.  Much appreciated!
0
 
dsg138Author Commented:
I get the following error:
Call to undefined function: simplexml_load_file()  on line 8.
My webhost is running PHP 4.4.9.  I'm guessing I need to be on PHP 5?
I'm going to request to be upgraded to PHP5.  Hopefully they're able to.
0
 
dsg138Author Commented:
Wow Ray, this worked great.
I just had my php upgraded to 5.0 and this parsed it perfectly.  I since added a few more fields from the file such as Team-Key.

I also added some SQL insert Statments inside your FOR loop so that thses are instanly added to my table.  It works successfully.
Each time I refresh the page, it now inserts 29 records into my MySQL GameResults table.

My question is...
Each day of the season, I will have a new and different XML file.
I will have this file automatically downloaded each night to a Directory on my server, let's say it's called /NEWXMLS.

Is there a way I can have the new file automatically referenced instead of chaning the hardcoded reference of $xml in this php file?

Is there a way to choose the most recent one, or the ONLY file in the directory?

If I can be certain that only 1 XML file will be in my directory, could I have this script reference the only file in the folder, without retyping it's name?  Then, I could create a CRON job to automatically run the script (say at 3AM) and then delete the file.  It would then be ready and emptied for the next day's file.  And the data would've been already added to my Database.

Or is there a better way?  What do you think?  Thanks again for your expertise!
0
 
Ray PaseurCommented:
Good choice on PHP5, and not only for this application - all of its OOP stuff is better than PHP4.

As to the general App -Design questions, here is what I would do.  Catch all the XML files and keep them all in some useful place - probably in the file system in their own directory.  Storage is cheap and if you ever need to go back and get the original data again, you'll be glad you saved it.  Design your data base load application so that you can run it and rerun it over and over again, with or without all the pieces of the data.  I've found that maximum flexibility is the watchword when you're dealing with external data.

When you capture the XML files, you can name them with a field in the file name that contains the ISO8601 DATETIME value.  In PHP this will be date('Y-m-d\THis').  If you made three captures in a day, you would still have unique names for each capture.  The data _should_ be the same in all the captures for the day, but if it's not (for example a game went into a lot of extra innings) your app should still work.

You will want to iterate over the collection of XML files to extract the dates, teams, scores, etc., and build the INSERT or REPLACE INTO commands for MySQL.

You can automate a lot of this with a CRON job - just be sure that you keep the original data and that the cron job can fail or get rerun without damaging your data base.

You can find the newest files in any directory with filemtime().  See the user-contributed notes for generalized examples of how to identify the newest files in a directory.  
http://us3.php.net/manual/en/function.filemtime.php

HTH, ~Ray
0
 
dsg138Author Commented:
Many thanks!  You taught me alot here... And I think the application will be really nice!

Let me know if you want to play!  :)

Thanks again,

-Dan
0
 
dsg138Author Commented:
Perfect!  Thanks Ray!
0
 
Ray PaseurCommented:
Thanks for the points!  It's sounding like it should be fun. ~Ray
0
 
dsg138Author Commented:
Yeah, it is fun actually...
Here is my test site:
http://allgolfphilly.com/RFTW/weeklyteams.php

The scores you see came from the first run of the XML parser early this morning.

Thanks again!

-Dan
0
 
Ray PaseurCommented:
Yeah, the Nationals are right where we have learned to find them!
0
 
dsg138Author Commented:
yes, last when sorting by city, ALPHABETICALLY!  :)
0
 
Ray PaseurCommented:
;-)
0
 
dsg138Author Commented:
Ray, can I ask you a quick follow up question?  
The page we discussed above where this works fine is called scores1.php
It works successfully from my web browser.  I recently setup a CRON job for this page to be accessed at 3AM.

/usr/local/php5/bin/php -q /home/allgo5/public_html/RFTW/scores1.php

It works when accessing the page directly, but the cron job has been failing every morning and I don't know why...  I get the follwoing in my email:

failed to open stream: HTTP request failed!  in /home/allgo5/public_html/RFTW/scores.php on line 33
Warning: simplexml_load_file(): I/O warning : failed to load external entity

simplexml_load_file works great when directly accessing this page.
Is there anything I can do to make it work from my Cron job?
Thanks again Ray!
0
 
Ray PaseurCommented:
That's a good question for your hosting company.  They may have placed restrictions on CRON jobs, or they may force them to run in "safe mode" or something like that.  I would guess a helpful thing might be to capture the output of phpinfo() in the cron job and send that to yourself via the mail() command.  But first I would contact tech support at the hosting company and ask them why a script that runs correctly over the WWW fails when it is run as a cron job.  Show them a link to this question, too.

Good luck with it, ~Ray
0

Featured Post

Get expert help—faster!

Need expert help—fast? Use the Help Bell for personalized assistance getting answers to your important questions.

  • 11
  • 11
Tackle projects and never again get stuck behind a technical roadblock.
Join Now