I have written a Perl program that rips lines from a web page (I am ripping the Eastern Division NBA standings so that I can display them on my site). The program worked very well until recently, when the web page changed its format slightly. Now, instead of a single line containing all the information for one team, each team takes seven lines, followed by one line of junk, then the next team's seven lines, another junk line, and so on. My program as it exists is listed below:
#!/usr/bin/perl
#
# by: Michael D. McClellan
#-------------------------------------------------------------------------
# Define the output file, and the input webpage
#-------------------------------------------------------------------------
$standings = "/home/celticna/public_html/calendar/standings.txt";
# $foxsports = "http://msn.foxsports.com/nba/team?categoryId=71076";
use English;
use CGI;
use integer;
use LWP::UserAgent;
use Net::SMTP;
print "Content-type: text/html\n\n";

# Read the previous standings file (its contents are not used below).
open(STANDINGS, "$standings");
@standings = <STANDINGS>;
close(STANDINGS);

# Fetch the Fox Sports standings page.
$ua = new LWP::UserAgent;
$front = "http://";
$domain = "msn.foxsports.com/nba/team?categoryId=71076";
$data = $front . $domain;
$lookup = new HTTP::Request 'GET', "$data";
$response = $ua->request($lookup);
# Stop before truncating the old standings file if the download failed.
die "Could not fetch $data: " . $response->status_line . "\n"
    unless $response->is_success;
@lines = split(/\n/, $response->content);

# Write every line that holds a team link; tag the first five with markers.
$countme = 0;
open(REAL, ">$standings") or die "Cannot write $standings: $!\n";
foreach $line (@lines) {
    if ($line =~ /<td align="left"><a href="\/nba\/team\?statsId=/) {
        $countme = $countme + 1;
        print $countme;
        if ($countme == 1) {
            print REAL "<!--STANDINGS1-->$line\n";
        }
        elsif ($countme == 2) {
            print REAL "<!--STANDINGS2-->$line\n";
        }
        elsif ($countme == 3) {
            print REAL "<!--STANDINGS3-->$line\n";
        }
        elsif ($countme == 4) {
            print REAL "<!--STANDINGS4-->$line\n";
        }
        elsif ($countme == 5) {
            print REAL "<!--STANDINGS5-->$line\n";
        }
        else {
            print REAL "$line";
        }
    }
}
close(REAL);
print "Standings!";
exit;
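As a side note, the five-branch if/elsif chain in the loop differs only in the number inside the marker. A minimal sketch of an equivalent, using a hypothetical helper named `format_row` (not part of the original program), assuming the count always starts at 1 as it does in the loop:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): returns exactly what the
# if/elsif chain prints for the $count-th matching line. Lines 1-5 get a
# numbered <!--STANDINGSn--> marker plus a newline; later lines are passed
# through unchanged, as in the original else branch.
sub format_row {
    my ($count, $line) = @_;
    return $count <= 5 ? "<!--STANDINGS${count}-->$line\n" : $line;
}

# Inside the existing loop, the whole chain would become:
#   print REAL format_row($countme, $line);
print format_row(1, '<td>example</td>');
```

The `${count}` form keeps Perl from reading the following `-->` as part of the variable name.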
When it runs, it now pulls the following lines from the page:
<td align="left"><a href="/nba/team?statsId=20">76ers</a></td>
<td align="left"><a href="/nba/team?statsId=17">Nets</a></td>
<td align="left"><a href="/nba/team?statsId=2">Celtics</a></td>
<td align="left"><a href="/nba/team?statsId=18">Knicks</a></td>
<td align="left"><a href="/nba/team?statsId=28">Raptors</a></td>
What I really need to pull from the page is this:
<td align="left"><a href="/nba/team?statsId=20">76ers</a></td>
<td>6</td>
<td>5</td>
<td>.545</td>
<td>0</td>
<td>4-3</td>
<td>2-2</td></tr>
<tr class="bgC" align="center">
<td align="left"><a href="/nba/team?statsId=17">Nets</a></td>
<td>5</td>
<td>4</td>
<td>.556</td>
<td>0</td>
<td>4-2</td>
<td>1-2</td></tr>
<tr class="bgC" align="center">
<td align="left"><a href="/nba/team?statsId=2">Celtics</a></td>
<td>4</td>
<td>5</td>
<td>.444</td>
<td>1</td>
<td>4-3</td>
<td>0-2</td></tr>
<tr class="bgC" align="center">
<td align="left"><a href="/nba/team?statsId=18">Knicks</a></td>
<td>3</td>
<td>7</td>
<td>.300</td>
<td>3</td>
<td>1-2</td>
<td>2-5</td></tr>
<tr class="bgC" align="center">
<td align="left"><a href="/nba/team?statsId=28">Raptors</a></td>
<td>1</td>
<td>9</td>
<td>.100</td>
<td>5</td>
<td>1-6</td>
<td>0-3</td></tr>
<tr class="bgFtr">
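One way to handle the layout above is to treat each team as a small state machine: start a row when the team-link line matches (the same regex the original program uses), append every following line to it, and finish the row at the line containing `</tr>`. Junk lines such as `<tr class="bgC" ...>` then fall between rows and are skipped without a special check. A minimal sketch, assuming every team row ends with `</tr>` as in the sample; `join_team_rows` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: joins each team's multi-line block into one string.
# Assumes a row starts at the team-link cell and ends at the line with </tr>.
sub join_team_rows {
    my @lines = @_;
    my @rows;        # one joined string per team
    my $current;     # row being built, or undef when between rows
    foreach my $line (@lines) {
        $line =~ s/\r?\n\z//;    # drop any trailing line ending
        if ($line =~ /<td align="left"><a href="\/nba\/team\?statsId=/) {
            $current = $line;    # start of a new team row
        }
        elsif (defined $current) {
            $current .= $line;   # stat cell belonging to the current team
        }
        # Lines outside a row (the bgC / bgFtr junk) are simply ignored.
        if (defined $current && $current =~ m{</tr>}) {
            push @rows, $current;    # row complete
            undef $current;
        }
    }
    return @rows;
}

# Possible use in the original program, replacing the foreach loop:
#   $countme = 0;
#   foreach $row (join_team_rows(@lines)) {
#       $countme++;
#       print REAL "<!--STANDINGS${countme}-->$row\n";
#   }
```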
I would like each team's data on a single line, omitting the junk rows that contain bgC or bgFtr. When complete, the output would look like this:
<td align="left"><a href="/nba/team?statsId=20">76ers</a></td><td>6</td><td>5</td><td>.545</td><td>0</td><td>4-3</td><td>2-2</td></tr>
<td align="left"><a href="/nba/team?statsId=17">Nets</a></td><td>5</td><td>4</td><td>.556</td><td>0</td><td>4-2</td><td>1-2</td></tr>
<td align="left"><a href="/nba/team?statsId=2">Celtics</a></td><td>4</td><td>5</td><td>.444</td><td>1</td><td>4-3</td><td>0-2</td></tr>
<td align="left"><a href="/nba/team?statsId=18">Knicks</a></td><td>3</td><td>7</td><td>.300</td><td>3</td><td>1-2</td><td>2-5</td></tr>
<td align="left"><a href="/nba/team?statsId=28">Raptors</a></td><td>1</td><td>9</td><td>.100</td><td>5</td><td>1-6</td><td>0-3</td></tr>
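An alternative that skips the line-by-line bookkeeping is to match each team row directly in the whole page body (what `$response->content` returns, before it is split on newlines) and then delete the line breaks inside each match. The `/s` modifier lets `.` cross newlines, `.*?` stops at the first `</tr>`, and `/g` in list context returns every row. A minimal sketch, assuming each row runs from the team-link cell to the next `</tr>` as in the sample above; `rows_from_content` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: pulls every team row out of the raw page text and
# puts each one on a single line. Assumes rows run from the team-link cell
# to the next </tr>, so the bgC / bgFtr junk rows are never captured.
sub rows_from_content {
    my ($content) = @_;
    my @rows = $content =~ m{(<td align="left"><a href="/nba/team\?statsId=.*?</tr>)}gs;
    s/[\r\n]+//g for @rows;    # remove line breaks inside each row
    return @rows;
}

# Possible use in the original program, replacing the split and the loop:
#   $countme = 0;
#   foreach $row (rows_from_content($response->content)) {
#       $countme++;
#       print REAL "<!--STANDINGS${countme}-->$row\n";
#   }
```

This version depends only on where each row starts and ends, so it would keep working if the site added or removed a stat column; the trade-off is that it assumes `</tr>` never appears inside a team row.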
I am still learning Perl, including how to manipulate files in this manner. I am going to start experimenting with it now, but because of time constraints I would appreciate any direction you can provide. Thanks in advance for your help.