PowerShell HTML parse

My foreach loop is not filtering the InnerHTML array with the string I use as the filter. There seem to be two sets of results for each game because there are two box scores. I want the results from the first set only.

It looks like this works for the first line, but then the code falls apart.

The result should look like this for every line:
Jets 4, Blue Jackets 2


Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

	$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
	$wc = New-Object System.Net.WebClient;
	$doc = New-Object HtmlAgilityPack.HtmlDocument
	$doc.LoadHtml($wc.DownloadString($Website))
	
	$game = $doc.DocumentNode.SelectNodes('.//table["mod-container mod-no-header-footer mod-scorebox final mod-scorebox-final"]') | select -first 4
	$scores = @()
	$i = 0

	ForEach ($innerHTML in $game.InnerHTML | Where-Object { $_.InnerHTML -notlike "*-totalScoreHome*" }) #-or $game.InnerHTML -notlike "*-totalScoreAway*"
	{
		
		$Teams = $innerHTML -split "`"><a href=`""
		
		$Team1 = $Teams[1].Substring($Teams[1].IndexOf("http://espn.go.com") + 48, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
		$Team2 = $Teams[2].Substring($Teams[2].IndexOf("http://espn.go.com") + 48, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
	
		$Score1 = $Teams[1].Substring($Teams[1].IndexOf("-awayHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
		$Score2 = $Teams[2].Substring($Teams[2].IndexOf("-homeHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
	
		$TeamScore = $Team1 + ' ' + $Score1 + ', ' + $Team2 + ' ' + $Score2
		
		$scores += New-Object PsObject -Property @{ Scores = $TeamScore; }
		$i = $i + 2
	}
	$scores | select Scores | Format-Table -AutoSize
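
A likely reason the filter has no effect, sketched on sample strings rather than the real page: `$game.InnerHTML` already yields plain strings, so inside `Where-Object` the expression `$_.InnerHTML` is `$null`, and `$null -notlike ...` is always true. The fragments below are hypothetical stand-ins for the ESPN markup, and the sketch assumes strict mode is off (the default).

```powershell
# Hypothetical fragments standing in for the strings in $game.InnerHTML
$fragments = @(
    '<td id="1-awayHeaderScore">4</td>',
    '<td id="1-totalScoreHome">2</td>'
)

# Pattern from the question: a string has no InnerHTML property, so
# $_.InnerHTML is $null and the -notlike test passes for every item.
$unfiltered = @($fragments | Where-Object { $_.InnerHTML -notlike '*-totalScoreHome*' })

# Testing the string itself applies the filter.
$filtered = @($fragments | Where-Object { $_ -notlike '*-totalScoreHome*' })

"Unfiltered count: $($unfiltered.Count)"
"Filtered count:   $($filtered.Count)"
```

With these two sample strings the first count is 2 and the second is 1.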

Leo Torres (SQL Developer) asked:
 
Joe Klimis commented:
Hi Leo,

I have never used the Agility Pack (perhaps I should take a look :-) ), but not all sites I work on allow additional tools to be downloaded, so I usually try to make things work with out-of-the-box features.

I think this will do what you want:
$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -URI $webSite   #  fetch web page
$h = $request.ParsedHtml.getElementsByTagName("table")  #  split page by tag  to isolate the required information
$results = ($h | where classname -eq "game-header-table" | select innerhtml) #  create an array of game results

foreach ( $result in  $Results )   # loop through each result , extracting the required information.
{
	$a = $result.innerhtml
	$teama = ($a -split "</A>")[0].split(">")[5]
	$scorea  = ($a  -split "</A>")[1].split("<")[4].split(">")[1]
	$teamb = (($a -split "</A>")[1] -split ">")[17]
	$scoreb = ($a -split "</A>")[2].split(">")[4].split("<").split("/")[0]
	write-output "$teama  $scorea    VS  $teamb  $scoreb "
}
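
As a rough illustration of how the positional splitting above works, here is the same idea applied to a short, hypothetical fragment. The real ESPN markup is not reproduced in this thread, so the index values here differ from those in the script. Because the indexes depend on the exact markup, they would need adjusting whenever the page layout changes.

```powershell
# Hypothetical fragment; not the actual page markup
$a = '<td><a href="#">Jets</a></td><td><span>4</span></td>'

# -split is case-insensitive by default, so '</A>' also matches '</a>'
$parts = $a -split '</A>'

# Left part is '<td><a href="#">Jets'; splitting on '>' puts the name at index 2
$team = $parts[0].Split('>')[2]

# Right part is '</td><td><span>4</span></td>'; index 3 is '4</span',
# and the text before the next '<' is the score
$score = $parts[1].Split('>')[3].Split('<')[0]

"$team $score"
```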

 
Joe Klimis commented:
Hi,

Using PowerShell 3 or above, I would do something like the following instead of using the HTML Agility Pack:

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -URI $webSite
$h = $request.ParsedHtml.getElementsByTagName("div")
$h | where classname -eq 'team-name' | select InnerText
$a = $h | where classname -eq 'span-2' | select innerhtml
$teama = ($a.innerHTML -split "</A>")[0].split(">")[11]
$scorea =  ($a.innerHTML -split "</A>")[1].split("<")[4].split(">")[1]
$teamb = (($a.innerHTML -split "</A>")[1] -split ">")[17]
$scoreb = ($a.innerHTML -split "</A>")[2].split(">")[4].split("<")

write-output $teama , $scorea , $teamb , $scoreb


If you detail your requirements, I can help you using this method.

Regards
Joe
 
Leo Torres (SQL Developer, author) commented:
The requirement is just to extract the team names and scores for the day in question.

The output of your code is this:
Jets
4
Blue Jackets
2
/SPAN

This is only one game. I need all the results for that day, and the output should not include "/SPAN".

Just so I know, why would you not use the Agility Pack? Is there a drawback? I used it because I thought it was easier, but whatever works is fine with me. I like taking different approaches; it serves as a teaching point for me.
 
Leo Torres (SQL Developer, author) commented:
Wow, indeed it works thank you!
 
Leo Torres (SQL Developer, author) commented:
Thanks.
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
Coming late, but here it is. I had to use dummy variables to ignore some content, as I was not able to filter that stuff out appropriately via XPath:
Add-Type -path C:\temp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))

$games = $doc.DocumentNode.SelectNodes('//*[@class="team-name"]|//*[@class="team-score"]') | select -Expand InnerText

while ($games)
{
  $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
  Write-Host $Team1 $Score1', '$Team2 $Score2
}
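
The loop above relies on PowerShell's multiple assignment: each variable on the left takes one element, and the last variable receives whatever is left over, so the list shrinks by one game per pass until it is empty. A minimal sketch on an in-memory list follows; the filler values and the second game ("Team C", "Team D") are hypothetical placeholders, and the element order is assumed to match what the XPath query returns.

```powershell
# Flat list in the assumed order: team, score, filler, team, score, three fillers
$items = 'Jets', '4', 'x', 'Blue Jackets', '2', 'x', 'x', 'x',
         'Team C', '1', 'x', 'Team D', '0', 'x', 'x', 'x'   # hypothetical second game

$lines = @()
while ($items)
{
    # Eight elements are consumed; the final variable ($items) gets the remainder,
    # which becomes $null once the list is exhausted and ends the loop.
    $team1, $score1, $dummy, $team2, $score2, $dummy, $dummy, $dummy, $items = $items
    $lines += "$team1 $score1, $team2 $score2"
}
$lines
```

Collecting the strings in `$lines` instead of calling `Write-Host` keeps the results available for further processing, for example with `Format-Table` or `Export-Csv`.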

Question has a verified solution.
