  • Status: Solved
  • Priority: Medium
  • Security: Public

PowerShell HTML parse

My foreach loop is not filtering the inner HTML array by the string I use as the filter. There seem to be two sets of nodes for each game because each game has two box scores. I want the results from the first set only.

It looks like this works for the first line, but then the code falls apart.

The result should look like this for every line:
Jets 4, Blue Jackets 2


Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

	$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
	$wc = New-Object System.Net.WebClient;
	$doc = New-Object HtmlAgilityPack.HtmlDocument
	$doc.LoadHtml($wc.DownloadString($Website))
	
	$game = $doc.DocumentNode.SelectNodes('.//table["mod-container mod-no-header-footer mod-scorebox final mod-scorebox-final"]') | select -first 4
	$scores = @()
	$i = 0

	ForEach ($innerHTML in $game.InnerHTML | Where-Object { $_.InnerHTML -notlike "*-totalScoreHome*" }) #-or $game.InnerHTML -notlike "*-totalScoreAway*"
	{
		
		$Teams = $innerHTML -split "`"><a href=`""
		
		$Team1 = $Teams[1].Substring($Teams[1].IndexOf("http://espn.go.com") + 48, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
		$Team2 = $Teams[2].Substring($Teams[2].IndexOf("http://espn.go.com") + 48, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
	
		$Score1 = $Teams[1].Substring($Teams[1].IndexOf("-awayHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
		$Score2 = $Teams[2].Substring($Teams[2].IndexOf("-homeHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
	
		$TeamScore = $Team1 + ' ' + $Score1 + ', ' + $Team2 + ' ' + $Score2
		
		$scores += New-Object PsObject -Property @{ Scores = $TeamScore; }
		$i = $i + 2
	}
	$scores | select Scores | Format-Table -AutoSize
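
Two details in the script above may explain why nothing gets filtered. The sketch below is only an illustration: it uses the built-in [xml] type on a made-up fragment instead of HtmlAgilityPack and the live page, the class names "scorebox" and "other" and the sample strings are invented, and it assumes strict mode is off (the default). HtmlAgilityPack is expected to evaluate XPath predicates the same way, but that is an assumption here.

```powershell
# Made-up markup standing in for the real page; [xml] is used only so the
# sketch runs without HtmlAgilityPack.
[xml]$sample = '<root><table class="scorebox"/><table class="other"/></root>'

# 1) A predicate that holds only a string literal is always true in XPath 1.0,
#    so this selects every table, not just the ones with that class.
$allTables = $sample.SelectNodes('//table["scorebox"]')

# Comparing the class attribute is what actually narrows the match.
$scoreboxTables = $sample.SelectNodes('//table[@class="scorebox"]')

# 2) Strings piped into Where-Object have no InnerHTML property, so
#    $_.InnerHTML is $null and -notlike is true for every item.
$html = @('<td id="1-totalScoreHome">4</td>', '<td id="1-awayHeaderScore">2</td>')
$unfiltered = @($html | Where-Object { $_.InnerHTML -notlike '*-totalScoreHome*' })

# Testing the string itself ($_) applies the filter as intended.
$filtered = @($html | Where-Object { $_ -notlike '*-totalScoreHome*' })

'{0} vs {1} tables; {2} vs {3} strings' -f $allTables.Count, $scoreboxTables.Count, $unfiltered.Count, $filtered.Count
```

If this applies to the real page, the selector would compare @class instead of using a bare string, and the Where-Object block would test $_ rather than $_.InnerHTML.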


Asked:
Leo Torres
1 Solution
 
Joe Klimis Commented:
Hi

Using PowerShell 3 or above, I would do something like the following instead of using the HTML Agility Pack:

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -URI $webSite
$h = $request.ParsedHtml.getElementsByTagName("div")
$h | where classname -eq 'team-name' | select InnerText
$a = $h | where classname -eq 'span-2' | select innerhtml
$teama = ($a.innerHTML -split "</A>")[0].split(">")[11]
$scorea =  ($a.innerHTML -split "</A>")[1].split("<")[4].split(">")[1]
$teamb = (($a.innerHTML -split "</A>")[1] -split ">")[17]
$scoreb = ($a.innerHTML -split "</A>")[2].split(">")[4].split("<")

write-output $teama , $scorea , $teamb , $scoreb


If you detail your requirements, I can help you using this method.

Regards
Joe
 
Leo Torres (SQL Developer, Author) Commented:
The requirement is just to extract the team names and scores for the day in question.

The output of your code is this:
Jets
4
Blue Jackets
2
/SPAN


This is only one game. I need all results for that day, and the output should not bring back "/SPAN".


Just so I know, why would you not use the Agility Pack? Is there a drawback? I used it because I thought it was easier, but whatever works is fine with me. I like taking different approaches; it serves as a teaching point for myself.
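
On the stray "/SPAN" in the output above: the last expression in the earlier script ends with .split("<") and no index, so every piece of the split is written out. A minimal sketch, assuming the text being split looks like the made-up fragment below (the real markup may differ):

```powershell
# Assumed shape of the text the final split operates on.
$fragment = '2</SPAN'

# Split returns every piece, so both '2' and '/SPAN' reach the output.
$pieces = $fragment.Split('<')

# Indexing the first piece keeps only the score.
$scoreb = $fragment.Split('<')[0]

$pieces
$scoreb
```

Taking element [0] of the split result is one way to drop the leftover tag text.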
 
Joe Klimis Commented:
Hi Leo

I have never used the Agility Pack, perhaps I should take a look :-), but not all sites I work on allow the download of additional tools, so I usually try to make things work using out-of-the-box features.

This, I think, will do what you want:
$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -Uri $Website   # fetch the web page
$h = $Request.ParsedHtml.getElementsByTagName("table")   # split the page by tag to isolate the required information
$results = ($h | where classname -eq "game-header-table" | select innerhtml)   # create an array of game results

foreach ($result in $results)   # loop through each result, extracting the required information
{
	$a = $result.innerhtml
	$teama = ($a -split "</A>")[0].split(">")[5]
	$scorea = ($a -split "</A>")[1].split("<")[4].split(">")[1]
	$teamb = (($a -split "</A>")[1] -split ">")[17]
	$scoreb = ($a -split "</A>")[2].split(">")[4].split("<").split("/")[0]
	write-output "$teama  $scorea    VS  $teamb  $scoreb "
}
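
The fixed positions in the splits above depend on the exact markup of the page. As a rough alternative sketch, a regular expression keyed on class names can pull out the same fields and print them in the "Team N, Team N" form the question asks for. The HTML below is invented for illustration and is not the real ESPN markup, and the assumption that each game yields two consecutive matches is mine:

```powershell
# Invented markup for illustration only; the real page is structured differently.
$sampleHtml = @'
<table class="game-header-table">
  <tr><td class="team-name"><a href="#">Jets</a></td><td class="team-score">4</td></tr>
  <tr><td class="team-name"><a href="#">Blue Jackets</a></td><td class="team-score">2</td></tr>
</table>
'@

# Capture each team name and the score that follows it, keyed on class names
# rather than on the position of a piece after splitting.
$pattern = 'class="team-name"><a[^>]*>(?<team>[^<]+)</a></td><td class="team-score">(?<score>\d+)<'
$found = [regex]::Matches($sampleHtml, $pattern)

# Assumption: every game contributes two consecutive matches.
$lines = for ($i = 0; $i + 1 -lt $found.Count; $i += 2) {
    $first  = $found[$i]
    $second = $found[$i + 1]
    '{0} {1}, {2} {3}' -f $first.Groups['team'].Value, $first.Groups['score'].Value,
                          $second.Groups['team'].Value, $second.Groups['score'].Value
}
$lines
```

The pattern would need adjusting to whatever the live page actually contains; the point is only that matching on names tends to survive small layout changes better than counting split pieces.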

 
Leo Torres (SQL Developer, Author) Commented:
Wow, indeed it works. Thank you!
 
Leo Torres (SQL Developer, Author) Commented:
Thanks
 
Qlemo (C++ Developer) Commented:
Coming late, but here it is. I had to use dummy variables to ignore some content, as I was not able to filter that stuff appropriately via XPath:
Add-Type -path C:\temp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))

$games = $doc.DocumentNode.SelectNodes('//*[@class="team-name"]|//*[@class="team-score"]') | select -Expand InnerText

while ($games)
{
  $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
  Write-Host $Team1 $Score1', '$Team2 $Score2
}
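
The loop above relies on PowerShell's multiple assignment: the named variables take the leading values and the last variable receives whatever is left, so $games shrinks by eight entries per pass and becomes $null when the list is used up. A minimal sketch of the same idiom on a made-up list (the second game and the 'x' placeholders are invented; the position of the skipped entries mirrors the script above):

```powershell
# Made-up stand-in for the InnerText list; 'x' marks the entries to skip.
$games = 'Jets', '4', 'x', 'Blue Jackets', '2', 'x', 'x', 'x',
         'Team C', '1', 'x', 'Team D', '3', 'x', 'x', 'x'

$lines = while ($games) {
    # The first eight values fill the named variables; the rest go back into $games.
    $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
    "$Team1 $Score1, $Team2 $Score2"
}
$lines
```

Emitting a string instead of calling Write-Host lets the result be captured in a variable or piped further.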
