Leo Torres
asked on
PowerShell HTML parse
My foreach loop is not filtering the InnerHTML array with the string I use as the filter. There seem to be two sets of results for each game because each game has two box scores. I want the results from the first set only.
It looks like this works for the first line, but then the code falls apart.
The result should look like this for every line:
jets 4, Blue Jackets 2
Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS
$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))
$game = $doc.DocumentNode.SelectNodes('.//table["mod-container mod-no-header-footer mod-scorebox final mod-scorebox-final"]') | select -first 4
$scores = @()
$i = 0
ForEach ($innerHTML in $game.InnerHTML | Where-Object { $_.InnerHTML -notlike "*-totalScoreHome*" }) #-or $game.InnerHTML -notlike "*-totalScoreAway*"
{
    $Teams = $innerHTML -split "`"><a href=`""
    $Team1 = $Teams[1].Substring($Teams[1].IndexOf("http://espn.go.com") + 48, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
    $Team2 = $Teams[2].Substring($Teams[2].IndexOf("http://espn.go.com") + 48, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
    $Score1 = $Teams[1].Substring($Teams[1].IndexOf("-awayHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
    $Score2 = $Teams[2].Substring($Teams[2].IndexOf("-homeHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
    $TeamScore = $Team1 + ' ' + $Score1 + ', ' + $Team2 + ' ' + $Score2
    $scores += New-Object PsObject -Property @{ Scores = $TeamScore; }
    $i = $i + 2
}
$scores | select Scores | Format-Table -AutoSize
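A possible explanation for why the filter above has no effect, sketched on made-up sample data rather than the live ESPN page: $game.InnerHTML already yields plain strings, so inside Where-Object the expression $_.InnerHTML is $null (assuming strict mode is off), and $null -notlike "..." is true for every item. Similarly, an XPath predicate that is only a string literal is true for every node, so it does not select by class. The sample strings, the XML fragment, and all variable names below are hypothetical.

```powershell
# Hypothetical strings standing in for the InnerHTML of two scoreboard tables.
$samples = @(
    '<td id="1-awayHeaderScore">4</td>',
    '<td id="1-totalScoreHome">2</td>'
)

# As written in the question: $_ is a string, so $_.InnerHTML is $null and
# the -notlike test passes for every item. Nothing is filtered out.
$unfiltered = @($samples | Where-Object { $_.InnerHTML -notlike '*-totalScoreHome*' })

# Testing the string itself applies the wildcard match to the actual text.
$filtered = @($samples | Where-Object { $_ -notlike '*-totalScoreHome*' })

# Hypothetical XML fragment to show the XPath predicate behaviour.
[xml]$xml = '<r><table class="final"/><table class="other"/></r>'

# A bare string literal as predicate is always true: every table matches.
$allTables = $xml.SelectNodes('//table["final"]').Count

# Comparing the class attribute restricts the match.
$classTables = $xml.SelectNodes('//table[@class="final"]').Count

"Where-Object with `$_.InnerHTML: $($unfiltered.Count) items"
"Where-Object with `$_:           $($filtered.Count) items"
"XPath string predicate:         $allTables tables"
"XPath @class predicate:         $classTables tables"
```

Whether this accounts for everything that goes wrong in the script depends on the actual page markup, which is not reproduced here.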
ASKER
The requirement is just to extract the team names and scores for the day in question.
The output from your code is this:
Jets
4
Blue Jackets
2
/SPAN
This is only one game. I need all the results for that day, and the output should not include "/SPAN".
Just so I know, why would you not use the HTML Agility Pack? Is there a drawback? I used it because I thought it was easier, but whatever works is fine with me. I like seeing different approaches; it serves as a teaching point for me.
ASKER
Wow, indeed it works thank you!
ASKER
thanks
Coming late, but here it is. I had to use dummy variables to ignore some content, as I was not able to filter that stuff out properly via XPath:
Add-Type -path C:\temp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS
$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))
$games = $doc.DocumentNode.SelectNodes('//*[@class="team-name"]|//*[@class="team-score"]') | select -Expand InnerText
while ($games)
{
    # Take the next eight entries; the last variable receives the remainder of the list
    $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
    Write-Host "$Team1 $Score1, $Team2 $Score2"
}
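The loop above relies on PowerShell's multiple assignment: when an array is assigned to a list of variables, each variable takes one element and the last variable takes whatever is left. A minimal offline sketch of that idiom follows, assuming eight entries per game as in the answer. The sample list, the 'x' placeholders, the second game, and the $lines variable are made up for illustration.

```powershell
# Hypothetical flattened list: 8 entries per game, with 'x' standing in
# for the entries that get assigned to $dummy and discarded.
$games = @(
    'Jets',   '4', 'x', 'Blue Jackets', '2', 'x', 'x', 'x',
    'Team C', '1', 'x', 'Team D',       '0', 'x', 'x', 'x'
)

$lines = @()
while ($games)
{
    # The first eight variables take one element each; $games receives the
    # rest, and becomes $null once the list is used up, which ends the loop.
    $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
    $lines += "$Team1 $Score1, $Team2 $Score2"
}

$lines
```

Collecting the strings in $lines instead of calling Write-Host keeps the result available for further piping, for example to Format-Table or Out-File.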
Using PowerShell 3 or above, I would do something like the following instead of using the HTML Agility Pack:
If you detail your requirements, I can help you using this method.
Regards
Joe