PowerShell HTML parse

Last Modified: 2014-12-04
My foreach loop is not filtering the InnerHTML array by the string I use as the filter. There seem to be two sets of results for each game because the page has two box scores per game. I want the results from the first set only.

The code appears to work for the first line, but then it falls apart.

The result should look like this for every line:
jets 4, Blue Jackets 2


Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

	$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
	$wc = New-Object System.Net.WebClient;
	$doc = New-Object HtmlAgilityPack.HtmlDocument
	$doc.LoadHtml($wc.DownloadString($Website))
	
	$game = $doc.DocumentNode.SelectNodes('.//table["mod-container mod-no-header-footer mod-scorebox final mod-scorebox-final"]') | select -first 4
	$scores = @()
	$i = 0

	ForEach ($innerHTML in $game.InnerHTML | Where-Object { $_.InnerHTML -notlike "*-totalScoreHome*" }) #-or $game.InnerHTML -notlike "*-totalScoreAway*"
	{
		
		$Teams = $innerHTML -split "`"><a href=`""
		
		$Team1 = $Teams[1].Substring($Teams[1].IndexOf("http://espn.go.com") + 48, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
		$Team2 = $Teams[2].Substring($Teams[2].IndexOf("http://espn.go.com") + 48, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("http://espn.go.com") - 53).Replace("/", "").Replace("`"", "")
	
		$Score1 = $Teams[1].Substring($Teams[1].IndexOf("-awayHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
		$Score2 = $Teams[2].Substring($Teams[2].IndexOf("-homeHeaderScore`">") + 18, 2).Replace("<", "").Replace("/", "-1")
	
		$TeamScore = $Team1 + ' ' + $Score1 + ', ' + $Team2 + ' ' + $Score2
		
		$scores += New-Object PsObject -Property @{ Scores = $TeamScore; }
		$i = $i + 2
	}
	$scores | select Scores | Format-Table -AutoSize
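
A note on why the loop above does not filter anything, as a minimal sketch on made-up data rather than the live ESPN page. Two things seem to be going on. First, `$game.InnerHTML | Where-Object { $_.InnerHTML ... }` pipes plain strings, so `$_.InnerHTML` is `$null` and `$null -notlike ...` is always true. Second, an XPath predicate that is just a quoted string, such as `table["..."]`, is a non-empty string and therefore always true, so it matches every table; the class has to be compared with `@class`. The sample strings and the XML below are hypothetical, and the XPath part uses the built-in `[xml]` type on the assumption that HtmlAgilityPack follows the same XPath 1.0 rules.

```powershell
# Hypothetical strings standing in for the InnerHTML of four scorebox tables.
$innerHtml = @(
    '<td id="1-awayHeaderScore">4</td>',
    '<td id="1-totalScoreHome">2</td>',
    '<td id="2-awayHeaderScore">3</td>',
    '<td id="2-totalScoreHome">1</td>'
)

# As in the question: $_ is already a string, so $_.InnerHTML is $null
# and the -notlike test is always true. Nothing is filtered out.
$unfiltered = @($innerHtml | Where-Object { $_.InnerHTML -notlike '*-totalScoreHome*' })

# Testing the string itself filters as intended.
$filtered = @($innerHtml | Where-Object { $_ -notlike '*-totalScoreHome*' })

# Hypothetical document with one scorebox table and one unrelated table.
[xml]$sample = '<root><table class="final" /><table class="other" /></root>'

# A bare string predicate is always true, so every table is selected.
$allTables = $sample.SelectNodes('//table["final"]')

# Comparing the class attribute selects only the matching table.
$finalTables = $sample.SelectNodes('//table[@class="final"]')

"Unfiltered: $($unfiltered.Count), filtered: $($filtered.Count)"
"Bare predicate: $($allTables.Count), @class predicate: $($finalTables.Count)"
```

With an HtmlAgilityPack node collection, the equivalent would be to pipe the nodes themselves (`$game | Where-Object { $_.InnerHtml -notlike ... }`) rather than their `InnerHTML` strings.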


Hi

Using PowerShell 3 or above, I would do something like the following instead of using the HTML Agility Pack:

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -URI $webSite
$h = $request.ParsedHtml.getElementsByTagName("div")
$h | where classname -eq 'team-name' | select InnerText
$a = $h | where classname -eq 'span-2' | select innerhtml
$teama = ($a.innerHTML -split "</A>")[0].split(">")[11]
$scorea =  ($a.innerHTML -split "</A>")[1].split("<")[4].split(">")[1]
$teamb = (($a.innerHTML -split "</A>")[1] -split ">")[17]
$scoreb = ($a.innerHTML -split "</A>")[2].split(">")[4].split("<")

write-output $teama , $scorea , $teamb , $scoreb
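
One detail of the snippet above is worth a small sketch: a trailing `.split("<")` returns every piece of the string, not just the first, so whatever follows the score (for example a closing tag fragment) is emitted as an extra value. The input string below is a hypothetical stand-in for the fragment that holds the second score; indexing with `[0]` keeps only the score.

```powershell
# Hypothetical fragment left over after the earlier splits on ">".
$fragment = '2</SPAN'

# Split returns an array of all pieces: the score and the tag remainder.
$pieces = $fragment.Split('<')

# Taking the first element keeps only the score.
$scoreOnly = $fragment.Split('<')[0]

"Pieces: $($pieces -join ' | ')"
"Score only: $scoreOnly"
```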

If you detail your requirements, I can help you further with this method.

Regards
Joe

Leo Torres (SQL Developer, Author) commented:
The requirement is just to extract the team names and scores for the day in question.

The output of your code is this:
Jets
4
Blue Jackets
2
/SPAN

This is only one game. I need all the results for that day, and "/SPAN" should not be returned.

Just so I know, why would you not use the Agility Pack? Is there a drawback? I used it because I thought it was easier, but whatever works is fine with me. I like seeing different approaches; it serves as a teaching point for me.

Leo Torres (SQL Developer, Author) commented:
Wow, indeed it works. Thank you!

Leo Torres (SQL Developer, Author) commented:
thanks

Qlemo ("Batchelor", Developer and EE Topic Advisor) commented:
Coming late, but here it is. I had to use dummy variables to ignore some content, as I was not able to filter it out appropriately via XPath:
Add-Type -path C:\temp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))

$games = $doc.DocumentNode.SelectNodes('//*[@class="team-name"]|//*[@class="team-score"]') | select -Expand InnerText

while ($games)
{
  $Team1, $Score1, $dummy, $Team2, $Score2, $dummy, $dummy, $dummy, $games = $games
  Write-Host $Team1 $Score1', '$Team2 $Score2
}
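
The loop above relies on PowerShell's multiple assignment: when the right-hand side has more values than there are variables, the last variable receives the remainder, so each pass peels eight values off the front of the list until nothing is left. Here is a minimal sketch of that idiom on a flat list; the second game and the `x` placeholder entries are made up for illustration, and the eight-values-per-game layout is taken on trust from the code above.

```powershell
# Flat list as SelectNodes might return it: eight values per game.
# 'x' marks entries to be discarded (hypothetical placeholders).
$items = 'Jets', '4', 'x', 'Blue Jackets', '2', 'x', 'x', 'x',
         'Team C', '3', 'x', 'Team D', '1', 'x', 'x', 'x'

$lines = @()
while ($items)
{
    # The last variable ($items) takes whatever is left over;
    # it becomes $null once the list is exhausted, ending the loop.
    $team1, $score1, $skip, $team2, $score2, $skip, $skip, $skip, $items = $items
    $lines += "$team1 $score1, $team2 $score2"
}

$lines
```

Building the strings first and emitting them afterwards, instead of calling `Write-Host` inside the loop, keeps the results available for further piping or formatting.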
