Link to home
Start Free TrialLog in
Avatar of Leo Torres
Leo TorresFlag for United States of America

asked on

powershell parse

Finally I was able to complete a Powershell parse. But my code is very elementary. Code below Works, my goal is to learn more Powershell. Can anyone write this differently where the index array starting point may changes a little bit and the code still wont break. My concern is that this may be to specific for this page and my be prone to break easily if string changes by 1 space. (I do understand that if there are major code changes to website this wont work anyway.)

Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll

	$Website = ""
	$wc = New-Object System.Net.WebClient;
	$doc = New-Object HtmlAgilityPack.HtmlDocument

	$game = $doc.DocumentNode.SelectNodes('.//table["league-title"]')

	ForEach ($innerHTML in $game.InnerHTML){
	$Teams = $innerHTML -split "`"><a href=`""
	$Team1 = $Teams[1].Substring($Teams[1].IndexOf("") + 38, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("") - 42).Replace("/", "").Replace("`"", "")
	$Team2 = $Teams[2].Substring($Teams[2].IndexOf("") + 38, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("") - 42).Replace("/", "").Replace("`"", "")
	$Team1 = $Team1.toCharArray()
	$Team1 = -join $Team1
	$Team2 = $Team2.toCharArray()
	$Team2 = -join $Team2
	$Team1 = $Team1.Substring(0,$Team1.IndexOf("-")).ToUpper()
	$Team2 = $Team2.Substring(0, $Team2.IndexOf("-")).ToUpper()
	$Team1 = $Team1.toCharArray()
	$Team1 = -join $Team1
	$Team2 = $Team2.toCharArray()
	$Team2 = -join $Team2

	$Score1 = $Teams[1].Substring($Teams[1].IndexOf("-alsT`">") + 7, 2).Replace("<","").Replace("/", "-1")
	$Score2 = $Teams[2].Substring($Teams[2].IndexOf("-hlsT`">") + 7, 2).Replace("<", "").Replace("/", "-1")
	Write-Host $Team1 $Score1', '$Team2 $Score2

Open in new window

Avatar of David Johnson, CD
David Johnson, CD
Flag of Canada image

which version of the htmlagilitypack are you using..
New-Object : Cannot find type [HtmlAgilityPack.HtmlDocument]: verify that the
assembly containing this type is loaded.
At line:5 char:9
+     $doc = New-Object HtmlAgilityPack.HtmlDocument
+            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidType: (:) [New-Object], PSArgumentExcepti
    + FullyQualifiedErrorId : TypeNotFound,Microsoft.PowerShell.Commands.NewOb
Avatar of Leo Torres


Neglected? Why Moderator?
Hi leo

I use  split to divide up the text  as the basic html does not usually change that often

Here is an example

$Website = ""
$Request = Invoke-WebRequest -URI $webSite
$h = $request.ParsedHtml.getElementsByTagName("div")
$h | where classname -eq 'team-name' | select InnerText
$a = $h | where classname -eq 'span-2' | select innerhtml
$teama = ($a.innerHTML -split "</A>")[0].split(">")[11]
$scorea =  ($a.innerHTML -split "</A>")[1].split("<")[4].split(">")[1]
$teamb = (($a.innerHTML -split "</A>")[1] -split ">")[17]
$scoreb = ($a.innerHTML -split "</A>")[2].split(">")[4].split("<")

write-output $teama , $scorea , $teamb , $scoreb

Open in new window

Let me know if you require a specific example or further help.

Oh, that rings a bell. August 2013, College Football results, then NFL results ... A pity each league has a different page layout on ESPN, and it is never as straightforward as it should.
As it seems, one of the reliable ways is to get each game's HTML ID, extract the unique ID number (e.g. 340415104), and use that in an XPath like //*[@id="340415104-aTeamName"]/a (for "away" team name).
Qlemo: as a programmer I realized this page matches teams to an ID at the beginning then references those IDs to represent team. So your suggestion crossed my mind but I had no idea how to do it. Remember I need to get results for day passed.

Thanks again for your help on this.
Hi Leo

is this similar to the other ticket you raised ?    would you like some further help ?

Avatar of Qlemo
Flag of Germany image

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Sorry Qlemo.. been in vacation mode for past few weeks.