Solved

powershell parse

Posted on 2014-11-28
11
129 Views
Last Modified: 2015-01-03
Hello,
Finally I was able to complete a Powershell parse. But my code is very elementary. Code below Works, my goal is to learn more Powershell. Can anyone write this differently where the index array starting point may changes a little bit and the code still wont break. My concern is that this may be to specific for this page and my be prone to break easily if string changes by 1 space. (I do understand that if there are major code changes to website this wont work anyway.)


Add-Type -path C:\PStemp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

	$Website = "http://scores.espn.go.com/mlb/scoreboard?date=20140415"
	$wc = New-Object System.Net.WebClient;
	$doc = New-Object HtmlAgilityPack.HtmlDocument
	$doc.LoadHtml($wc.DownloadString($Website))

	$game = $doc.DocumentNode.SelectNodes('.//table["league-title"]')

	ForEach ($innerHTML in $game.InnerHTML){
	
	$Teams = $innerHTML -split "`"><a href=`""
	
	$Team1 = $Teams[1].Substring($Teams[1].IndexOf("http://espn.go.com") + 38, $Teams[1].IndexOf("</a>") - $Teams[1].IndexOf("http://espn.go.com") - 42).Replace("/", "").Replace("`"", "")
	$Team2 = $Teams[2].Substring($Teams[2].IndexOf("http://espn.go.com") + 38, $Teams[2].IndexOf("</a>") - $Teams[2].IndexOf("http://espn.go.com") - 42).Replace("/", "").Replace("`"", "")
	
	$Team1 = $Team1.toCharArray()
	[Array]::Reverse($Team1)
	$Team1 = -join $Team1
	
	$Team2 = $Team2.toCharArray()
	[Array]::Reverse($Team2)
	$Team2 = -join $Team2
	
	
	$Team1 = $Team1.Substring(0,$Team1.IndexOf("-")).ToUpper()
	$Team2 = $Team2.Substring(0, $Team2.IndexOf("-")).ToUpper()
	
	$Team1 = $Team1.toCharArray()
	[Array]::Reverse($Team1)
	$Team1 = -join $Team1
	
	$Team2 = $Team2.toCharArray()
	[Array]::Reverse($Team2)
	$Team2 = -join $Team2

			
	$Score1 = $Teams[1].Substring($Teams[1].IndexOf("-alsT`">") + 7, 2).Replace("<","").Replace("/", "-1")
	$Score2 = $Teams[2].Substring($Teams[2].IndexOf("-hlsT`">") + 7, 2).Replace("<", "").Replace("/", "-1")
	
	Write-Host $Team1 $Score1', '$Team2 $Score2
	
}

Open in new window

0
Comment
Question by:Leo Torres
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
11 Comments
 
LVL 80

Expert Comment

by:David Johnson, CD, MVP
ID: 40471234
which version of the htmlagilitypack are you using..
New-Object : Cannot find type [HtmlAgilityPack.HtmlDocument]: verify that the
assembly containing this type is loaded.
At line:5 char:9
+     $doc = New-Object HtmlAgilityPack.HtmlDocument
+            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidType: (:) [New-Object], PSArgumentExcepti
   on
    + FullyQualifiedErrorId : TypeNotFound,Microsoft.PowerShell.Commands.NewOb
   jectCommand
0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40471377
0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40471777
Neglected? Why Moderator?
0
MIM Survival Guide for Service Desk Managers

Major incidents can send mastered service desk processes into disorder. Systems and tools produce the data needed to resolve these incidents, but your challenge is getting that information to the right people fast. Check out the Survival Guide and begin bringing order to chaos.

 
LVL 10

Expert Comment

by:Joe Klimis
ID: 40472881
Hi leo


I use  split to divide up the text  as the basic html does not usually change that often

Here is an example

$Website = "http://scores.espn.go.com/nhl/scoreboard?date=20141125"
$Request = Invoke-WebRequest -URI $webSite
$h = $request.ParsedHtml.getElementsByTagName("div")
$h | where classname -eq 'team-name' | select InnerText
$a = $h | where classname -eq 'span-2' | select innerhtml
$teama = ($a.innerHTML -split "</A>")[0].split(">")[11]
$scorea =  ($a.innerHTML -split "</A>")[1].split("<")[4].split(">")[1]
$teamb = (($a.innerHTML -split "</A>")[1] -split ">")[17]
$scoreb = ($a.innerHTML -split "</A>")[2].split(">")[4].split("<")

write-output $teama , $scorea , $teamb , $scoreb

Open in new window


Let me know if you require a specific example or further help.

Joe
0
 
LVL 69

Expert Comment

by:Qlemo
ID: 40472913
Oh, that rings a bell. August 2013, College Football results, then NFL results ... A pity each league has a different page layout on ESPN, and it is never as straightforward as it should.
0
 
LVL 69

Expert Comment

by:Qlemo
ID: 40472954
As it seems, one of the reliable ways is to get each game's HTML ID, extract the unique ID number (e.g. 340415104), and use that in an XPath like //*[@id="340415104-aTeamName"]/a (for "away" team name).
0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40472991
Qlemo: as a programmer I realized this page matches teams to an ID at the beginning then references those IDs to represent team. So your suggestion crossed my mind but I had no idea how to do it. Remember I need to get results for day passed.

Thanks again for your help on this.
0
 
LVL 10

Expert Comment

by:Joe Klimis
ID: 40475648
Hi Leo

is this similar to the other ticket you raised ?    would you like some further help ?

Regards
Joe
0
 
LVL 69

Accepted Solution

by:
Qlemo earned 500 total points
ID: 40481096
Found a non-programmer solution. Not as elegant as I would like it to be, but it is more "generic" and allows easy adaption to other scenarios
Add-Type -path C:\temp\HtmlAgilityPack\Net40\htmlagilitypack.dll
CLS

$Website = "http://scores.espn.go.com/mlb/scoreboard?date=20140415"
$wc = New-Object System.Net.WebClient;
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($wc.DownloadString($Website))

$games = $doc.DocumentNode.SelectNodes('//*[@class="team-name"]|//*[@class="score"]/*[@class="finalScore"]') | select -Expand InnerText

while ($games)
{
  $Team1, $Score1, $Team2, $Score2, $games = $games
  Write-Host $Team1 $Score1', '$Team2 $Score2
}

Open in new window

That strange assignment in line 13 assigns the first 4 array elements to vars, and keeps the remainder. The array gets shorter by 4 elements this way each time. This procedure is necessary since the XPath expression returns 4 successive values per game. We could have been more specific, e.g. have an array with only "Team Away" names, another with "Team Away" scores aso. but that doesn't make things easier or more clear.
0
 
LVL 8

Author Closing Comment

by:Leo Torres
ID: 40528867
Sorry Qlemo.. been in vacation mode for past few weeks.
0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The article shows the basic steps of integrating an HTML theme template into an ASP.NET MVC project
A recent project that involved parsing Tableau Desktop and Server log files to extract reusable user queries for use in other systems. I chose to use PowerShell to gather the data, and SharePoint to present it...
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the adminiā€¦

756 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question