Parse data from website

edrz01
I am a PowerShell newbie and have been struggling to get a date value from a website.

I need to get the date of a certain file listed on a webpage. The webpage is a directory index, so it looks like this:

8/23/2014 12:00 AM     422508   file120.abc
8/23/2014 12:00 AM     440964   file121.abc
8/23/2014 12:00 AM     332636   file122.abc
...
11/26/2017 2:30 PM     3823     thefile.ini
...
11/25/2017 1:01 AM     88044    file309.abc

I need to find the line that contains 'thefile.ini' and get the date (11/26/2017) from it.

When I view the page source, I see:

<br>11/26/2017  2:30 PM         3823 <A HREF="/folder/thefile.ini">thefile.ini</A>
Jose Gabriel Ortega Castro, Top Rated Freelancer on MS Technologies
Awarded 2018
Distinguished Expert 2018

Commented:
Can we have an example of the webpage (I mean something like the page saved as an HTML file)? The question is too vague to be answered as it stands.

Author

Commented:
Thanks Jose. Untitled.png
So I am trying to get the date for this particular file...
Jose Gabriel Ortega Castro

Commented:
No problem :) Is that an FTP view, edrz01?

Author

Commented:
No, it is a web listing. The URL is something like this (I can't paste the real link here):

http://mcafee.xxx.xxxx.xxx/css_content/current/VSCANDAT1000/DAT/0000/
Jose Gabriel Ortega Castro

Commented:
Well, let's start with something!

Source: https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-5.1

$R = Invoke-WebRequest -URI http://putTheRealUrlHere.com
then do
$R.AllElements | gm
and post it here.

or
$R.AllElements | where {$_.innerhtml -like "11/26/2017"}

It's hard to give the exact query because I don't have the link, sorry.

Author

Commented:
First one:

PS C:\WINDOWS\system32> $R = Invoke-WebRequest -URI "http://mcafee.xxx.xxx.xxx/css_content/current/VSCANDAT1000/DAT/0000/"
 then do
 $R.AllElements | gm
then : The term 'then' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path
was included, verify that the path is correct and try again.
At line:2 char:2
+  then do
+  ~~~~
    + CategoryInfo          : ObjectNotFound: (then:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 


   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition                                                                                                                          
----        ----------   ----------                                                                                                                          
Equals      Method       bool Equals(System.Object obj)                                                                                                      
GetHashCode Method       int GetHashCode()                                                                                                                  
GetType     Method       type GetType()                                                                                                                      
ToString    Method       string ToString()                                                                                                                  
innerHTML   NoteProperty string innerHTML=<HEAD><TITLE>mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/</TITLE></HEAD>...                  
innerText   NoteProperty string innerText=mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/mcafee.xxx.xxxx.xxx - /css_content/current/VSCA...
outerHTML   NoteProperty string outerHTML=<HTML><HEAD><TITLE>mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/</TITLE></HEAD>...            
outerText   NoteProperty string outerText=mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/mcafee.xxx.xxxx.xxx - /css_content/current/VSCA...
tagName     NoteProperty string tagName=HTML
Jose Gabriel Ortega Castro

Commented:
That works on PowerShell version 5.1.
To check your version, open a PowerShell console and run: $PSVersionTable

Author

Commented:
Name                           Value                                                                                                                        
----                           -----                                                                                                                        
PSVersion                      5.1.14393.1770                                                                                                                
PSEdition                      Desktop                                                                                                                      
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}                                                                                                      
BuildVersion                   10.0.14393.1770                                                                                                              
CLRVersion                     4.0.30319.42000                                                                                                              
WSManStackVersion              3.0                                                                                                                          
PSRemotingProtocolVersion      2.3                                                                                                                          
SerializationVersion           1.1.0.1

Author

Commented:
If I delete the 'then do' it displays the information (if this is what you are expecting)

PS C:\WINDOWS\system32> $R = Invoke-WebRequest -URI "http://mcafee.xxx.xxxx.xxx/css_content/current/VSCANDAT1000/DAT/0000/"
 $R.AllElements | gm


   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition                                                                                                                          
----        ----------   ----------                                                                                                                          
Equals      Method       bool Equals(System.Object obj)                                                                                                      
GetHashCode Method       int GetHashCode()                                                                                                                  
GetType     Method       type GetType()                                                                                                                      
ToString    Method       string ToString()                                                                                                                  
innerHTML   NoteProperty string innerHTML=<HEAD><TITLE>mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/</TITLE></HEAD>...                  
innerText   NoteProperty string innerText=mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/mcafee.xxx.xxxx.xxx - /css_content/current/VSCA...
outerHTML   NoteProperty string outerHTML=<HTML><HEAD><TITLE>mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/</TITLE></HEAD>...            
outerText   NoteProperty string outerText=mcafee.xxx.xxxx.xxx - /css_content/current/VSCANDAT1000/DAT/0000/mcafee.xxx.xxxx.xxx - /css_content/current/VSCA...
tagName     NoteProperty string tagName=HTML

Author

Commented:
What's frustrating about this is that when I view the source, the date field I need comes before the actual file name, which makes it harder.
Untitled1.png
Jose Gabriel Ortega Castro

Commented:
Well, it's a web view; there's nothing you can do about it.

OK, run this:

$R.AllElements | where {$_.innerhtml -like "11/26/2017"}

Author

Commented:
Looks like nothing returned:

PS C:\WINDOWS\system32>
$R = Invoke-WebRequest -URI "http://mcafee.xxx.xxxx.xxx/css_content/current/VSCANDAT1000/DAT/0000/"
$R.AllElements | where {$_.innerhtml -like "11/26/2017"}

PS C:\WINDOWS\system32>
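
The empty result is consistent with how -like works: without wildcard characters it compares the pattern against the whole string, so a pattern that is only the date matches just an element whose innerHTML is exactly that date. A minimal sketch, assuming a hypothetical $innerHtml value copied from the source line quoted in the question (it is not taken from the real page):

```powershell
# Hypothetical stand-in for one element's innerHTML, based on the line quoted in the question.
$innerHtml = '<br>11/26/2017  2:30 PM         3823 <A HREF="/folder/thefile.ini">thefile.ini</A>'

# Without wildcards, -like has to match the entire string, so this is False.
$exact = $innerHtml -like '11/26/2017'

# With a leading and trailing *, -like behaves as a substring test, so this is True.
$contains = $innerHtml -like '*11/26/2017*'

"Exact: $exact; Contains: $contains"
```

On the real page, the wildcard form would probably return the enclosing elements (such as the HTML element shown in the gm output above) rather than the single line, so it narrows the search but may not isolate the date by itself.
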
Jose Gabriel Ortega Castro

Commented:
Haha, just send me the actual link in a PM.

Author

Commented:
Sent you a PM - can't send link, sorry
Jose Gabriel Ortega Castro

Commented:
Well, the question can't be answered because of the lack of details. You can keep exploring the innerHTML and look for the properties you need, or, if you can get the information as CSV or XML, that would be more helpful for your purposes.

jose

Author

Commented:
Jose, I appreciate you trying to find a solution; however, I am not sure what you mean by 'lack of details'. I provided everything I could.

The URL I am accessing looks like this (I can't provide the actual link since it is restricted):

Untitled1.png
On that listing is a file called 'avvdat.ini'.

I am trying to get the date to the left of that filename.

Untitled2.png
When I hit F12 on the webpage it shows the data as below. I put blocks around the date and the file.

Untitled3.png
Jose Gabriel Ortega Castro

Commented:
I don't think anyone can give you an answer with the information you have provided. What I suggest is to save the HTML file to your desktop, remove the information that is not relevant to the question (like the enterprise name, the McAfee account number, or whatever), and leave the HTML structure intact so we can finish the answer. What I mean by incomplete, or that it can't be answered, is that we're not wizards: we need the full HTML structure so we can provide an answer or write the query according to your SPECIFIC NEEDS. Each web page is different.
Most Valuable Expert 2018
Distinguished Expert 2018
Commented:
Adjust the URL in the last line, save it as Whatever.ps1, and run it.
If it gives you the date, remove the "-Debug" switch from the last line.
If it does not give you the date, open the file "McAfeeContent.html" that should be in the current folder, replace any sensitive information with placeholders, and upload it here (no screenshot - either as file attachment or inside a [code][/code] block!).

The function will parse the complete content into custom PS objects which you can either filter yourself, or you can directly pass it a filter like in the code below, so you can get information about any file listed.
The Date property returned will be a full DateTime object, not just a string.
Function Get-McAfeeItem {
Param(
	[String]$Url,
	[String]$Filter,
	[Switch]$Debug
)
	# Filter on the Name property if -Filter was given; otherwise let every item through.
	$NameFilter = If ($Filter) {{$_.Name -like $Filter}} Else {{$true}}
	# Culture and format used to parse the listing's date column, e.g. "11/26/2017 2:30 PM".
	$DTProvider = New-Object -TypeName System.Globalization.CultureInfo -ArgumentList 'en-US'
	$DTFormat = 'M/d/yyyy h:mm tt'
	$Content = Invoke-WebRequest -Uri $Url | Select-Object -ExpandProperty Content
	# With -Debug, save the raw page to the current folder for troubleshooting.
	If ($Debug) {$Content | Set-Content -Path "$((Get-Location -PSProvider FileSystem).Path)\McAfeeContent.html"}
	# Flatten the page to one line, split it into entries at each <br>,
	# and extract date, size, path, and name from every entry that looks like a file link.
	$Content.Replace("`r`n", ' ') -replace '\s+', ' ' -split '<br>' |
		Where-Object {$_ -match '\s*(?<Date>.*?)\s*(?<Size>\d+?)\s*<A\s+HREF\s*=\s*"(?<Path>.*?)"\s*>\s*(?<Name>.*?)</A>'} |
		Select-Object -Property `
			@{n='Name'; e={$Matches['Name']}},
			@{n='Size'; e={[int64]$Matches['Size']}},
			@{n='Date'; e={[DateTime]::ParseExact($Matches['Date'], $DTFormat, $DTProvider)}},
			@{n='Path'; e={$Matches['Path']}} |
		Where-Object $NameFilter
}

Get-McAfeeItem -Debug -Url "mcafee.acme.com/css_content/current/VSCANDAT1000/DAT/0000/" -Filter avvdat.ini | Select-Object -ExpandProperty Date
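
To see what the function's regular expression captures without contacting the server, the same pattern can be run against the single source line quoted in the question. This is only a sketch: $line, $pattern, $culture, and $stamp are illustrative names, and the sample line stands in for one <br>-separated entry of the real page, which may differ.

```powershell
# One entry from the page source quoted in the question (the text between two <br> tags).
$line = '11/26/2017  2:30 PM         3823 <A HREF="/folder/thefile.ini">thefile.ini</A>'

# Collapse runs of whitespace to a single space, as the function does before matching.
$line = $line -replace '\s+', ' '

# Pattern copied from Get-McAfeeItem; a successful -match fills the automatic $Matches table.
$pattern = '\s*(?<Date>.*?)\s*(?<Size>\d+?)\s*<A\s+HREF\s*=\s*"(?<Path>.*?)"\s*>\s*(?<Name>.*?)</A>'
$culture = New-Object -TypeName System.Globalization.CultureInfo -ArgumentList 'en-US'

if ($line -match $pattern) {
	# Turn the captured text into a real DateTime, then print it in a culture-neutral layout.
	$date = [DateTime]::ParseExact($Matches['Date'], 'M/d/yyyy h:mm tt', $culture)
	$stamp = $date.ToString('yyyy-MM-dd HH:mm', [System.Globalization.CultureInfo]::InvariantCulture)
	"$($Matches['Name']) was last modified $stamp"
}
```

For the sample line this should print "thefile.ini was last modified 2017-11-26 14:30". The lazy Date group stops at the last point where a run of digits followed by the link can still match, which is why the file size does not end up inside the date.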


Author

Commented:
OBDA - worked like a champ! Thank you!!!!!!

Author

Commented:
Worked like a champ! Thanks!
