Richard Detsch
asked on
Using Excel Macro to extract specific text from website based on li id
I'm trying to write Excel macro that extracts specific text from a website based on the html structure.
Here is part of the website's html
<div id="footer" role="contentinfo">
<ul id="footer-info">
<li id="footer-info-lastmod"> This page was last modified on 28 July 2014, at 19:44.</li>
I only need the text associated with li id "footer-info-lastmod".
Here is part of the website's html
<div id="footer" role="contentinfo">
<ul id="footer-info">
<li id="footer-info-lastmod"> This page was last modified on 28 July 2014, at 19:44.</li>
I only need the text associated with li id "footer-info-lastmod".
ASKER
Lijuan,
Thanks for product reference, but I was really looking for a solution in Excel visual basic code.
Thanks for product reference, but I was really looking for a solution in Excel visual basic code.
No problem.
So I assume you have a button in your Excel document that you will push to trigger the Excel VBA that will perform an http request to a remote site?
The returned html would then be loaded into a variable, parsed, and the specific piece of text extracted... Is that what you are looking to do?
So I assume you have a button in your Excel document that you will push to trigger the Excel VBA that will perform an http request to a remote site?
The returned html would then be loaded into a variable, parsed, and the specific piece of text extracted... Is that what you are looking to do?
ASKER
It is.
Okay, you will need to make a VBA HTTP request from within Excel first:
https://www.google.com/?gws_rd=ssl#q=vba+http+get+request+excel
Sub http()
Dim MyRequest As Object
Set MyRequest = CreateObject("WinHttp.WinH ttpRequest .5.1")
MyRequest.Open "GET", _
"http://www.google.com"
' Send Request.
MyRequest.Send
'And we get this response
MsgBox MyRequest.ResponseText
End Sub
https://www.google.com/?gws_rd=ssl#q=vba+http+get+request+excel
Sub http()
Dim MyRequest As Object
Set MyRequest = CreateObject("WinHttp.WinH
MyRequest.Open "GET", _
"http://www.google.com"
' Send Request.
MyRequest.Send
'And we get this response
MsgBox MyRequest.ResponseText
End Sub
Your request should be "get" because you are not posting anything to a remote site. You will have to manipulate the returned html to extract what you.
I find that writing the data to disk first works better than trying to parse and extract in-memory... You will have to experiment.
I find that writing the data to disk first works better than trying to parse and extract in-memory... You will have to experiment.
ASKER
That's working to get entire text from open websites.
The ones I scraping are behind my firewall and need a CAC login.
I was playing around before and was able to get those to open with excel, but had to type in my CAC number.
This code does allow me to enter my CAC number and get to the site:
'from https://www.youtube.com/watch?v=7sZRcaaAVbg
Dim IE As New InternetExplorer
IE.Visible = True
IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
The ones I scraping are behind my firewall and need a CAC login.
I was playing around before and was able to get those to open with excel, but had to type in my CAC number.
This code does allow me to enter my CAC number and get to the site:
'from https://www.youtube.com/watch?v=7sZRcaaAVbg
Dim IE As New InternetExplorer
IE.Visible = True
IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
For security reasons you might have to password protect the VB module and use a service account...
The following has not been tested, but this can be the basis for your login form input:
Dim IE As Object
Sub submitLoginInfo()
Application.ScreenUpdating = False
Set IE = CreateObject("InternetExpl orer.Appli cation")
IE.Visible = True
IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"
Application.StatusBar = "Submitting"
' Wait while IE loading...
While IE.Busy
DoEvents
Wend
' ************************** ********** ********** ********** ********** ****
' You can adjust the delay to suite your needs...
delay 5
IE.Document.getElementById ("YourUser Name_FormI dHere").Va lue = "username"
delay 5
IE.Document.getElementById ("YourPass word_FormI dHere").Va lue = "password"
delay 5
IE.Document.getElementById ("YourSubm itButton_F ormIdHere" ).Click
'************************* ********** ********** ********** ********** *****
Application.StatusBar = "Form Submitted"
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
End Sub
' ************************** ********** ********** ********** ********** ****
' Delay Subroutine
' ************************** ********** ********** ********** ********** ****
Private Sub delay(seconds As Long)
Dim endTime As Date
endTime = DateAdd("s", seconds, Now())
Do While Now() < endTime
DoEvents
Loop
End Sub
The following has not been tested, but this can be the basis for your login form input:
Dim IE As Object
Sub submitLoginInfo()
Application.ScreenUpdating
Set IE = CreateObject("InternetExpl
IE.Visible = True
IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"
Application.StatusBar = "Submitting"
' Wait while IE loading...
While IE.Busy
DoEvents
Wend
' **************************
' You can adjust the delay to suite your needs...
delay 5
IE.Document.getElementById
delay 5
IE.Document.getElementById
delay 5
IE.Document.getElementById
'*************************
Application.StatusBar = "Form Submitted"
IE.Quit
Set IE = Nothing
Application.ScreenUpdating
End Sub
' **************************
' Delay Subroutine
' **************************
Private Sub delay(seconds As Long)
Dim endTime As Date
endTime = DateAdd("s", seconds, Now())
Do While Now() < endTime
DoEvents
Loop
End Sub
ASKER
your sub submitLoginInfo works for opening page. But how do I get the source text into excel?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I tested the sample and it works for getting text from a web site and loading into Excel... I just created a button on the worksheet and assigned a VB macro using the code from the example.
ASKER
Great how fast the reply was, and how the progammer went back and forth on the issue until it was resolved.
Thank You.
http://www.iewatch.com/downloadwr.aspx
I use this product daily with a SaaS provider that doesn't have an API for me to export my data. The data I get daily is Excel. I simulate logging into the website to get what I need...
The tool handles sites that use AJAX and sessions variables... You can screen scrape elements and load them into Excel if necessary.