Link to home
Start Free TrialLog in
Avatar of Richard Detsch
Richard Detsch

asked on

Using Excel Macro to extract specific text from website based on li id

I'm trying to write Excel macro that extracts specific text from a website based on the html structure.

Here is part of the website's html

                       <div id="footer" role="contentinfo">
            <ul id="footer-info">
                             <li id="footer-info-lastmod"> This page was last modified on 28 July 2014, at 19:44.</li>

I only need the text associated with li id "footer-info-lastmod".
Avatar of LajuanTaylor
LajuanTaylor

Take a look at the fully functional 30 day trial of WebReplay. It's an IE Plugin and it's pretty straight forward to use.
http://www.iewatch.com/downloadwr.aspx

I use this product daily with a SaaS provider that doesn't have an API for me to export my data. The data I get daily is Excel. I simulate logging into the website to get what I need...

The tool handles sites that use AJAX and sessions variables... You can screen scrape elements and load them into Excel if necessary.
Avatar of Richard Detsch

ASKER

Lijuan,
Thanks for product reference, but I was really looking for a solution in Excel visual basic code.
No problem.

So I assume you have a button in your Excel document that you will push to trigger the Excel VBA that will perform an http request to a remote site?

The returned html would then be loaded into a variable, parsed, and the specific piece of text extracted... Is that what you are looking to do?
It is.
Okay, you will need to make a VBA HTTP request from within Excel first:
https://www.google.com/?gws_rd=ssl#q=vba+http+get+request+excel

Sub http()
Dim MyRequest As Object
    Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    MyRequest.Open "GET", _
    "http://www.google.com"
   
 ' Send Request.
    MyRequest.Send

    'And we get this response
    MsgBox MyRequest.ResponseText
End Sub
Your request should be "get" because you are not posting anything to a remote site. You will have to manipulate the returned html to extract what you.

I find that writing the data to disk first works better than trying to parse and extract in-memory... You will have to experiment.
That's working to get entire text from open websites.
The ones I scraping are behind my firewall and need a CAC login.

I was playing around before and was able to get those to open with excel, but had to type in my CAC number.

This code does allow me to enter my CAC number and get to the site:
           'from https://www.youtube.com/watch?v=7sZRcaaAVbg

           Dim IE As New InternetExplorer
           IE.Visible = True

           IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"

             Do
                DoEvents
                 Loop Until IE.ReadyState = READYSTATE_COMPLETE
For security reasons you might have to password protect the VB module and use a service account...

The following has not been tested, but this can be the basis for your login form input:

Dim IE As Object

Sub submitLoginInfo()
    Application.ScreenUpdating = False

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"

    Application.StatusBar = "Submitting"
    ' Wait while IE loading...
    While IE.Busy
        DoEvents
    Wend
    ' **********************************************************************
    ' You can adjust the delay to suite your needs...
    delay 5
    IE.Document.getElementById("YourUserName_FormIdHere").Value = "username"
    delay 5
    IE.Document.getElementById("YourPassword_FormIdHere").Value = "password"
    delay 5
    IE.Document.getElementById("YourSubmitButton_FormIdHere").Click
    '**********************************************************************
    Application.StatusBar = "Form Submitted"
    IE.Quit
    Set IE = Nothing

    Application.ScreenUpdating = True
End Sub

' **********************************************************************
' Delay Subroutine
' **********************************************************************
Private Sub delay(seconds As Long)
    Dim endTime As Date
    endTime = DateAdd("s", seconds, Now())
    Do While Now() < endTime
        DoEvents
    Loop
End Sub
your sub submitLoginInfo works for opening page. But how do I get the source text into excel?
ASKER CERTIFIED SOLUTION
Avatar of LajuanTaylor
LajuanTaylor

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I tested the sample and it works for getting text from a web site and loading into Excel... I just created a button on the worksheet and assigned a VB macro using the code from the example.
Great how fast the reply was, and how the progammer went back and forth on the issue until it was resolved.
Thank You.