Using Excel Macro to extract specific text from website based on li id

I'm trying to write Excel macro that extracts specific text from a website based on the html structure.

Here is part of the website's html

                       <div id="footer" role="contentinfo">
            <ul id="footer-info">
                             <li id="footer-info-lastmod"> This page was last modified on 28 July 2014, at 19:44.</li>

I only need the text associated with li id "footer-info-lastmod".
Richard DetschChief Knowledge OfficerAsked:
Who is Participating?
 
LajuanTaylorCommented:
Take a look at the following example. It uses the XMLHTTP object to get info from a remote site which, can  then be parsed using the (XML DOM).

The example is VB and it stores the extracted text is stored in Excel.
http://www.vba-and-excel.com/vba/internet/6-loading-information-from-the-internet-using-the-xmlhttp-object
0
 
LajuanTaylorCommented:
Take a look at the fully functional 30 day trial of WebReplay. It's an IE Plugin and it's pretty straight forward to use.
http://www.iewatch.com/downloadwr.aspx

I use this product daily with a SaaS provider that doesn't have an API for me to export my data. The data I get daily is Excel. I simulate logging into the website to get what I need...

The tool handles sites that use AJAX and sessions variables... You can screen scrape elements and load them into Excel if necessary.
0
 
Richard DetschChief Knowledge OfficerAuthor Commented:
Lijuan,
Thanks for product reference, but I was really looking for a solution in Excel visual basic code.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
LajuanTaylorCommented:
No problem.

So I assume you have a button in your Excel document that you will push to trigger the Excel VBA that will perform an http request to a remote site?

The returned html would then be loaded into a variable, parsed, and the specific piece of text extracted... Is that what you are looking to do?
0
 
Richard DetschChief Knowledge OfficerAuthor Commented:
It is.
0
 
LajuanTaylorCommented:
Okay, you will need to make a VBA HTTP request from within Excel first:
https://www.google.com/?gws_rd=ssl#q=vba+http+get+request+excel

Sub http()
Dim MyRequest As Object
    Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    MyRequest.Open "GET", _
    "http://www.google.com"
   
 ' Send Request.
    MyRequest.Send

    'And we get this response
    MsgBox MyRequest.ResponseText
End Sub
0
 
LajuanTaylorCommented:
Your request should be "get" because you are not posting anything to a remote site. You will have to manipulate the returned html to extract what you.

I find that writing the data to disk first works better than trying to parse and extract in-memory... You will have to experiment.
0
 
Richard DetschChief Knowledge OfficerAuthor Commented:
That's working to get entire text from open websites.
The ones I scraping are behind my firewall and need a CAC login.

I was playing around before and was able to get those to open with excel, but had to type in my CAC number.

This code does allow me to enter my CAC number and get to the site:
           'from https://www.youtube.com/watch?v=7sZRcaaAVbg

           Dim IE As New InternetExplorer
           IE.Visible = True

           IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"

             Do
                DoEvents
                 Loop Until IE.ReadyState = READYSTATE_COMPLETE
0
 
LajuanTaylorCommented:
For security reasons you might have to password protect the VB module and use a service account...

The following has not been tested, but this can be the basis for your login form input:

Dim IE As Object

Sub submitLoginInfo()
    Application.ScreenUpdating = False

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate "https://wiki.erdc.dren.mil/ERDCpedia"

    Application.StatusBar = "Submitting"
    ' Wait while IE loading...
    While IE.Busy
        DoEvents
    Wend
    ' **********************************************************************
    ' You can adjust the delay to suite your needs...
    delay 5
    IE.Document.getElementById("YourUserName_FormIdHere").Value = "username"
    delay 5
    IE.Document.getElementById("YourPassword_FormIdHere").Value = "password"
    delay 5
    IE.Document.getElementById("YourSubmitButton_FormIdHere").Click
    '**********************************************************************
    Application.StatusBar = "Form Submitted"
    IE.Quit
    Set IE = Nothing

    Application.ScreenUpdating = True
End Sub

' **********************************************************************
' Delay Subroutine
' **********************************************************************
Private Sub delay(seconds As Long)
    Dim endTime As Date
    endTime = DateAdd("s", seconds, Now())
    Do While Now() < endTime
        DoEvents
    Loop
End Sub
0
 
Richard DetschChief Knowledge OfficerAuthor Commented:
your sub submitLoginInfo works for opening page. But how do I get the source text into excel?
0
 
LajuanTaylorCommented:
I tested the sample and it works for getting text from a web site and loading into Excel... I just created a button on the worksheet and assigned a VB macro using the code from the example.
0
 
Richard DetschChief Knowledge OfficerAuthor Commented:
Great how fast the reply was, and how the progammer went back and forth on the issue until it was resolved.
0
 
LajuanTaylorCommented:
Thank You.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.