Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Excel vba to retrieve mutiple data from web site

Posted on 2010-08-29
24
Medium Priority
?
784 Views
Last Modified: 2013-02-05
I am looking for vba coding to add to attached Excel file with vba. I want to be able to click the "Get Date" button and take the symbols beginning in column A row 5 of the spreadsheet and go to web site to find associated dates. Then I want vba to retrieve the date and place it next to symbol in spreadsheet. I am attaching the Excel file for use to add the necessary code. I used IE8 and searched for the date, 23-NOV-10, and found it but do not know how to write the vba code to retrieve it. I want the vba code go down the column and retrieve the date for each symbol shown. The number of symbols in column "A" can be upwards or 300 plus.

Once in a while I get and error, but I think it is because I close the web site before the coding completes its iterations.
 
File is Excel 2007

Thanks in advance for your help!
 Test-Retrieval.xlsm
0
Comment
Question by:sumboddie
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 11
  • 11
  • 2
24 Comments
 
LVL 34

Expert Comment

by:Norie
ID: 33557531
What data do you want to get for each ticker?

By the way is there a page where you can enter the symbol and then
initiate the search via a button or something?

That might be easier to work with rather than a separate URL for each syymbol.

ie navigate to search page, insert symbol, do search, get results.
0
 
LVL 17

Expert Comment

by:calacuccia
ID: 33557898
Hi

90        Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop

        htmlStr = ie.document.body.innerHTML
     
        pointer = InStr(1, htmlStr, "Future Events")
        pointer = InStr(pointer, htmlStr, "vAlign")
        htmlStr = Mid(htmlStr, pointer + 18, 9)
        Range("B5").Value = htmlStr

100       Set doc = ie.document

The additional lines between 90 & 100 passed the test on your first sample (A5), did not test it further, but I think it should work for all.
Sub Test()
          
          Dim ie As Object
          Dim doc As HTMLDocument
          Dim PageForm As HTMLFormElement
          Dim FormButton As HTMLInputButtonElement
          Dim Elem As IHTMLElement
          Dim xURL As String
          Dim xSym As String
          Dim htmlStr As String
          Dim pointer As Long, spointer As Long
         
          
10        Sheets("Test").Activate
20        xURL = Range("A2").Value
30        xSym = Range("A6").Value
          
40        xURL = xURL + xSym
          
50       On Error GoTo Test_Error

60        Set ie = CreateObject("InternetExplorer.Application")
          
70        ie.Visible = True
80        ie.navigate xURL
          
          'Wait for initial page to load
          
90        Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop

        htmlStr = ie.document.body.innerHTML
     
        pointer = InStr(1, htmlStr, "Future Events")
        pointer = InStr(pointer, htmlStr, "vAlign")
        htmlStr = Mid(htmlStr, pointer + 18, 9)
        Range("B6").Value = htmlStr

100       Set doc = ie.document
          
110      On Error GoTo 0
120      Exit Sub

Test_Error:

130       MsgBox "Error " & Err.Number & " (" & Err.Description & ") at line # " & Erl & " in procedure Test of VBA Document Sheet5"
         
End Sub

Open in new window

0
 
LVL 17

Expert Comment

by:calacuccia
ID: 33557917
By the way, I just see imnorie's comment, of course, you should keep your same instance throughout the record, but I figured that you were well aware of that and your main problem right now was to retrieve the data you need.
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 34

Expert Comment

by:Norie
ID: 33558274
calacuccia

I wasn't specifically thinking about the multiple instances thing, I think anyway.

Just wondering if there was a search page/form, and there appears to be.

I also wanted to know what data the OP wanted to get.

There seems to be a lot more than the Future Events date.

By the way all the results seem to be in a div with the ID 'ResultsDiv'.

It has 2 tables, one for future events and one a sort of archive.

Update - can't seem to find out how to get the search to work.

Anyway here's the code I tried, which doesn't error but doesn't return the required results.
'Need reference to Microsoft HTML Object Library.  Select this in Tools - References in VB editor.

Option Explicit

Public Enum IE_READYSTATE
     Uninitialised = 0
     Loading = 1
     Loaded = 2
     Interactive = 3
     Complete = 4
 End Enum

Sub Test()

Dim ie As Object

Dim doc As HTMLDocument
Dim divRslts As HTMLDivElement
Dim PageForm As HTMLFormElement
Dim valSearch As HTMLInputElement
Dim Elem As IHTMLElement
Dim xURL As String



  

    xURL = Sheets("Test").Range("A2").Value

    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = True
    ie.navigate xURL

    'Wait for initial page to load

    Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop

    Set doc = ie.document
    
    Set PageForm = doc.forms("form1")
    
    Set valSearch = doc.getElementById("SearchTicker")
    
    valSearch.Value = Sheets("Test").Range("A5").Value
    
    PageForm.submit
    
    Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop
    
    Set divRslts = doc.getElementById("ResultsDiv")
    
    GetAllTables divRslts.document
    
    ie.Quit

End Sub

Sub GetAllTables(d)
Dim ws As Worksheet
Dim rng As Range
Dim t As HTMLTable
Dim r As HTMLTableRow
Dim c As HTMLTableCell
Dim tabno As Long
Dim nextrow As Long
Dim I As Long

    Set ws = Worksheets.Add

    For Each t In d.getElementsByTagName("TABLE")

        tabno = tabno + 1
        nextrow = nextrow + 1
        Set rng = ws.Range("B" & nextrow)
        rng.Offset(, -1) = "Table " & tabno
        For Each r In t.Rows
            For Each c In r.Cells
                rng.Value = c.outerText
                Set rng = rng.Offset(, 1)
                I = I + 1
            Next c
            nextrow = nextrow + 1
            Set rng = rng.Offset(1, -I)
            I = 0
        Next r

    Next t

    ws.Cells.ClearFormats
End Sub

Open in new window

0
 

Author Comment

by:sumboddie
ID: 33559161
Hi Calacuccia,
Your code is on track. We are able to return the date for the second symbol in column "A". And I agree with imnorie, and was thinking that after I posted my question, that we want to keep the same instance open to retrieve all other data.

So can we modify the code to always start at the first symbol and progress down column "A" to the last symbol? For the purpose of this example we are using three symbols, but typical usage will involve about 300 or so symbols. I will probably schedule this to run late at night so it can have the time it needs to update all dates in the list.

BTW your coding insight gives me confidence in my perception of what code was needed, and how and where in the HTML it would retrieve the data in which I am interested.
0
 
LVL 34

Expert Comment

by:Norie
ID: 33559663
sumboddie

The following code will extract the data for each ticker to a new worksheet.

It extracts all the data in the results table(s) if you really only want the future events that shouldn't
be too difficult.

As well as code for getting data from all the tables I can post code that can get it from specific tables.

Parsing the HTML is one method for this sort of thing and it definitely works, but if you are automating
IE to navigate to a web page etc then you can access all the elements of the page via the DOM object
model.

The document object is at a pretty high level in that model but below it you have all the other elements on
the page, eg tables, forms, inputs etc.


'Need reference to Microsoft HTML Object Library.  Select this in Tools - References in VB editor.

Option Explicit

Public Enum IE_READYSTATE
     Uninitialised = 0
     Loading = 1
     Loaded = 2
     Interactive = 3
     Complete = 4
 End Enum

Sub Test()

Dim ie As Object

Dim doc As HTMLDocument
Dim divRslts As HTMLDivElement
Dim PageForm As HTMLFormElement
Dim valSearch As HTMLInputElement
Dim Elem As IHTMLElement
Dim xURL As String
Dim rngTicker As Range

    Set rngTicker = Sheets("Test").Range("A5")

    xURL = Sheets("Test").Range("A2").Value

    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = True

    While rngTicker <> ""

        ie.navigate xURL & rngTicker

        'Wait for initial page to load

        Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop

        Set doc = ie.document

        Set divRslts = doc.getElementById("ResultsDiv")

        GetAllTables divRslts.document
        Set rngTicker = rngTicker.Offset(1)
        
    Wend
    
    ie.Quit

End Sub

Sub GetAllTables(d)
Dim ws As Worksheet
Dim rng As Range
Dim t As HTMLTable
Dim r As HTMLTableRow
Dim c As HTMLTableCell
Dim tabno As Long
Dim nextrow As Long
Dim I As Long

    Set ws = Worksheets.Add

    For Each t In d.getElementsByTagName("TABLE")

        tabno = tabno + 1
        nextrow = nextrow + 1
        Set rng = ws.Range("B" & nextrow)
        rng.Offset(, -1) = "Table " & tabno
        For Each r In t.Rows
            For Each c In r.Cells
                rng.Value = c.outerText
                Set rng = rng.Offset(, 1)
                I = I + 1
            Next c
            nextrow = nextrow + 1
            Set rng = rng.Offset(1, -I)
            I = 0
        Next r

    Next t

    ws.Cells.ClearFormats
    
End Sub

Open in new window

0
 

Author Comment

by:sumboddie
ID: 33560961
Hi imnorie,
You present a very interesting concept and your code provides more than I am looking for so maybe with some modification I will have what I want and more. Let's take the data retrieved from Tables 5 and 6 only.

With the symbol in Column A, put the data from Table 5 in columns B, C, and D.

With the symbol in Column A, put the data for the first record from Table 6 in columns E, F, G; the data for the second record from Table 6 in columns H, I, and J; the data for the third record from Table 6 in columns K, L, and M; and continue this scheme through the 10th record from Table 6. I want all data to be contained in the main spreadsheet and not add a new spreadsheet for each symbol.

Your solution goes above and beyond my expectations and I will award more points to accomplish the scheme described herein.
Many Thanks!!

0
 
LVL 34

Expert Comment

by:Norie
ID: 33561080
Put what data in columns B, C and D?

If you are going to be so specific about what data you want then tell us what it is.

In your original file all you seemed to want is a date.

There was no mention of any other data, and definitely nothing about the archive stuff.

Like I said I've got code that can extract specific tables.

Obviously when working with Excel the data goes to a worksheet, and in some cases a new worksheet is the best place to put.

Also are you sure you want 90 columns of archived data?

It can be done but the structure of the data wouldn't make it very conducive to any further analysis, filtering etc.
0
 

Author Comment

by:sumboddie
ID: 33561231
Hi imnorie,
I apologize for not being clear in which data I want in the expanded request. Sometimes we are so deep in the woods and know what we are looking for that we forget others may only be able to see the forest :-).

From Table 5 I would like the Date, Page, and Event data only.

From Table 6 I would like the Date, Page, and Headline data only.

As I said in my previous post, you presented an interesting concept by looking at the table contents rather than just the date piece of data. Now, I do want more than the date mentioned in my original post.
I do not want the data posted in a new worksheet, only added to the existing worksheet. And yes, 90 columns of data is OK. Once I have the code I anticipate you will provide, I will be able to take it from there to manipulate the data using vba as I need.

Thanks for allowing me to clarify my needs!
0
 
LVL 34

Expert Comment

by:Norie
ID: 33561440
Well I'll post code for maybe 1 or tickers and you can take it from there, but I still don't think arranging the
data across columns like you want is the best idea.

What sort of manipulation would you be doing with the data?
0
 

Author Comment

by:sumboddie
ID: 33561786
I'm thinking that if we use a counter to step through the rows for each symbol that is in column A, I can vary that list at will. I have a way to easily retrieve the list from another source that will fill column A to the depth necessary (i.e., 300 plus or minus). I was thinking of using code like:

      Sheets("Test").Select
                     Endrow = Range("A65536").End(xlUp).Row
      For counter = 5 To Endrow
                     xSym = Cells(counter, 1).Value

Then step through the rows retrieving data to fill in the columns as describe in my earlier posts.

I plan to manipulate data by prioritizing it in a sort that looks for recent dates and/or looking for certain key words. I will probably export/import the completed spreadsheet into MS Access for storage and further analysis. There would be a benefit to be able to retrieve a narrow data range from Table 6 based on a date window of say a week. The data will used elsewhere so the raw data retrieved is what is important and not how it is presented in the spreadsheet.

Hope this helps!!
0
 
LVL 34

Accepted Solution

by:
Norie earned 1500 total points
ID: 33562022
Well I'm not sure about your ideas about the data.

How do you propose to sort data across columns?

Also, how will you search across columns?

Neither of these are easy to do in Excel, or Access for that matter.

The best format for the data would be a table with 4 main fields.

If I was doing it in Access the table might have the following:

TickerID
fldTicker
fldDate
fldPage
fldEventHeadLine

Anyway I digress, here's code that works for me and I think it does what you want.

It isn't perfect - I did use a spare worksheet for the initial data extract and the dates might need formatted.

Both of which can easily be taken care of.
Option Explicit

'Need reference to Microsoft HTML Object Library.  Select this in Tools - References in VB editor.
Public Enum IE_READYSTATE
     Uninitialised = 0
     Loading = 1
     Loaded = 2
     Interactive = 3
     Complete = 4
 End Enum

Sub Test()

Dim ie As Object

Dim doc As HTMLDocument
Dim divRslts As HTMLDivElement
Dim PageForm As HTMLFormElement
Dim valSearch As HTMLInputElement
Dim Elem As IHTMLElement
Dim rngRslts As Range
Dim rngTicker As Range

Dim I As Long
Dim xURL As String

    Set rngTicker = Sheets("Test").Range("A5")

    xURL = Sheets("Test").Range("A2").Value

    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = True

    While rngTicker <> ""

        ie.navigate xURL & rngTicker

        'Wait for initial page to load

        Do While ie.busy Or Not ie.ReadyState = IE_READYSTATE.Complete: DoEvents: Loop

        Set doc = ie.document

        Set divRslts = doc.getElementById("ResultsDiv")

        'GetAllTables divRslts.document

        GetOneTable divRslts.document, 4, Worksheets("Data").Range("A1")

        GetOneTable divRslts.document, 5, Worksheets("Data").Range("F1")

        Set rngRslts = rngTicker.Offset(, 1)
        
        Worksheets("Data").Range("A2:C2").Copy rngRslts

        For I = 1 To 10

            Worksheets("Data").Range("F1:H1").Offset(I).Copy rngRslts.Offset(, 4 + (I * 3))
            
        Next I

        Set rngTicker = rngTicker.Offset(1)

    Wend

    ie.Quit

End Sub


Sub GetOneTable(d As HTMLDocument, n As Long, rng As Range)
' d is the document
' n is the table to extract
Dim t As Object    ' the table required
Dim colTbls As Object
Dim r As Object    ' the rows of the table
Dim c As Object    ' the cells of the rows.
Dim I As Long


    Set colTbls = d.getElementsByTagName("TABLE")

    Set t = colTbls(n)

    For Each r In t.Rows
    
        For Each c In r.Cells
            rng.Value = c.innerText
            Set rng = rng.Offset(, 1)
        Next c
        
        Set rng = rng.Offset(1, -r.Cells.Length)

    Next r

End Sub

Open in new window

0
 

Author Comment

by:sumboddie
ID: 33570797
Hi imnorie,
Your coding works as advertised!
==============================
However, after considering your suggestion of the table with fields:
Table 5: FUTURE EVENTS
TickerID
fldTicker
fldDate
fldPage
fldEvent

would work well in Access using data from Table 5.
===============================
Then I would use Table 6 in Access as another table with fields:
Table 6: ARCHIVED EVENTS
TickerID
fldTicker
fldDate
fldPage
fldHeadLine
================================
Please modify coding to provide the above tables. Furthermore, we need to leave cells in the spreadsheet null (blank) under the following conditions. I attached printouts of the four circumstances we need to evaluate to decide what data to retrieve, if any. Please see the attached printouts for visual aids for the following circumstances:

MHS Correct Results - Retrieve the selected data from Table 5 and 6.

RIMM Calls and Earnings - Retrieve only the earnings data and not the call data from Table 5, and the selected data from Table 6

FBP No Earnings but Archives - Leave the Table 5 data blank and retrieve only the selected Archive Table 6 data.

BBL No Results Found - Leave both the Table 5 and Table 6 data blank. The current coding appears to drop down to the table at the bottom of the page and retrieves the table with the title MARKETPLACE. This is advertising and not wanted in the retrieved data.

Thanks!

MHS-Correct-Results.pdf
RIMM-Calls-and-Earnings.pdf
FBP-No-Earnings-but-Archives.pdf
BBL-No-Results-Found.pdf
0
 
LVL 34

Expert Comment

by:Norie
ID: 33571310
Excuse me, but you do realise that most people that contribute to this board do so free of charge?

If you want such a detailed solution it might be an idea to hire somebody.

Please don't post with things like 'modify coding to provide the above tables'.

You've already said that what I posted earlier was above and beyond what you wanted.

So, I changed it - mainly because it only took a few lines of code.

You now want this changed to provide you with tables, with specific data excluded... etc and now you've also mentioned Access.

I'm sorry but that's taking things a bit too far, what next  'the moon on a stick'?
0
 

Author Comment

by:sumboddie
ID: 33571487
OK, I'm not looking for Access tables, only gathering the pertinent data in Excel. I can then get it to Access. I was only concurring with your assessment of the form for the data to use in Access and acknowledge that is a good idea.

If excluding the erroneous information from the web tables that is picked up using the current coding is too much, I apologize for asking to modify the code to get only the data that is relevant.

I accept your partial solution and award the points to you! Thank you!
0
 

Author Closing Comment

by:sumboddie
ID: 33571535
Expert was quick to reply and offered several good alternatives that I will try to follow-up. The basic vba solution for Excel was timely and accurate for what I asked. Maybe I can ask a follow-on question to resolve the need for decision making code that will omit the blank or non-relevent information and offer additional points.
0
 
LVL 34

Expert Comment

by:Norie
ID: 33571647
I didn't say that anything you asked for wasn't possible or would be too difficult.

You seem to have changed your original requirement quite a bit.

If you were to import the data to Access you could probably set up simple queries to exclude any unwanted data, group data, count records, query by date etc...

As for the code being 'basic vba' - not too sure what you mean there.

Sure it's straightforward, easy to follow, easy to adapt etc, which is what code should be like.

I'm happy to help, and I might take a look at your lastest request and attachments when I get a moment.

One thought I actually had was that you could get the all the data for all the symbols all in the one table or perhaps 2 if you want future/archived separated.

Each record in the table(s) would have the fields already mentioned plus a field for the symbols to identify which one the data belongs to.

With the data structured like that it would be much easier to analyse, manipulate, report etc.
0
 

Author Comment

by:sumboddie
ID: 33576204
I agree, I did change my original requirement quite a bit because you opened my eyes to see the bigger picture instead of focusing on one piece of data that was important at the time.

Yes, I can move the data from Excel to Access and use query(ies) to select and refine the data to organize and display the data I want. I have built fairly complex Access databases with vba but I am learning Excel vba and its interaction with web pages at this time.

By basic vba I meant the first coding solution you provided what I asked for and beyond that you provided more comprehensive coding to add the enhancements I asked for.

I like the idea of getting all the data for each of the two tables into the fields identified and then I will get it into Access and match up with the specific symbols I want using queries. I agree with you that it will be easier to manipulate and report. As a matter of fact, I am building an Access application to pull all this together and using Excel as a data transport vehicle rather than a standalone component of my system makes a lot of sense.

Thanks for your follow-up!
0
 
LVL 34

Expert Comment

by:Norie
ID: 33576726
There's actually no reason to use Excel as a 'transport vehicle' for the data if you think about it.

The important part, if you like, of the code is getting the data from the webpage.

Then you can do what you want/need with it.

In this case the data is going to an Excel worksheet.

That could be an Access table.

It would perhaps involve a bit more code but if it can be done it would eliminate the need to use Excel.

I've never actually done it before - most people seem to want the data in Excel, but I might give it a try.

Not done too much with Access recently, so might be interesting.

PS You shouldn't need to match up the data to the symbols, that's the whole point of having a field for the symbol.

Mind you having a table with the list of symbols would be useful.
0
 

Author Comment

by:sumboddie
ID: 33577024
Hi imnorie,

Attached are two files, one Access and one Excel each with the initial list of symbols of interest. Both files are from Office Pro 2007.

Thanks!
Weekly-Up.xlsx
Test-DB.accdb
0
 
LVL 34

Expert Comment

by:Norie
ID: 33579035
I can only use the Excel file, my version of Access is a wee bit out of date, but that shouldn't be a problem - I can
easily import the data to Access if needed.

One question though - are the other columns of data relevant at all?

It's hard to tell especially since there are no headers/field names.
0
 

Author Comment

by:sumboddie
ID: 33579094
Not for the purpose of retrieving the desired data from the web. It is only supporting data. The data from the web is uniique to that symbol on the date the data is retrieved.
0
 
LVL 34

Expert Comment

by:Norie
ID: 33579257
So, and this is a stupid question, which column out of the 5 columns of data is the symbol?

I have no knowledge of the stock market/financial world, so I don't know it's column A - honest.:)
0
 

Author Comment

by:sumboddie
ID: 33579452
Sorry, I only put the headings in the Access file and not the Excel file. Column A has the symbols. FYI, we can educate each other ... the other columns from B to F are:
B =  Company Name
C = Day's Closing Price
D = Industry and Sub Industry of the Company
E = Exchange where symbol is listed
F = Data of Last Closing Price
 :-)
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Did you know SD-WANs can improve network connectivity? Check out this webinar to learn how an SD-WAN simplified, one-click tool can help you migrate and manage data in the cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

SSL stands for “Secure Sockets Layer” and an SSL certificate is a critical component to keeping your website safe, secured, and compliant. Any ecommerce website must have an SSL certificate to ensure the safe handling of sensitive information like…
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
This Micro Tutorial will demonstrate how to add subdomains to your content reports. This can be very importing in having a site with multiple subdomains.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question