Solved

Excel VBA WEb Data Scraping from a Table

Posted on 2014-11-30
13
5,303 Views
Last Modified: 2014-12-02
I am trying to get the innertext of a table for my research project. The website is in the code and I am able to correctly enter the 'Transferring Account Number' e.g. 1234 (which I have hard coded in for now).
I can click the submit button fine and the data is populated at the bottom. But I am unable to access the data in the table.

I would like to be able to access individual cells in the table for example:

Transferring Account Identifier
Transferring Account Type
Acquiring Account Type

Capture.JPG
Alternatively, I would like to populate excel with all of the data from the results table - into "Sheet1"
My first hurdle is that I cannot find the table name or ID in the page source.
Help please
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object
  Dim accountno As String
  
  

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")
    

  With IE

  .Visible = True
  .navigate ("http://ec.europa.eu/environment/ets/transaction.do")

  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4
  DoEvents
  Wend

 'accessing the Transferring account number into the correct field
 IE.Document.All.Item("originatingAccountNumber").innertext = "1234"

        
  ' accessing the button
  
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  Do While IE.busy: DoEvents: Loop

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Range("A1:K500").ClearContents

  
 '???this is where I get stuck
  
  
  Set elemCollection = IE.Document.getElementsByTagname("????")
  
  
   
     For t = 0 To (elemCollection.Length - 1)
     
         For r = 0 To (elemCollection(t).Rows.Length - 1)
             For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innertext
             Next c
         Next r
     Next t
  
  End With

  ' cleaning up memory
  Set IE = Nothing
  
  End Sub

Open in new window

0
Comment
Question by:mikecvba
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
  • 3
13 Comments
 
LVL 70

Expert Comment

by:Qlemo
ID: 40472628
You should get the table as starting point by ID:
Set elemCollection = IE.Document.getElementByID(("tblTransactionSearchResult")

Open in new window

But I'm not yet clear how to proceed.
0
 
LVL 70

Accepted Solution

by:
Qlemo earned 250 total points
ID: 40472681
This is the corrected and simplified VBA code. Remember to close IE before removing references - this is included now too.
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object, curHTMLRow As Object
  Dim accountno As String

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")

  IE.Visible = True
  IE.navigate ("http://ec.europa.eu/environment/ets/transaction.do")
  
  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4: DoEvents: Wend

  'accessing the Transferring account number into the correct field
  IE.Document.All.Item("originatingAccountNumber").InnerText = "1234"
        
  ' accessing the button
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  While IE.busy: DoEvents: Wend

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Activate
  Range("A1:K500").ClearContents
  
  Set elemCollection = IE.Document.getElementById("tblTransactionSearchResult")
  
  ' Rows(0) is Table title, so skipping
  For r = 1 To elemCollection.Rows.Length - 1
    Set curHTMLRow = elemCollection.Rows(r)
    For c = 0 To curHTMLRow.Cells.Length - 1
      Cells(r + 1, c + 1) = curHTMLRow.Cells(c).InnerText
    Next
  Next

  ' cleaning up memory
  IE.Quit
  Set IE = Nothing
  
End Sub

Open in new window

0
 

Author Comment

by:mikecvba
ID: 40473522
Olemo - its works brilliant and thanks for your excellent code. The only problem is that I have 20,000 (account numbers 1 to 20,000) enquiries to make and its very slow reloading the IE object and populating the innertext with the account number e.g. "1234" each time. Around 8 seconds per load. I think I should have done it with the XMLHTTP object, accessing the xml data directly otherwise it will take a month to populate..

I have written the XML connection code (below) and created the string argument to access the XML result but I don't know how to convert the result to something similar to the IE.document format that you used.

Dim XMLHTTP
    Dim result As String
    Dim argumentString
argumentString = "?export_url=destinationAccountHolder=&startDate=01%2F01%2F2008&destinationRegistry=-1&originatingAccountType=121&form=transaction&endDate=31%2F12%2F2011&transactionID=&originatingAccountHolder=&suppTransactionType=-1&transactionType=-1&languageCode=en&destinationAccountNumber=" & registryno & "&destinationAccountType=121&toCompletionDate=&originatingRegistry=-1&destinationAccountIdentifier=&fromCompletionDate=&originatingAccountIdentifier=&transactionStatus=4&originatingAccountNumber=&currentSortSettings=&form=transaction&exportType=1&OK=OK"
    
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    XMLHTTP.Open "POST", _
    "http://ec.europa.eu/environment/ets/export.do", False
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    XMLHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    XMLHTTP.send argumentString
    result = XMLHTTP.responsetext
    result = Replace(result, "encoding=" & Chr(34) & "UTF-8", "encoding=" & Chr(34) & "iso-8859-1")

Open in new window



Can you please help me replace the header in your suggested code so that the result here flows and populates the rows and columns in the Excel sheet but with the XML object rather than the Internet Explorer populated page?
0
Creating Instructional Tutorials  

For Any Use & On Any Platform

Contextual Guidance at the moment of need helps your employees/users adopt software o& achieve even the most complex tasks instantly. Boost knowledge retention, software adoption & employee engagement with easy solution.

 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 250 total points
ID: 40473548
You could also use a querytable object to retrieve the web page's table.
Example:
Sub Q_28571716()
   Dim oQT As QueryTable
   
   'Create query table to hold the rates.
   With Worksheets("Sheet2")
       Set oQT = .QueryTables.Add( _
           Connection:="URL;http://ec.europa.eu/environment/ets/transaction.do?languageCode=en&startDate=&endDate=&transactionStatus=4&fromCompletionDate=&toCompletionDate=&transactionID=&transactionType=-1&suppTransactionType=-1&originatingRegistry=-1&destinationRegistry=-1&originatingAccountType=-1&destinationAccountType=-1&originatingAccountNumber=1234&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&search=Search&currentSortSettings=", _
           Destination:=.Range("A1"))
   End With
   
   With oQT
     
        'Don't recognize dates.
        .WebDisableDateRecognition = False
        
        'Don't refresh query when file opened.
        .RefreshOnFileOpen = False
        
        'Ignore page formatting.
        .WebFormatting = xlWebFormattingNone
        
        'Wait for query to finish before continuing.
        .BackgroundQuery = False
        
        'Select a specific table.
        .WebSelectionType = xlSpecifiedTables
        
        'Import the table containing the exchange rates.
        .WebTables = "tblTransactionSearchResult"
        
        'Save the query with workbook.
        .SaveData = True
        
        'Adjust columns to fit the data.
        .AdjustColumnWidth = True
        
        .Refresh

    End With

End Sub

Open in new window



Reference:
http://msdn.microsoft.com/en-us/library/office/aa203721(v=office.11).aspx#odc_xlwebquery_retrievingdatafromasingletable
0
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 250 total points
ID: 40473551
You could tweak this routine, adding an account number parameter, and add that parameter value into the URL/URI string concatenation.
0
 

Author Closing Comment

by:mikecvba
ID: 40473614
Both experts provide 2 perfects answers to the same solution. aikimark  answered with a sleek alternative. Qlemo answered perfectly the original request
0
 
LVL 70

Expert Comment

by:Qlemo
ID: 40473668
Mark,
Interesting. Didn't know you can use the table ID in WebTables, I was always using an integer index.

mikecvba,
Is your follow-up question moot by using the parameter in the URL and/or Web Queries, or are you going to ask a new question about XML?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40473738
Didn't know you can use the table ID
Learn much, you will, my young padawan
0
 

Author Comment

by:mikecvba
ID: 40474401
Olemo, i would still like to do it with the XML object as both of the two methods above -  via the Internet Explorer, populating  the textbox and clicking search button &  via the tableobject are very slow. I have run out of points to award but i would appreciate an education in connecting the xmlhttp object to the online xml table and then populating the table/rows columns on the sheet. I have already started the xmlhttp code as above but i need to format the results into rows and columns -  exactly as you did in your first solution from the  resultant IE.Document.  I think future readers will find this  page informative to see all three working solutions together.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40474440
I am working on a few new articles on that subject.  If I had already finished them, I would post a link to it.
0
 

Author Comment

by:mikecvba
ID: 40474527
IEObject Web table scraping and tableobjects are great for single queries but if you are repeatedly interrogating the tables with new parameters, it has to be the xmlhttp approach. In my example the above two methods took a good few seconds whereas the XML appears instant. Unless you gurus have a fourth option?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40474652
If you want an MSXML2 solution, please post a new question that includes a link to this question.  Then come back to this question and post a link to your new question.
0

Featured Post

Enroll in May's Course of the Month

May’s Course of the Month is now available! Experts Exchange’s Premium Members and Team Accounts have access to a complimentary course each month as part of their membership—an extra way to increase training and boost professional development.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

This article is the result of a quest to better understand Task Scheduler 2.0 and all the newer objects available in vbscript in this version over  the limited options we had scripting in Task Scheduler 1.0.  As I started my journey of knowledge I f…
When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question