Solved

Excel VBA WEb Data Scraping from a Table

Posted on 2014-11-30
13
4,336 Views
Last Modified: 2014-12-02
I am trying to get the innertext of a table for my research project. The website is in the code and I am able to correctly enter the 'Transferring Account Number' e.g. 1234 (which I have hard coded in for now).
I can click the submit button fine and the data is populated at the bottom. But I am unable to access the data in the table.

I would like to be able to access individual cells in the table for example:

Transferring Account Identifier
Transferring Account Type
Acquiring Account Type

Capture.JPG
Alternatively, I would like to populate excel with all of the data from the results table - into "Sheet1"
My first hurdle is that I cannot find the table name or ID in the page source.
Help please
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object
  Dim accountno As String
  
  

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")
    

  With IE

  .Visible = True
  .navigate ("http://ec.europa.eu/environment/ets/transaction.do")

  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4
  DoEvents
  Wend

 'accessing the Transferring account number into the correct field
 IE.Document.All.Item("originatingAccountNumber").innertext = "1234"

        
  ' accessing the button
  
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  Do While IE.busy: DoEvents: Loop

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Range("A1:K500").ClearContents

  
 '???this is where I get stuck
  
  
  Set elemCollection = IE.Document.getElementsByTagname("????")
  
  
   
     For t = 0 To (elemCollection.Length - 1)
     
         For r = 0 To (elemCollection(t).Rows.Length - 1)
             For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innertext
             Next c
         Next r
     Next t
  
  End With

  ' cleaning up memory
  Set IE = Nothing
  
  End Sub

Open in new window

0
Comment
Question by:mikecvba
  • 5
  • 4
  • 3
13 Comments
 
LVL 68

Expert Comment

by:Qlemo
ID: 40472628
You should get the table as starting point by ID:
Set elemCollection = IE.Document.getElementByID(("tblTransactionSearchResult")

Open in new window

But I'm not yet clear how to proceed.
0
 
LVL 68

Accepted Solution

by:
Qlemo earned 250 total points
ID: 40472681
This is the corrected and simplified VBA code. Remember to close IE before removing references - this is included now too.
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object, curHTMLRow As Object
  Dim accountno As String

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")

  IE.Visible = True
  IE.navigate ("http://ec.europa.eu/environment/ets/transaction.do")
  
  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4: DoEvents: Wend

  'accessing the Transferring account number into the correct field
  IE.Document.All.Item("originatingAccountNumber").InnerText = "1234"
        
  ' accessing the button
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  While IE.busy: DoEvents: Wend

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Activate
  Range("A1:K500").ClearContents
  
  Set elemCollection = IE.Document.getElementById("tblTransactionSearchResult")
  
  ' Rows(0) is Table title, so skipping
  For r = 1 To elemCollection.Rows.Length - 1
    Set curHTMLRow = elemCollection.Rows(r)
    For c = 0 To curHTMLRow.Cells.Length - 1
      Cells(r + 1, c + 1) = curHTMLRow.Cells(c).InnerText
    Next
  Next

  ' cleaning up memory
  IE.Quit
  Set IE = Nothing
  
End Sub

Open in new window

0
 

Author Comment

by:mikecvba
ID: 40473522
Olemo - its works brilliant and thanks for your excellent code. The only problem is that I have 20,000 (account numbers 1 to 20,000) enquiries to make and its very slow reloading the IE object and populating the innertext with the account number e.g. "1234" each time. Around 8 seconds per load. I think I should have done it with the XMLHTTP object, accessing the xml data directly otherwise it will take a month to populate..

I have written the XML connection code (below) and created the string argument to access the XML result but I don't know how to convert the result to something similar to the IE.document format that you used.

Dim XMLHTTP
    Dim result As String
    Dim argumentString
argumentString = "?export_url=destinationAccountHolder=&startDate=01%2F01%2F2008&destinationRegistry=-1&originatingAccountType=121&form=transaction&endDate=31%2F12%2F2011&transactionID=&originatingAccountHolder=&suppTransactionType=-1&transactionType=-1&languageCode=en&destinationAccountNumber=" & registryno & "&destinationAccountType=121&toCompletionDate=&originatingRegistry=-1&destinationAccountIdentifier=&fromCompletionDate=&originatingAccountIdentifier=&transactionStatus=4&originatingAccountNumber=&currentSortSettings=&form=transaction&exportType=1&OK=OK"
    
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    XMLHTTP.Open "POST", _
    "http://ec.europa.eu/environment/ets/export.do", False
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    XMLHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    XMLHTTP.send argumentString
    result = XMLHTTP.responsetext
    result = Replace(result, "encoding=" & Chr(34) & "UTF-8", "encoding=" & Chr(34) & "iso-8859-1")

Open in new window



Can you please help me replace the header in your suggested code so that the result here flows and populates the rows and columns in the Excel sheet but with the XML object rather than the Internet Explorer populated page?
0
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 250 total points
ID: 40473548
You could also use a querytable object to retrieve the web page's table.
Example:
Sub Q_28571716()
   Dim oQT As QueryTable
   
   'Create query table to hold the rates.
   With Worksheets("Sheet2")
       Set oQT = .QueryTables.Add( _
           Connection:="URL;http://ec.europa.eu/environment/ets/transaction.do?languageCode=en&startDate=&endDate=&transactionStatus=4&fromCompletionDate=&toCompletionDate=&transactionID=&transactionType=-1&suppTransactionType=-1&originatingRegistry=-1&destinationRegistry=-1&originatingAccountType=-1&destinationAccountType=-1&originatingAccountNumber=1234&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&search=Search&currentSortSettings=", _
           Destination:=.Range("A1"))
   End With
   
   With oQT
     
        'Don't recognize dates.
        .WebDisableDateRecognition = False
        
        'Don't refresh query when file opened.
        .RefreshOnFileOpen = False
        
        'Ignore page formatting.
        .WebFormatting = xlWebFormattingNone
        
        'Wait for query to finish before continuing.
        .BackgroundQuery = False
        
        'Select a specific table.
        .WebSelectionType = xlSpecifiedTables
        
        'Import the table containing the exchange rates.
        .WebTables = "tblTransactionSearchResult"
        
        'Save the query with workbook.
        .SaveData = True
        
        'Adjust columns to fit the data.
        .AdjustColumnWidth = True
        
        .Refresh

    End With

End Sub

Open in new window



Reference:
http://msdn.microsoft.com/en-us/library/office/aa203721(v=office.11).aspx#odc_xlwebquery_retrievingdatafromasingletable
0
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 250 total points
ID: 40473551
You could tweak this routine, adding an account number parameter, and add that parameter value into the URL/URI string concatenation.
0
 

Author Closing Comment

by:mikecvba
ID: 40473614
Both experts provide 2 perfects answers to the same solution. aikimark  answered with a sleek alternative. Qlemo answered perfectly the original request
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 68

Expert Comment

by:Qlemo
ID: 40473668
Mark,
Interesting. Didn't know you can use the table ID in WebTables, I was always using an integer index.

mikecvba,
Is your follow-up question moot by using the parameter in the URL and/or Web Queries, or are you going to ask a new question about XML?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40473738
Didn't know you can use the table ID
Learn much, you will, my young padawan
0
 

Author Comment

by:mikecvba
ID: 40474401
Olemo, i would still like to do it with the XML object as both of the two methods above -  via the Internet Explorer, populating  the textbox and clicking search button &  via the tableobject are very slow. I have run out of points to award but i would appreciate an education in connecting the xmlhttp object to the online xml table and then populating the table/rows columns on the sheet. I have already started the xmlhttp code as above but i need to format the results into rows and columns -  exactly as you did in your first solution from the  resultant IE.Document.  I think future readers will find this  page informative to see all three working solutions together.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40474440
I am working on a few new articles on that subject.  If I had already finished them, I would post a link to it.
0
 

Author Comment

by:mikecvba
ID: 40474527
IEObject Web table scraping and tableobjects are great for single queries but if you are repeatedly interrogating the tables with new parameters, it has to be the xmlhttp approach. In my example the above two methods took a good few seconds whereas the XML appears instant. Unless you gurus have a fourth option?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40474652
If you want an MSXML2 solution, please post a new question that includes a link to this question.  Then come back to this question and post a link to your new question.
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

This is pretty cool.  The purpose of this VB Script is to help you document where JAR (Java ARchive) files and specifically java class files are located so that you can address issues seen with a client or that you can speak intelligently with a dev…
Convert between Excel file formats (.XLS, .XLSX, .XLSM) with/without macro option David Miller (dlmille) Intro Over this past Fall, I've had the opportunity to see several similar requests and have developed a couple related solutions associate…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now