Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 6697
  • Last Modified:

Excel VBA WEb Data Scraping from a Table

I am trying to get the innertext of a table for my research project. The website is in the code and I am able to correctly enter the 'Transferring Account Number' e.g. 1234 (which I have hard coded in for now).
I can click the submit button fine and the data is populated at the bottom. But I am unable to access the data in the table.

I would like to be able to access individual cells in the table for example:

Transferring Account Identifier
Transferring Account Type
Acquiring Account Type

Capture.JPG
Alternatively, I would like to populate excel with all of the data from the results table - into "Sheet1"
My first hurdle is that I cannot find the table name or ID in the page source.
Help please
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object
  Dim accountno As String
  
  

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")
    

  With IE

  .Visible = True
  .navigate ("http://ec.europa.eu/environment/ets/transaction.do")

  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4
  DoEvents
  Wend

 'accessing the Transferring account number into the correct field
 IE.Document.All.Item("originatingAccountNumber").innertext = "1234"

        
  ' accessing the button
  
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  Do While IE.busy: DoEvents: Loop

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Range("A1:K500").ClearContents

  
 '???this is where I get stuck
  
  
  Set elemCollection = IE.Document.getElementsByTagname("????")
  
  
   
     For t = 0 To (elemCollection.Length - 1)
     
         For r = 0 To (elemCollection(t).Rows.Length - 1)
             For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innertext
             Next c
         Next r
     Next t
  
  End With

  ' cleaning up memory
  Set IE = Nothing
  
  End Sub

Open in new window

0
mikecvba
Asked:
mikecvba
  • 5
  • 4
  • 3
3 Solutions
 
QlemoC++ DeveloperCommented:
You should get the table as starting point by ID:
Set elemCollection = IE.Document.getElementByID(("tblTransactionSearchResult")

Open in new window

But I'm not yet clear how to proceed.
0
 
QlemoC++ DeveloperCommented:
This is the corrected and simplified VBA code. Remember to close IE before removing references - this is included now too.
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object, curHTMLRow As Object
  Dim accountno As String

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")

  IE.Visible = True
  IE.navigate ("http://ec.europa.eu/environment/ets/transaction.do")
  
  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4: DoEvents: Wend

  'accessing the Transferring account number into the correct field
  IE.Document.All.Item("originatingAccountNumber").InnerText = "1234"
        
  ' accessing the button
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  While IE.busy: DoEvents: Wend

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Activate
  Range("A1:K500").ClearContents
  
  Set elemCollection = IE.Document.getElementById("tblTransactionSearchResult")
  
  ' Rows(0) is Table title, so skipping
  For r = 1 To elemCollection.Rows.Length - 1
    Set curHTMLRow = elemCollection.Rows(r)
    For c = 0 To curHTMLRow.Cells.Length - 1
      Cells(r + 1, c + 1) = curHTMLRow.Cells(c).InnerText
    Next
  Next

  ' cleaning up memory
  IE.Quit
  Set IE = Nothing
  
End Sub

Open in new window

0
 
mikecvbaAuthor Commented:
Olemo - its works brilliant and thanks for your excellent code. The only problem is that I have 20,000 (account numbers 1 to 20,000) enquiries to make and its very slow reloading the IE object and populating the innertext with the account number e.g. "1234" each time. Around 8 seconds per load. I think I should have done it with the XMLHTTP object, accessing the xml data directly otherwise it will take a month to populate..

I have written the XML connection code (below) and created the string argument to access the XML result but I don't know how to convert the result to something similar to the IE.document format that you used.

Dim XMLHTTP
    Dim result As String
    Dim argumentString
argumentString = "?export_url=destinationAccountHolder=&startDate=01%2F01%2F2008&destinationRegistry=-1&originatingAccountType=121&form=transaction&endDate=31%2F12%2F2011&transactionID=&originatingAccountHolder=&suppTransactionType=-1&transactionType=-1&languageCode=en&destinationAccountNumber=" & registryno & "&destinationAccountType=121&toCompletionDate=&originatingRegistry=-1&destinationAccountIdentifier=&fromCompletionDate=&originatingAccountIdentifier=&transactionStatus=4&originatingAccountNumber=&currentSortSettings=&form=transaction&exportType=1&OK=OK"
    
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    XMLHTTP.Open "POST", _
    "http://ec.europa.eu/environment/ets/export.do", False
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    XMLHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    XMLHTTP.send argumentString
    result = XMLHTTP.responsetext
    result = Replace(result, "encoding=" & Chr(34) & "UTF-8", "encoding=" & Chr(34) & "iso-8859-1")

Open in new window



Can you please help me replace the header in your suggested code so that the result here flows and populates the rows and columns in the Excel sheet but with the XML object rather than the Internet Explorer populated page?
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
aikimarkCommented:
You could also use a querytable object to retrieve the web page's table.
Example:
Sub Q_28571716()
   Dim oQT As QueryTable
   
   'Create query table to hold the rates.
   With Worksheets("Sheet2")
       Set oQT = .QueryTables.Add( _
           Connection:="URL;http://ec.europa.eu/environment/ets/transaction.do?languageCode=en&startDate=&endDate=&transactionStatus=4&fromCompletionDate=&toCompletionDate=&transactionID=&transactionType=-1&suppTransactionType=-1&originatingRegistry=-1&destinationRegistry=-1&originatingAccountType=-1&destinationAccountType=-1&originatingAccountNumber=1234&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&search=Search&currentSortSettings=", _
           Destination:=.Range("A1"))
   End With
   
   With oQT
     
        'Don't recognize dates.
        .WebDisableDateRecognition = False
        
        'Don't refresh query when file opened.
        .RefreshOnFileOpen = False
        
        'Ignore page formatting.
        .WebFormatting = xlWebFormattingNone
        
        'Wait for query to finish before continuing.
        .BackgroundQuery = False
        
        'Select a specific table.
        .WebSelectionType = xlSpecifiedTables
        
        'Import the table containing the exchange rates.
        .WebTables = "tblTransactionSearchResult"
        
        'Save the query with workbook.
        .SaveData = True
        
        'Adjust columns to fit the data.
        .AdjustColumnWidth = True
        
        .Refresh

    End With

End Sub

Open in new window



Reference:
http://msdn.microsoft.com/en-us/library/office/aa203721(v=office.11).aspx#odc_xlwebquery_retrievingdatafromasingletable
0
 
aikimarkCommented:
You could tweak this routine, adding an account number parameter, and add that parameter value into the URL/URI string concatenation.
0
 
mikecvbaAuthor Commented:
Both experts provide 2 perfects answers to the same solution. aikimark  answered with a sleek alternative. Qlemo answered perfectly the original request
0
 
QlemoC++ DeveloperCommented:
Mark,
Interesting. Didn't know you can use the table ID in WebTables, I was always using an integer index.

mikecvba,
Is your follow-up question moot by using the parameter in the URL and/or Web Queries, or are you going to ask a new question about XML?
0
 
aikimarkCommented:
Didn't know you can use the table ID
Learn much, you will, my young padawan
0
 
mikecvbaAuthor Commented:
Olemo, i would still like to do it with the XML object as both of the two methods above -  via the Internet Explorer, populating  the textbox and clicking search button &  via the tableobject are very slow. I have run out of points to award but i would appreciate an education in connecting the xmlhttp object to the online xml table and then populating the table/rows columns on the sheet. I have already started the xmlhttp code as above but i need to format the results into rows and columns -  exactly as you did in your first solution from the  resultant IE.Document.  I think future readers will find this  page informative to see all three working solutions together.
0
 
aikimarkCommented:
I am working on a few new articles on that subject.  If I had already finished them, I would post a link to it.
0
 
mikecvbaAuthor Commented:
IEObject Web table scraping and tableobjects are great for single queries but if you are repeatedly interrogating the tables with new parameters, it has to be the xmlhttp approach. In my example the above two methods took a good few seconds whereas the XML appears instant. Unless you gurus have a fourth option?
0
 
aikimarkCommented:
If you want an MSXML2 solution, please post a new question that includes a link to this question.  Then come back to this question and post a link to your new question.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 5
  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now