Link to home
Avatar of mikecvba
mikecvba

asked on

Excel VBA WEb Data Scraping from a Table

I am trying to get the innertext of a table for my research project. The website is in the code and I am able to correctly enter the 'Transferring Account Number' e.g. 1234 (which I have hard coded in for now).
I can click the submit button fine and the data is populated at the bottom. But I am unable to access the data in the table.

I would like to be able to access individual cells in the table for example:

Transferring Account Identifier
Transferring Account Type
Acquiring Account Type

User generated image
Alternatively, I would like to populate excel with all of the data from the results table - into "Sheet1"
My first hurdle is that I cannot find the table name or ID in the page source.
Help please
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object
  Dim accountno As String
  
  

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")
    

  With IE

  .Visible = True
  .navigate ("http://ec.europa.eu/environment/ets/transaction.do")

  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4
  DoEvents
  Wend

 'accessing the Transferring account number into the correct field
 IE.Document.All.Item("originatingAccountNumber").innertext = "1234"

        
  ' accessing the button
  
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  Do While IE.busy: DoEvents: Loop

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Range("A1:K500").ClearContents

  
 '???this is where I get stuck
  
  
  Set elemCollection = IE.Document.getElementsByTagname("????")
  
  
   
     For t = 0 To (elemCollection.Length - 1)
     
         For r = 0 To (elemCollection(t).Rows.Length - 1)
             For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innertext
             Next c
         Next r
     Next t
  
  End With

  ' cleaning up memory
  Set IE = Nothing
  
  End Sub

Open in new window

Avatar of Qlemo
Qlemo
Flag of Germany image

You should get the table as starting point by ID:
Set elemCollection = IE.Document.getElementByID(("tblTransactionSearchResult")

Open in new window

But I'm not yet clear how to proceed.
ASKER CERTIFIED SOLUTION
Avatar of Qlemo
Qlemo
Flag of Germany image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of mikecvba
mikecvba

ASKER

Olemo - its works brilliant and thanks for your excellent code. The only problem is that I have 20,000 (account numbers 1 to 20,000) enquiries to make and its very slow reloading the IE object and populating the innertext with the account number e.g. "1234" each time. Around 8 seconds per load. I think I should have done it with the XMLHTTP object, accessing the xml data directly otherwise it will take a month to populate..

I have written the XML connection code (below) and created the string argument to access the XML result but I don't know how to convert the result to something similar to the IE.document format that you used.

Dim XMLHTTP
    Dim result As String
    Dim argumentString
argumentString = "?export_url=destinationAccountHolder=&startDate=01%2F01%2F2008&destinationRegistry=-1&originatingAccountType=121&form=transaction&endDate=31%2F12%2F2011&transactionID=&originatingAccountHolder=&suppTransactionType=-1&transactionType=-1&languageCode=en&destinationAccountNumber=" & registryno & "&destinationAccountType=121&toCompletionDate=&originatingRegistry=-1&destinationAccountIdentifier=&fromCompletionDate=&originatingAccountIdentifier=&transactionStatus=4&originatingAccountNumber=&currentSortSettings=&form=transaction&exportType=1&OK=OK"
    
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    XMLHTTP.Open "POST", _
    "http://ec.europa.eu/environment/ets/export.do", False
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    XMLHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    XMLHTTP.send argumentString
    result = XMLHTTP.responsetext
    result = Replace(result, "encoding=" & Chr(34) & "UTF-8", "encoding=" & Chr(34) & "iso-8859-1")

Open in new window



Can you please help me replace the header in your suggested code so that the result here flows and populates the rows and columns in the Excel sheet but with the XML object rather than the Internet Explorer populated page?
SOLUTION
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
SOLUTION
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
Both experts provide 2 perfects answers to the same solution. aikimark  answered with a sleek alternative. Qlemo answered perfectly the original request
Mark,
Interesting. Didn't know you can use the table ID in WebTables, I was always using an integer index.

mikecvba,
Is your follow-up question moot by using the parameter in the URL and/or Web Queries, or are you going to ask a new question about XML?
Didn't know you can use the table ID
Learn much, you will, my young padawan
Olemo, i would still like to do it with the XML object as both of the two methods above -  via the Internet Explorer, populating  the textbox and clicking search button &  via the tableobject are very slow. I have run out of points to award but i would appreciate an education in connecting the xmlhttp object to the online xml table and then populating the table/rows columns on the sheet. I have already started the xmlhttp code as above but i need to format the results into rows and columns -  exactly as you did in your first solution from the  resultant IE.Document.  I think future readers will find this  page informative to see all three working solutions together.
I am working on a few new articles on that subject.  If I had already finished them, I would post a link to it.
IEObject Web table scraping and tableobjects are great for single queries but if you are repeatedly interrogating the tables with new parameters, it has to be the xmlhttp approach. In my example the above two methods took a good few seconds whereas the XML appears instant. Unless you gurus have a fourth option?
If you want an MSXML2 solution, please post a new question that includes a link to this question.  Then come back to this question and post a link to your new question.