Excel VBA WEb Data Scraping from a Table

I am trying to get the innertext of a table for my research project. The website is in the code and I am able to correctly enter the 'Transferring Account Number' e.g. 1234 (which I have hard coded in for now).
I can click the submit button fine and the data is populated at the bottom. But I am unable to access the data in the table.

I would like to be able to access individual cells in the table for example:

Transferring Account Identifier
Transferring Account Type
Acquiring Account Type

Capture.JPG
Alternatively, I would like to populate excel with all of the data from the results table - into "Sheet1"
My first hurdle is that I cannot find the table name or ID in the page source.
Help please
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object
  Dim accountno As String
  
  

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")
    

  With IE

  .Visible = True
  .navigate ("http://ec.europa.eu/environment/ets/transaction.do")

  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4
  DoEvents
  Wend

 'accessing the Transferring account number into the correct field
 IE.Document.All.Item("originatingAccountNumber").innertext = "1234"

        
  ' accessing the button
  
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  Do While IE.busy: DoEvents: Loop

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Range("A1:K500").ClearContents

  
 '???this is where I get stuck
  
  
  Set elemCollection = IE.Document.getElementsByTagname("????")
  
  
   
     For t = 0 To (elemCollection.Length - 1)
     
         For r = 0 To (elemCollection(t).Rows.Length - 1)
             For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innertext
             Next c
         Next r
     Next t
  
  End With

  ' cleaning up memory
  Set IE = Nothing
  
  End Sub

Open in new window

mikecvbaAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

QlemoBatchelor, Developer and EE Topic AdvisorCommented:
You should get the table as starting point by ID:
Set elemCollection = IE.Document.getElementByID(("tblTransactionSearchResult")

Open in new window

But I'm not yet clear how to proceed.
0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
This is the corrected and simplified VBA code. Remember to close IE before removing references - this is included now too.
Sub Macro1()
'
' Macro1 Macro
'

Dim IE As Object, obj As Object
  Dim myState As String
  Dim r As Integer, c As Integer, t As Integer
  Dim elemCollection As Object, curHTMLRow As Object
  Dim accountno As String

  'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
  Set IE = CreateObject("InternetExplorer.Application")

  'more variables for the inputboxes - makes our automation program user friendly

  'accountno = InputBox("Enter the Transferring Account No")

  IE.Visible = True
  IE.navigate ("http://ec.europa.eu/environment/ets/transaction.do")
  
  ' we ensure that the web page downloads completely before we fill the form automatically
  While IE.ReadyState <> 4: DoEvents: Wend

  'accessing the Transferring account number into the correct field
  IE.Document.All.Item("originatingAccountNumber").InnerText = "1234"
        
  ' accessing the button
  IE.Document.getElementsByName("search").Item.Click

  ' again ensuring that the web page loads completely before we start scraping data
  While IE.busy: DoEvents: Wend

 'Clearing any unnecessary or old data in Sheet1
  ThisWorkbook.Sheets("Sheet1").Activate
  Range("A1:K500").ClearContents
  
  Set elemCollection = IE.Document.getElementById("tblTransactionSearchResult")
  
  ' Rows(0) is Table title, so skipping
  For r = 1 To elemCollection.Rows.Length - 1
    Set curHTMLRow = elemCollection.Rows(r)
    For c = 0 To curHTMLRow.Cells.Length - 1
      Cells(r + 1, c + 1) = curHTMLRow.Cells(c).InnerText
    Next
  Next

  ' cleaning up memory
  IE.Quit
  Set IE = Nothing
  
End Sub

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mikecvbaAuthor Commented:
Olemo - its works brilliant and thanks for your excellent code. The only problem is that I have 20,000 (account numbers 1 to 20,000) enquiries to make and its very slow reloading the IE object and populating the innertext with the account number e.g. "1234" each time. Around 8 seconds per load. I think I should have done it with the XMLHTTP object, accessing the xml data directly otherwise it will take a month to populate..

I have written the XML connection code (below) and created the string argument to access the XML result but I don't know how to convert the result to something similar to the IE.document format that you used.

Dim XMLHTTP
    Dim result As String
    Dim argumentString
argumentString = "?export_url=destinationAccountHolder=&startDate=01%2F01%2F2008&destinationRegistry=-1&originatingAccountType=121&form=transaction&endDate=31%2F12%2F2011&transactionID=&originatingAccountHolder=&suppTransactionType=-1&transactionType=-1&languageCode=en&destinationAccountNumber=" & registryno & "&destinationAccountType=121&toCompletionDate=&originatingRegistry=-1&destinationAccountIdentifier=&fromCompletionDate=&originatingAccountIdentifier=&transactionStatus=4&originatingAccountNumber=&currentSortSettings=&form=transaction&exportType=1&OK=OK"
    
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    XMLHTTP.Open "POST", _
    "http://ec.europa.eu/environment/ets/export.do", False
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    XMLHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    XMLHTTP.send argumentString
    result = XMLHTTP.responsetext
    result = Replace(result, "encoding=" & Chr(34) & "UTF-8", "encoding=" & Chr(34) & "iso-8859-1")

Open in new window



Can you please help me replace the header in your suggested code so that the result here flows and populates the rows and columns in the Excel sheet but with the XML object rather than the Internet Explorer populated page?
0
Get expert help—faster!

Need expert help—fast? Use the Help Bell for personalized assistance getting answers to your important questions.

aikimarkCommented:
You could also use a querytable object to retrieve the web page's table.
Example:
Sub Q_28571716()
   Dim oQT As QueryTable
   
   'Create query table to hold the rates.
   With Worksheets("Sheet2")
       Set oQT = .QueryTables.Add( _
           Connection:="URL;http://ec.europa.eu/environment/ets/transaction.do?languageCode=en&startDate=&endDate=&transactionStatus=4&fromCompletionDate=&toCompletionDate=&transactionID=&transactionType=-1&suppTransactionType=-1&originatingRegistry=-1&destinationRegistry=-1&originatingAccountType=-1&destinationAccountType=-1&originatingAccountNumber=1234&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&search=Search&currentSortSettings=", _
           Destination:=.Range("A1"))
   End With
   
   With oQT
     
        'Don't recognize dates.
        .WebDisableDateRecognition = False
        
        'Don't refresh query when file opened.
        .RefreshOnFileOpen = False
        
        'Ignore page formatting.
        .WebFormatting = xlWebFormattingNone
        
        'Wait for query to finish before continuing.
        .BackgroundQuery = False
        
        'Select a specific table.
        .WebSelectionType = xlSpecifiedTables
        
        'Import the table containing the exchange rates.
        .WebTables = "tblTransactionSearchResult"
        
        'Save the query with workbook.
        .SaveData = True
        
        'Adjust columns to fit the data.
        .AdjustColumnWidth = True
        
        .Refresh

    End With

End Sub

Open in new window



Reference:
http://msdn.microsoft.com/en-us/library/office/aa203721(v=office.11).aspx#odc_xlwebquery_retrievingdatafromasingletable
0
aikimarkCommented:
You could tweak this routine, adding an account number parameter, and add that parameter value into the URL/URI string concatenation.
0
mikecvbaAuthor Commented:
Both experts provide 2 perfects answers to the same solution. aikimark  answered with a sleek alternative. Qlemo answered perfectly the original request
0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Mark,
Interesting. Didn't know you can use the table ID in WebTables, I was always using an integer index.

mikecvba,
Is your follow-up question moot by using the parameter in the URL and/or Web Queries, or are you going to ask a new question about XML?
0
aikimarkCommented:
Didn't know you can use the table ID
Learn much, you will, my young padawan
0
mikecvbaAuthor Commented:
Olemo, i would still like to do it with the XML object as both of the two methods above -  via the Internet Explorer, populating  the textbox and clicking search button &  via the tableobject are very slow. I have run out of points to award but i would appreciate an education in connecting the xmlhttp object to the online xml table and then populating the table/rows columns on the sheet. I have already started the xmlhttp code as above but i need to format the results into rows and columns -  exactly as you did in your first solution from the  resultant IE.Document.  I think future readers will find this  page informative to see all three working solutions together.
0
aikimarkCommented:
I am working on a few new articles on that subject.  If I had already finished them, I would post a link to it.
0
mikecvbaAuthor Commented:
IEObject Web table scraping and tableobjects are great for single queries but if you are repeatedly interrogating the tables with new parameters, it has to be the xmlhttp approach. In my example the above two methods took a good few seconds whereas the XML appears instant. Unless you gurus have a fourth option?
0
aikimarkCommented:
If you want an MSXML2 solution, please post a new question that includes a link to this question.  Then come back to this question and post a link to your new question.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VB Script

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.