Using MSXML2 to import Web Table Data into Excel with VBA

I need help to use the MSXML2 object to load xml data into Excel.

Here is the website with data preloaded (the Account number - "1234" is the only variable):;?languageCode=en&startDate=&endDate=&transactionStatus=4&fromCompletionDate=&toCompletionDate=&transactionID=&transactionType=-1&suppTransactionType=-1&originatingRegistry=-1&destinationRegistry=-1&originatingAccountType=-1&destinationAccountType=-1&originatingAccountNumber=1234&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&search=Search¤tSortSettings=&resultList.currentPageNumber=1

I just need to be able to scrape the very top row of the table into the first row of Sheet1 in excel using VBA.

With the help of experts exchange ( see Excel VBA Web Data Scraping from a Table ). I can get the table data into excel using Internet Explorer object or TableObject but its very slow. I need to run the query many time with the range of account number 1-15,000 which will take weeks with the current methods.

How do I get the top row of the table data on the web into Excel row 1 on Sheet1 using the MSXML2 object? I am hoping this will be faster,
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Add this code to a module in your workbook.
Option Explicit

Function GetPage(ByVal parmURL As String) As String
    Static oXMLHTTP As Object
    If oXMLHTTP Is Nothing Then
        Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    End If
    oXMLHTTP.Open "GET", parmURL, False
    Do Until oXMLHTTP.ReadyState = 4
    If oXMLHTTP.Status = 200 Then
        GetPage = oXMLHTTP.responsetext
        GetPage = vbNullString
    End If
End Function

Function Q_28572779(ByVal parmAccount As String, Optional parmIncludeHeaderRow As Boolean = False) As Variant
    'Returns array of table values
    Dim strHTML As String
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long

    Dim lngRow As Long
    Dim lngPage As Long
    Dim lngPageCount As Long
    Dim vData As Variant
    Dim vHeader As Variant
    Dim colData As New Collection
    Dim vItem As Variant
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True

    strHTML = GetPage("" & parmAccount & "&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&currentSortSettings=&resultList.currentPageNumber=0&nextList=Next>")
    'parse the result
    oRE.Pattern = """resultList.lastPageNumber""[^>]*value=""(\d+)"""
    If oRE.test(strHTML) Then
        Set oMatches = oRE.Execute(strHTML)
        lngPageCount = oMatches(0).submatches(0)
    End If
    If parmIncludeHeaderRow Then
        oRE.Pattern = "<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*<td [^>]*class=""bgtitlelist"">(?:\s|\S)*?>(\w[^<]*)<(?:\s|\S)*?</td>\s*"
        If oRE.test(strHTML) Then
            Set oMatches = oRE.Execute(strHTML)
            colData.Add oMatches(0)
        End If
    End If
    lngRow = 1
    'now for the data
    oRE.Pattern = "<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*<td [^>]*class=""bgcelllist"">(?:\s|\S)*?&nbsp;(.*?)&nbsp;(?:\s|\S)*?</td>\s*"
    For lngPage = 1 To lngPageCount
        If oRE.test(strHTML) Then
            Set oMatches = oRE.Execute(strHTML)
            For Each oM In oMatches
                With oM
                    colData.Add oM
                End With
                lngRow = lngRow + 1
        End If
        strHTML = GetPage("" & parmAccount & "&destinationAccountNumber=&originatingAccountIdentifier=&destinationAccountIdentifier=&originatingAccountHolder=&destinationAccountHolder=&currentSortSettings=&resultList.currentPageNumber=" & lngPage & "&nextList=Next>")


    ReDim vData(1 To colData.Count, 0 To 12)
    lngRow = 1
    For Each vItem In colData
        Set oM = vItem
        For lngSM = 0 To 12
            vData(lngRow, lngSM) = oM.submatches(lngSM)
        lngRow = lngRow + 1

    Q_28572779 = vData
End Function

Open in new window

Invocation Example:

vPageData = Q_28572779("1234")

When you get the function results, you can push the entire array into your worksheet in one statement.  For more information on that, please read my Fast Data Push to Excel article.


You can optionally include or (default) exclude the headers.
This code goes through all the pages of any multi-page output.
The code in the prior related question only got the first page of output.
Compare the output from this code against the output of the code you currently have.
I tested with account 1231, which has 20 pages of results = 191 rows

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mikecvbaAuthor Commented:
Staggeringly brilliant.  Just when i thought i might one day get to grips with the html/xml world of objects, you pull another show out of the bag. I like arrays. They are normally fast.  I will test and report thanks
You should compare the time it takes to Next through all 20 pages of account 1231 in your browser against the posted routine.

Note: I would have answered this sooner, but you didn't post a comment in the prior thread letting the participating experts know about this new question.
Price Your IT Services for Profit

Managed service contracts are great - when they're making you money. Yes, you’re getting paid monthly, but is it actually profitable? Learn to calculate your hourly overhead burden so you can master your IT services pricing strategy.

Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Reading raw HTML and do your own parsing is probably the fastest way. I'm not happy about having a need for such a complex parsing expression. However, with the ResponseText you loose all properties. Only ResponseXML would be really useful, and that requires to get a XML response. So there is nothing I could provide superior to this solution.
mikecvbaAuthor Commented:
It is a great solution and I understand the principle of downloading the full hmtl response text and then sifting it for the relevant content using the VBScript libraries. I do not profess to understand the very complex parsing expression you created but it works. The grammar in that looks like a PhD on its own!! I would be happy to apply the getpage with a simpler string and keep the properties intact but I don't know how to keep the xml request happy.

A further question for you please. When a webpage send an xml with all the parameters, it gets the html response back as we have seen which contains the results embedded in the overall page result. On this website it seems there are two basic tables that can be returned, one basic on the transaction screen and, if you click on the detailed button, one with a lot more details.

Is it possible to query the xml database directly in a manner that only returns specific fields such as 'AccountNo' or 'TranferringAccountNumber' only? Or is the XML server hard coded to only churn out these two set responses?

There are two full page html responses - main screen and detailed screen and each of those has an export button which generates the data in an xml file.

Put another way, I am asking if it is possible to force the remote xml data server to give up discrete fields only? We can clearly see the field names. Can it be done from this end or are the server responses predetermined? I don't know any xml and just wondered whether, like in access or SQL, you can demand what you want in terms of fields via the query, rather than just having something fixed format thrown at you?

Either way, your code is excellent and does the job fast.
mikecvbaAuthor Commented:
As I attempt to learn from you two giants, perhaps you can explain 'oXMLHTTP.Status = 200'  . What is the status of an object and where does the number 200 comes from? Olemo, why couldn't you use response XML which would allow you to jump quickly to the relevant part of the table as it would keep the formatting? Alternatively can you not bring it in as a HMTL string - response text and the convert that into some kind of XML object?
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Again, this is HTML, no XML. XML needs to get provided directly as response by the Web server. MSXML2 is just a means to download HTML without any processing - you would have the same means with a WebClient object.

The export is indeed a XML, but it is a button on a form. No way I can see in the source to invoke that directly via an URL - that would be great. However, we can go to the export page directly (, fill out the hidden form field "exportURL" with something like

Open in new window

which probably only would need the AccountNumber set, btw, and then emulate pressing the "Ok" button. The response is an export.xml file.
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
oXMLHTTP.Status is a property of the MSXML2 object, containing the resulting status of the page - completed, redirected, error aso., coded as an integer. Details about the object:
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
HTML needs to be translated into XML, unless the remote web page directly submits XML. HTML->XML recoding is available with the free HtmlAgilityPack, but the usage is not well documented, and so it is suited only for the adventurous, or those who are used to XML Path syntax with all its details.
mikecvbaAuthor Commented:
Thanks Olemo. I understand the limitations. You two already provided the code to access the export page directly and emulate the Ok button press under the aforementioned earlier article. That works a treat.
Alternatively, this could be parsed with a DOM object, also available in the MSXML2 name space (type library).  As in the prior related question, the table can be isolated by its name attribute.

Lots of discussion in this thread since yesterday.

200 is a good http return code
404 is not found
405 is forbidden

It might be possible to request the XML output and capture the response directly into the MSXML2 object.  I hadn't noticed that this was an option.  If the resulting XML contains all the data and not just the first 10 items, then it might be a faster solution.
mikecvbaAuthor Commented:
So we could bring back the page as a MSXLM2 DOM object and jump to the table (by name) or even field in the table and that would avoid the need for such complex VBScript patterns? Can we give that a try? I am sure these two pages of solutions will serve many people, who, like me needed a better understanding of the IE and MSXML2 objects. It seems to be condensed nicely here.
I just checked and the XML export does include all the data, not just the first page.  Unfortunately, the XML doesn't directly include information about the account number, so the code would either need to rename the recently exported XML file with a name that includes the account number or process each XML file, including file deletion, before making the next call -- if the XML couldn't be captured directly by the code.

It would have been nice if one of the URL parameters allowed more than 10 rows per page.
You assign the returned string (or byte stream) to a DOM object.
Can we give that a try?
You are welcome to post another question.  Please be sure to come back to this thread and post a link to your new question.  Otherwise, the participating experts do no know you have posted a new question.

If you are familiar with DOM processing, you should first attempt to do this yourself before posting a question.  At this point, you have a working solution.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.