?
Solved

pulling data from website into excel

Posted on 2010-04-09
29
Medium Priority
?
605 Views
Last Modified: 2013-11-26
can someone point me in the right direction to how I can pull data from a website using vba code? All I know right now is the website seems to be in HTML format and I know where the data is in the HTML code but do not know how to extract it. I've also never extracted data from a website so I have no idea where to begin (but I do have moderate knowledge in vba).
0
Comment
Question by:k1ng87
  • 13
  • 9
  • 5
  • +1
28 Comments
 
LVL 50

Assisted Solution

by:Ingeborg Hawighorst (Microsoft MVP / EE MVE)
Ingeborg Hawighorst (Microsoft MVP / EE MVE) earned 332 total points
ID: 30247401
Hi K1ng87,

do you want the data to be a dynamic query? If so, you can use code along these lines as a starting point.

cheers,  teylyn
    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;http://www.example.com/folder/page.htm", Destination:=Range("$A$1"))
        .Name = "index"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = "8" ' this is a table in the HTML code
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

Open in new window

0
 
LVL 1

Author Comment

by:k1ng87
ID: 30249533
what if its not a table? Like what if I want to pull some heading (<h></h>) title a certain spot in the HTML code??
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 30254838
Then you would probably be best off automating interent explorer and use regular expressions to parse the html. That probably sounds like gibbersih - see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_25759667.html for a recent example
If you post your weblink then I will tailor this to your need
Cheers
Dave
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
LVL 1

Author Comment

by:k1ng87
ID: 30256239
here is what I am trying to do...I work for a small company and we are trying to track on-time delivery from UPS....I would like to write a vba code in excel that goes through a list or tracking numbers, inputs the tracking number into this link...http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=trackingnumbergoeshere.....and pull the the delivered on date and status...i'm not sure how complex this is or even if it can be done....this is just a way I thought  maybe it can be done...

thanks for looking Dave!
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 30260695
this example will loop through the codes in column A of sheet 1 and dump the tracking data to column A of sheet 2.

did you want to parse out the date and status or is this ok as is?

Cheers
Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        ws2.[a1] = strTemp
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-3.xls
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 30261938
this version parses out the delivered and date info to the second sheet

Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"
        Set RegMC = objRegEx.Execute(strTemp)
        For Each RegM In RegMC
            ws2.Cells(Rows.Count, "A").End(xlUp).Offset(1, 0) = c.Value
            ws2.Cells(Rows.Count, "B").End(xlUp).Offset(1, 0) = RegM.submatches(0)
            ws2.Cells(Rows.Count, "C").End(xlUp).Offset(1, 0) = RegM.submatches(2)
        Next
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-1.xls
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30339335
If you want to avoid the overhead and problems with Internet Explorer you could read the web page through the WinHTTPRequest object. Here is some code which shows how to read a W3C's web page and extract the publication date and time of its top article:


Sub GetWebPage()
    Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
    WinHTTP.Open "GET", "http://www.w3.org", False
    WinHTTP.Send
    allHTML = WinHTTP.ResponseText
    posPublishedDate = InStr(1, allHTML, "dtstart published")
    publishedDate = Mid(allHTML, posPublishedDate + 26, 25)
    MsgBox "W3C's home page had its main front page article updated on " & Chr(10) & Chr(10) & Chr(9) & publishedDate, , "Retrieved Directly from the web:"
    'Application.ActiveSheet.Range("A3").Value = publishedDate
    WinHTTP = Null
End Sub

Open in new window

0
 
LVL 1

Author Comment

by:k1ng87
ID: 30419547
I think i'm understanding the code now but can't seem to figure out how its pulling the information from IE....can someone please explain this section of the code?

thanks!
0
 
LVL 50

Accepted Solution

by:
Dave Brett earned 334 total points
ID: 30458165
If you debug the code then you can see that the IE instance has a number ofr Forms under
objIe.Document
The 5th item contains the delivery data that you want (it is 4 in the code as the items start from 0)

A regular expression is used to parse the string to extract the text immediately after "Status: " and "Delievered On:"
objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"

See Patrick Matthews excellent article on regular expressions at   http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html - it is the method to use when parsing text

Cheers

Dave

0
 
LVL 6

Assisted Solution

by:yogi4life
yogi4life earned 334 total points
ID: 30503923
k1ng87,

Contrary to what you thought - the suggestion of teylyn (the first comment to your question) was actually a feasable solution.

a) The QueryTables supports retrieving web pages without tables - although due to the name it is a bit counter-intuitive.
     (Just leave out the .WebTables parameter and change the .WebSelectionType parameter xlSpecifiedTables into xlEntirePage)
b) The UPS web site with tracking information actually wraps the information into a table.
    (So you could even use the standard setting for xlQueryTables...)

If you prefer - you could actually retrieve the information without the reliance on anything outside of Excel - if that is a priority for your needs or your solution.

The following VBA macro requires that you have a UPS tracking number in the current cell of the active worksheet when you start the macro. You could take this as a starting point and use rather looping through a number of tracking numbers just as the code from Dave suggest.


Sub RetrieveWebPageOrTable()
    Dim ws As Worksheet
    Dim rs As Worksheet
    Set ws = ActiveSheet
    tn = Application.ActiveCell.Value
    Sheets.Add After:=Sheets(1), Type:=xlWorksheet
    Sheets(2).Name = tn
    Sheets(tn).Select
    Set rs = Sheets(tn)
    conn = "URL;http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & tn
    rs.Range("A1").Formula = "=INDEX(B:B,MATCH(""Status:"",B:B,0)+1)"
    rs.Range("A2").Formula = "=INDEX(B:B,MATCH(""Delivered On:"",B:B,0)+1)"
    rs.Range("B1").Value = "UPS Web Page info:"
    rs.Range("B1").Font.Bold = True
    With rs.QueryTables.Add(conn, Range("B2"))
        .Name = tn
        .AdjustColumnWidth = True
        .WebDisableDateRecognition = False
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .Refresh BackgroundQuery:=False
    End With
    rs.Range("A:A").ColumnWidth = "30"
End Sub

Open in new window

0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30504653
To sum up so far:

Dave's excellent solution for looping through the UPS tracking numbers could be used with any of the three web page retrieve methods:

1. QueryTables (native Excel method) - as teylyn shows
2. Using Internet Explorer via automation (COM object) - as Dave suggest
3. Using the Windows operating system's COM object WinHTTPRequest (standard on all Windows installations) - as I have shown

All methods has their pros and cons. So it is entirely up to which requirements needs to be satisfied - which approach to select...

For instance has the Internet Explorer solution the benefit of being a full-rendering of the web page which could be good - but it has the disadvantage of requiring more resources. All external stylesheets and all graphics are loaded so you end up retrieving several times the amount of data that you actually need. But it has the advantage of being transferable to other enviroments which doesnt run in Excel or VBA - like other programming languages and other scripting solutions which support COM automation.

The first solution has benefits of low overhead no page rendering as standard (although it can also be done). It is a more understandable for those who havent the COM automation knowledge.

The last solution has also very low overhead and you have very high control of what the http request should or should not do. Furthermore it is usable outside of Excel and VBA enviroments too.
0
 
LVL 1

Author Comment

by:k1ng87
ID: 30509303
well...maybe you guys could help me select the appropriate one??
here is where I wanted to go with this....I wanted to pull tracking numbers on a weekly basis from sql into excel, run that macro, and then transfer the on-time metrics back up to the sql db....
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30511308
Well - all the above solutions will work - actually - but I think if I were you stick to the solution you understand best. If all seems to be hard to understand select the one you most likely will need to learn anyhow for other purposes.

If you dont have any preferences at all - then I would stick to using QueryTables method... That method is more in line with what any Excel VBA programmer should be able to support anyhow ... whereas COM object automation - is for the more advanced VBA programmers...

0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30511403
But - If you need code that you can cut & paste most quickly right now - I think Dave has made a very good effort to give you a complete turn-key solution.
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30511510
Hmm, why would you want to use Excel anyway then? If you pull from SQL and put it back into SQL... Do you have any reporting needs with Excel that must be covered?
0
 
LVL 1

Author Comment

by:k1ng87
ID: 30516096
umm.....not really....I'm assuming right now that I can't run something in SQL (sql server '00) to do wat the vba code is doing??
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30517822
It is a number of ways to do this without Excel even with SQL Server 2000. You could make a SQL Server DTS (Data Transformation Services - which is included with SQL Server 2000) task which can use a VBScript which can utlized the WinHTTPRequest COM object to retrieve the HTML and UPS information and then post the results immediately back to a SQL Server table...

You will have the added benefit of being able to schedule it in DTS as a job - so it can run weekly there with a lot of options for error reporting and tracking if anything goes wrong...

0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30518928
If going for SQL Server 2000 DTS is a bit much for your current requirement you could simply use the free Windows version of CURL which is a command-line tool to retrieve among other things HTML pages. And you could simply use AWK (another superb free command-line tool) to extract your information. And then you could schedule this once a week with normal Windows scheduling...

I could provide you an example - but first I have to do some other business... I will post you a sample later this evening (GMT+1) ...

- Petter
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30519243
Just a short example of CURL retrieving a web-page into a local file... I will post a comprehensive code later...

C:\>curl "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872" >html-text.txt
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 30567711
Unsubscribing as I think that the orginal question, plus follow-ups have been covered off.

Thanks Petter for your kind comments above re my code.

Regards

Dave
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 30605261
Using CURL is a bit more comprehensive actually - since UPS wants you to register and use their API.

Which is quite sensible - because then you can get the data in a predefined format like CSV or XML and you will NOT need to parse possibly changing and unreliable HTML code. You will get a much more reliable, low-maintenace and robust solution.

Registering is for free and you get a userid and a password. This can be used in a POST (via CURL - commandline or WinHTTPRequest via VBScript) with an upload file full of tracking-numbers and you will get one file back with the status for all tracking-numbers. All without any HTML...

I would definitely recommend you to go in that direction - but if you are in a hurry - just stick to option 1.
I could help you with the VBScript / WinHTTPRequest and scheduling - but then you need to credit this first and post a new question - since this has nothing to do with Excel...

Regards
Petter
0
 
LVL 1

Author Comment

by:k1ng87
ID: 32356054
wow...i've been away from this thread for awhile now....i got pulled into some other projects and this kinda got left in the back burner....

brettdj....your solution is very helpful....but I think this is going to get more complicated as I was originally thinking of going from sql to excel(runs the code) then back to sql...is there away this can be done in sql?? or would that just get waayyy tooo complex??
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 32390427
Could you tell the specifics of what SQL you are using? What version and which vendor?

Is i Microsoft SQL Server 2005? 2008? 2000? Do you have SSIS running on the SQL Server?

It is pure guess-work to give an answer if you dont tell us what your run-time environment offers and what privileges you have to configure and reconfigure. For instance you could enable xp_cmdshell in SQL Server and configure a proxy account to use it then you could solve your problem with a few lines of code more or less directly from SQL.





0
 
LVL 6

Expert Comment

by:yogi4life
ID: 32402542
You could actually use a VBScript to download the page from the web and then connect to SQL Server and insert the information into the database just like the attached code.

You will need at table in the database called DELIVERY_STATUS with three columns which are all VARCHAR(50)... or you will have to modify the code accordingly. The connection to the SQL Server is done via the logged on user in Windows (trusted connection) and the script as it is require it to run on the SQL Server machine... But that could be changed easly.

Save the attached code in a file like RetrieveDelivery.vbs and schedule it to run on a regular interval. It could be extended to read a list of tracking numbers in a database and update the delivery status regularly of course.
TrackNum = "1ZV4W8030317198872"
Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
URL = "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & TrackNum
WinHTTP.Open "GET", URL, False
WinHTTP.Send
allHTML = WinHTTP.ResponseText
WinHTTP = Null
posDeliveredLabel = InStr(1, allHTML, "<dt><label>Delivered On:</label></dt>")
If posDeliveredLabel > 0 Then
    posDeliveredDate = Instr(posDeliveredLabel, allHTML , "<dd>")
    posDeliveredDateEnd = Instr(posDeliveredDate, allHTML , "</dd>")
    deliveredDate = Split(Mid(allHTML, posDeliveredDate + 5, posDeliveredDateEnd-posDeliveredDate-5),chr(10))
    deliveredDate(0) = Trim(deliveredDate(0))
    deliveredDate(1) = Trim(deliveredDate(1))
    'MsgBox deliveredDate(0) & " " & deliveredDate(1), , "Delivered on:"
    Set conn = CreateObject("ADODB.connection")
    conn.Open "Driver={SQL Server};Server=.;Database=TEST;Trusted_Connection=True;"
    set rs = CreateObject("ADODB.recordset")
    sql = "INSERT INTO DELIVERY_STATUS VALUES('" & deliveredDate(0) & " " & deliveredDate(1)  & "','" & Now() & "','" & TrackNum & "')"
    'Msgbox SQL
    rs.Open sql, conn		
Else
    'Msgbox "Couldn't find the delivery date"
End If

Open in new window

0
 
LVL 1

Author Comment

by:k1ng87
ID: 32424710
woops...sorry about that....i have both a sql 2000 environment and a 2005 environment....which one would be easier to use?? i'm assuming the '05 environment??
0
 
LVL 6

Expert Comment

by:yogi4life
ID: 32425026
Actually - with the VBScript I justed posted it doesnt matter... So if you have little or no experience with SSIS I would think that this small script is good enough for your purposes. If you want to learn SSIS it might be better to include it in a script-task in an SSIS package. You have more facilities to schedule jobs and do a lot of other "integration" stuff - but you will have to spend some time to learn the entire environment.
0
 
LVL 1

Author Closing Comment

by:k1ng87
ID: 32789325
solution is getting fairly complex...
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to get Spreadsheet Compare 2016 working with the 64 bit version of Office 2016
This article describes a serious pitfall that can happen when deleting shapes using VBA.
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.

594 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question