pulling data from website into excel

can someone point me in the right direction to how I can pull data from a website using vba code? All I know right now is the website seems to be in HTML format and I know where the data is in the HTML code but do not know how to extract it. I've also never extracted data from a website so I have no idea where to begin (but I do have moderate knowledge in vba).
LVL 1
k1ng87Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Ingeborg Hawighorst (Microsoft MVP / EE MVE)Microsoft MVP ExcelCommented:
Hi K1ng87,

do you want the data to be a dynamic query? If so, you can use code along these lines as a starting point.

cheers,  teylyn
    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;http://www.example.com/folder/page.htm", Destination:=Range("$A$1"))
        .Name = "index"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = "8" ' this is a table in the HTML code
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

Open in new window

0
k1ng87Author Commented:
what if its not a table? Like what if I want to pull some heading (<h></h>) title a certain spot in the HTML code??
0
DaveCommented:
Then you would probably be best off automating interent explorer and use regular expressions to parse the html. That probably sounds like gibbersih - see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_25759667.html for a recent example
If you post your weblink then I will tailor this to your need
Cheers
Dave
0
Cloud Class® Course: Amazon Web Services - Basic

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

k1ng87Author Commented:
here is what I am trying to do...I work for a small company and we are trying to track on-time delivery from UPS....I would like to write a vba code in excel that goes through a list or tracking numbers, inputs the tracking number into this link...http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=trackingnumbergoeshere.....and pull the the delivered on date and status...i'm not sure how complex this is or even if it can be done....this is just a way I thought  maybe it can be done...

thanks for looking Dave!
0
DaveCommented:
this example will loop through the codes in column A of sheet 1 and dump the tracking data to column A of sheet 2.

did you want to parse out the date and status or is this ok as is?

Cheers
Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        ws2.[a1] = strTemp
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-3.xls
0
DaveCommented:
this version parses out the delivered and date info to the second sheet

Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"
        Set RegMC = objRegEx.Execute(strTemp)
        For Each RegM In RegMC
            ws2.Cells(Rows.Count, "A").End(xlUp).Offset(1, 0) = c.Value
            ws2.Cells(Rows.Count, "B").End(xlUp).Offset(1, 0) = RegM.submatches(0)
            ws2.Cells(Rows.Count, "C").End(xlUp).Offset(1, 0) = RegM.submatches(2)
        Next
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-1.xls
0
yogi4lifeCommented:
If you want to avoid the overhead and problems with Internet Explorer you could read the web page through the WinHTTPRequest object. Here is some code which shows how to read a W3C's web page and extract the publication date and time of its top article:


Sub GetWebPage()
    Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
    WinHTTP.Open "GET", "http://www.w3.org", False
    WinHTTP.Send
    allHTML = WinHTTP.ResponseText
    posPublishedDate = InStr(1, allHTML, "dtstart published")
    publishedDate = Mid(allHTML, posPublishedDate + 26, 25)
    MsgBox "W3C's home page had its main front page article updated on " & Chr(10) & Chr(10) & Chr(9) & publishedDate, , "Retrieved Directly from the web:"
    'Application.ActiveSheet.Range("A3").Value = publishedDate
    WinHTTP = Null
End Sub

Open in new window

0
k1ng87Author Commented:
I think i'm understanding the code now but can't seem to figure out how its pulling the information from IE....can someone please explain this section of the code?

thanks!
0
DaveCommented:
If you debug the code then you can see that the IE instance has a number ofr Forms under
objIe.Document
The 5th item contains the delivery data that you want (it is 4 in the code as the items start from 0)

A regular expression is used to parse the string to extract the text immediately after "Status: " and "Delievered On:"
objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"

See Patrick Matthews excellent article on regular expressions at   http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html - it is the method to use when parsing text

Cheers

Dave

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
yogi4lifeCommented:
k1ng87,

Contrary to what you thought - the suggestion of teylyn (the first comment to your question) was actually a feasable solution.

a) The QueryTables supports retrieving web pages without tables - although due to the name it is a bit counter-intuitive.
     (Just leave out the .WebTables parameter and change the .WebSelectionType parameter xlSpecifiedTables into xlEntirePage)
b) The UPS web site with tracking information actually wraps the information into a table.
    (So you could even use the standard setting for xlQueryTables...)

If you prefer - you could actually retrieve the information without the reliance on anything outside of Excel - if that is a priority for your needs or your solution.

The following VBA macro requires that you have a UPS tracking number in the current cell of the active worksheet when you start the macro. You could take this as a starting point and use rather looping through a number of tracking numbers just as the code from Dave suggest.


Sub RetrieveWebPageOrTable()
    Dim ws As Worksheet
    Dim rs As Worksheet
    Set ws = ActiveSheet
    tn = Application.ActiveCell.Value
    Sheets.Add After:=Sheets(1), Type:=xlWorksheet
    Sheets(2).Name = tn
    Sheets(tn).Select
    Set rs = Sheets(tn)
    conn = "URL;http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & tn
    rs.Range("A1").Formula = "=INDEX(B:B,MATCH(""Status:"",B:B,0)+1)"
    rs.Range("A2").Formula = "=INDEX(B:B,MATCH(""Delivered On:"",B:B,0)+1)"
    rs.Range("B1").Value = "UPS Web Page info:"
    rs.Range("B1").Font.Bold = True
    With rs.QueryTables.Add(conn, Range("B2"))
        .Name = tn
        .AdjustColumnWidth = True
        .WebDisableDateRecognition = False
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .Refresh BackgroundQuery:=False
    End With
    rs.Range("A:A").ColumnWidth = "30"
End Sub

Open in new window

0
yogi4lifeCommented:
To sum up so far:

Dave's excellent solution for looping through the UPS tracking numbers could be used with any of the three web page retrieve methods:

1. QueryTables (native Excel method) - as teylyn shows
2. Using Internet Explorer via automation (COM object) - as Dave suggest
3. Using the Windows operating system's COM object WinHTTPRequest (standard on all Windows installations) - as I have shown

All methods has their pros and cons. So it is entirely up to which requirements needs to be satisfied - which approach to select...

For instance has the Internet Explorer solution the benefit of being a full-rendering of the web page which could be good - but it has the disadvantage of requiring more resources. All external stylesheets and all graphics are loaded so you end up retrieving several times the amount of data that you actually need. But it has the advantage of being transferable to other enviroments which doesnt run in Excel or VBA - like other programming languages and other scripting solutions which support COM automation.

The first solution has benefits of low overhead no page rendering as standard (although it can also be done). It is a more understandable for those who havent the COM automation knowledge.

The last solution has also very low overhead and you have very high control of what the http request should or should not do. Furthermore it is usable outside of Excel and VBA enviroments too.
0
k1ng87Author Commented:
well...maybe you guys could help me select the appropriate one??
here is where I wanted to go with this....I wanted to pull tracking numbers on a weekly basis from sql into excel, run that macro, and then transfer the on-time metrics back up to the sql db....
0
yogi4lifeCommented:
Well - all the above solutions will work - actually - but I think if I were you stick to the solution you understand best. If all seems to be hard to understand select the one you most likely will need to learn anyhow for other purposes.

If you dont have any preferences at all - then I would stick to using QueryTables method... That method is more in line with what any Excel VBA programmer should be able to support anyhow ... whereas COM object automation - is for the more advanced VBA programmers...

0
yogi4lifeCommented:
But - If you need code that you can cut & paste most quickly right now - I think Dave has made a very good effort to give you a complete turn-key solution.
0
yogi4lifeCommented:
Hmm, why would you want to use Excel anyway then? If you pull from SQL and put it back into SQL... Do you have any reporting needs with Excel that must be covered?
0
k1ng87Author Commented:
umm.....not really....I'm assuming right now that I can't run something in SQL (sql server '00) to do wat the vba code is doing??
0
yogi4lifeCommented:
It is a number of ways to do this without Excel even with SQL Server 2000. You could make a SQL Server DTS (Data Transformation Services - which is included with SQL Server 2000) task which can use a VBScript which can utlized the WinHTTPRequest COM object to retrieve the HTML and UPS information and then post the results immediately back to a SQL Server table...

You will have the added benefit of being able to schedule it in DTS as a job - so it can run weekly there with a lot of options for error reporting and tracking if anything goes wrong...

0
yogi4lifeCommented:
If going for SQL Server 2000 DTS is a bit much for your current requirement you could simply use the free Windows version of CURL which is a command-line tool to retrieve among other things HTML pages. And you could simply use AWK (another superb free command-line tool) to extract your information. And then you could schedule this once a week with normal Windows scheduling...

I could provide you an example - but first I have to do some other business... I will post you a sample later this evening (GMT+1) ...

- Petter
0
yogi4lifeCommented:
Just a short example of CURL retrieving a web-page into a local file... I will post a comprehensive code later...

C:\>curl "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872" >html-text.txt
0
DaveCommented:
Unsubscribing as I think that the orginal question, plus follow-ups have been covered off.

Thanks Petter for your kind comments above re my code.

Regards

Dave
0
yogi4lifeCommented:
Using CURL is a bit more comprehensive actually - since UPS wants you to register and use their API.

Which is quite sensible - because then you can get the data in a predefined format like CSV or XML and you will NOT need to parse possibly changing and unreliable HTML code. You will get a much more reliable, low-maintenace and robust solution.

Registering is for free and you get a userid and a password. This can be used in a POST (via CURL - commandline or WinHTTPRequest via VBScript) with an upload file full of tracking-numbers and you will get one file back with the status for all tracking-numbers. All without any HTML...

I would definitely recommend you to go in that direction - but if you are in a hurry - just stick to option 1.
I could help you with the VBScript / WinHTTPRequest and scheduling - but then you need to credit this first and post a new question - since this has nothing to do with Excel...

Regards
Petter
0
k1ng87Author Commented:
wow...i've been away from this thread for awhile now....i got pulled into some other projects and this kinda got left in the back burner....

brettdj....your solution is very helpful....but I think this is going to get more complicated as I was originally thinking of going from sql to excel(runs the code) then back to sql...is there away this can be done in sql?? or would that just get waayyy tooo complex??
0
yogi4lifeCommented:
Could you tell the specifics of what SQL you are using? What version and which vendor?

Is i Microsoft SQL Server 2005? 2008? 2000? Do you have SSIS running on the SQL Server?

It is pure guess-work to give an answer if you dont tell us what your run-time environment offers and what privileges you have to configure and reconfigure. For instance you could enable xp_cmdshell in SQL Server and configure a proxy account to use it then you could solve your problem with a few lines of code more or less directly from SQL.





0
yogi4lifeCommented:
You could actually use a VBScript to download the page from the web and then connect to SQL Server and insert the information into the database just like the attached code.

You will need at table in the database called DELIVERY_STATUS with three columns which are all VARCHAR(50)... or you will have to modify the code accordingly. The connection to the SQL Server is done via the logged on user in Windows (trusted connection) and the script as it is require it to run on the SQL Server machine... But that could be changed easly.

Save the attached code in a file like RetrieveDelivery.vbs and schedule it to run on a regular interval. It could be extended to read a list of tracking numbers in a database and update the delivery status regularly of course.
TrackNum = "1ZV4W8030317198872"
Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
URL = "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & TrackNum
WinHTTP.Open "GET", URL, False
WinHTTP.Send
allHTML = WinHTTP.ResponseText
WinHTTP = Null
posDeliveredLabel = InStr(1, allHTML, "<dt><label>Delivered On:</label></dt>")
If posDeliveredLabel > 0 Then
    posDeliveredDate = Instr(posDeliveredLabel, allHTML , "<dd>")
    posDeliveredDateEnd = Instr(posDeliveredDate, allHTML , "</dd>")
    deliveredDate = Split(Mid(allHTML, posDeliveredDate + 5, posDeliveredDateEnd-posDeliveredDate-5),chr(10))
    deliveredDate(0) = Trim(deliveredDate(0))
    deliveredDate(1) = Trim(deliveredDate(1))
    'MsgBox deliveredDate(0) & " " & deliveredDate(1), , "Delivered on:"
    Set conn = CreateObject("ADODB.connection")
    conn.Open "Driver={SQL Server};Server=.;Database=TEST;Trusted_Connection=True;"
    set rs = CreateObject("ADODB.recordset")
    sql = "INSERT INTO DELIVERY_STATUS VALUES('" & deliveredDate(0) & " " & deliveredDate(1)  & "','" & Now() & "','" & TrackNum & "')"
    'Msgbox SQL
    rs.Open sql, conn		
Else
    'Msgbox "Couldn't find the delivery date"
End If

Open in new window

0
k1ng87Author Commented:
woops...sorry about that....i have both a sql 2000 environment and a 2005 environment....which one would be easier to use?? i'm assuming the '05 environment??
0
yogi4lifeCommented:
Actually - with the VBScript I justed posted it doesnt matter... So if you have little or no experience with SSIS I would think that this small script is good enough for your purposes. If you want to learn SSIS it might be better to include it in a script-task in an SSIS package. You have more facilities to schedule jobs and do a lot of other "integration" stuff - but you will have to spend some time to learn the entire environment.
0
k1ng87Author Commented:
solution is getting fairly complex...
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
.NET Programming

From novice to tech pro — start learning today.