Link to home
Start Free TrialLog in
Avatar of k1ng87
k1ng87

asked on

pulling data from website into excel

can someone point me in the right direction to how I can pull data from a website using vba code? All I know right now is the website seems to be in HTML format and I know where the data is in the HTML code but do not know how to extract it. I've also never extracted data from a website so I have no idea where to begin (but I do have moderate knowledge in vba).
SOLUTION
Avatar of Ingeborg Hawighorst (Microsoft MVP / EE MVE)
Ingeborg Hawighorst (Microsoft MVP / EE MVE)
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of k1ng87
k1ng87

ASKER

what if its not a table? Like what if I want to pull some heading (<h></h>) title a certain spot in the HTML code??
Then you would probably be best off automating interent explorer and use regular expressions to parse the html. That probably sounds like gibbersih - see https://www.experts-exchange.com/questions/25759667/Get-Table-from-website-into-Excel-VBA.html for a recent example
If you post your weblink then I will tailor this to your need
Cheers
Dave
Avatar of k1ng87

ASKER

here is what I am trying to do...I work for a small company and we are trying to track on-time delivery from UPS....I would like to write a vba code in excel that goes through a list or tracking numbers, inputs the tracking number into this link...http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=trackingnumbergoeshere.....and pull the the delivered on date and status...i'm not sure how complex this is or even if it can be done....this is just a way I thought  maybe it can be done...

thanks for looking Dave!
this example will loop through the codes in column A of sheet 1 and dump the tracking data to column A of sheet 2.

did you want to parse out the date and status or is this ok as is?

Cheers
Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        ws2.[a1] = strTemp
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-3.xls
this version parses out the delivered and date info to the second sheet

Dave

Sub GetWeb()
    Dim objIe
    Dim objRegEx
    Dim RegMC
    Dim RegM
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(Rows.Count, "a").End(xlUp))
    Set objIe = CreateObject("internetexplorer.application")
    Set objRegEx = CreateObject("vbscript.regexp")

    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innertext
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"
        Set RegMC = objRegEx.Execute(strTemp)
        For Each RegM In RegMC
            ws2.Cells(Rows.Count, "A").End(xlUp).Offset(1, 0) = c.Value
            ws2.Cells(Rows.Count, "B").End(xlUp).Offset(1, 0) = RegM.submatches(0)
            ws2.Cells(Rows.Count, "C").End(xlUp).Offset(1, 0) = RegM.submatches(2)
        Next
    Next c
    objIe.Quit
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub

Open in new window

EE---Dynamics-Search-1.xls
If you want to avoid the overhead and problems with Internet Explorer you could read the web page through the WinHTTPRequest object. Here is some code which shows how to read a W3C's web page and extract the publication date and time of its top article:


Sub GetWebPage()
    Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
    WinHTTP.Open "GET", "http://www.w3.org", False
    WinHTTP.Send
    allHTML = WinHTTP.ResponseText
    posPublishedDate = InStr(1, allHTML, "dtstart published")
    publishedDate = Mid(allHTML, posPublishedDate + 26, 25)
    MsgBox "W3C's home page had its main front page article updated on " & Chr(10) & Chr(10) & Chr(9) & publishedDate, , "Retrieved Directly from the web:"
    'Application.ActiveSheet.Range("A3").Value = publishedDate
    WinHTTP = Null
End Sub

Open in new window

Avatar of k1ng87

ASKER

I think i'm understanding the code now but can't seem to figure out how its pulling the information from IE....can someone please explain this section of the code?

thanks!
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
To sum up so far:

Dave's excellent solution for looping through the UPS tracking numbers could be used with any of the three web page retrieve methods:

1. QueryTables (native Excel method) - as teylyn shows
2. Using Internet Explorer via automation (COM object) - as Dave suggest
3. Using the Windows operating system's COM object WinHTTPRequest (standard on all Windows installations) - as I have shown

All methods has their pros and cons. So it is entirely up to which requirements needs to be satisfied - which approach to select...

For instance has the Internet Explorer solution the benefit of being a full-rendering of the web page which could be good - but it has the disadvantage of requiring more resources. All external stylesheets and all graphics are loaded so you end up retrieving several times the amount of data that you actually need. But it has the advantage of being transferable to other enviroments which doesnt run in Excel or VBA - like other programming languages and other scripting solutions which support COM automation.

The first solution has benefits of low overhead no page rendering as standard (although it can also be done). It is a more understandable for those who havent the COM automation knowledge.

The last solution has also very low overhead and you have very high control of what the http request should or should not do. Furthermore it is usable outside of Excel and VBA enviroments too.
Avatar of k1ng87

ASKER

well...maybe you guys could help me select the appropriate one??
here is where I wanted to go with this....I wanted to pull tracking numbers on a weekly basis from sql into excel, run that macro, and then transfer the on-time metrics back up to the sql db....
Well - all the above solutions will work - actually - but I think if I were you stick to the solution you understand best. If all seems to be hard to understand select the one you most likely will need to learn anyhow for other purposes.

If you dont have any preferences at all - then I would stick to using QueryTables method... That method is more in line with what any Excel VBA programmer should be able to support anyhow ... whereas COM object automation - is for the more advanced VBA programmers...

But - If you need code that you can cut & paste most quickly right now - I think Dave has made a very good effort to give you a complete turn-key solution.
Hmm, why would you want to use Excel anyway then? If you pull from SQL and put it back into SQL... Do you have any reporting needs with Excel that must be covered?
Avatar of k1ng87

ASKER

umm.....not really....I'm assuming right now that I can't run something in SQL (sql server '00) to do wat the vba code is doing??
It is a number of ways to do this without Excel even with SQL Server 2000. You could make a SQL Server DTS (Data Transformation Services - which is included with SQL Server 2000) task which can use a VBScript which can utlized the WinHTTPRequest COM object to retrieve the HTML and UPS information and then post the results immediately back to a SQL Server table...

You will have the added benefit of being able to schedule it in DTS as a job - so it can run weekly there with a lot of options for error reporting and tracking if anything goes wrong...

If going for SQL Server 2000 DTS is a bit much for your current requirement you could simply use the free Windows version of CURL which is a command-line tool to retrieve among other things HTML pages. And you could simply use AWK (another superb free command-line tool) to extract your information. And then you could schedule this once a week with normal Windows scheduling...

I could provide you an example - but first I have to do some other business... I will post you a sample later this evening (GMT+1) ...

- Petter
Just a short example of CURL retrieving a web-page into a local file... I will post a comprehensive code later...

C:\>curl "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872" >html-text.txt
Unsubscribing as I think that the orginal question, plus follow-ups have been covered off.

Thanks Petter for your kind comments above re my code.

Regards

Dave
Using CURL is a bit more comprehensive actually - since UPS wants you to register and use their API.

Which is quite sensible - because then you can get the data in a predefined format like CSV or XML and you will NOT need to parse possibly changing and unreliable HTML code. You will get a much more reliable, low-maintenace and robust solution.

Registering is for free and you get a userid and a password. This can be used in a POST (via CURL - commandline or WinHTTPRequest via VBScript) with an upload file full of tracking-numbers and you will get one file back with the status for all tracking-numbers. All without any HTML...

I would definitely recommend you to go in that direction - but if you are in a hurry - just stick to option 1.
I could help you with the VBScript / WinHTTPRequest and scheduling - but then you need to credit this first and post a new question - since this has nothing to do with Excel...

Regards
Petter
Avatar of k1ng87

ASKER

wow...i've been away from this thread for awhile now....i got pulled into some other projects and this kinda got left in the back burner....

brettdj....your solution is very helpful....but I think this is going to get more complicated as I was originally thinking of going from sql to excel(runs the code) then back to sql...is there away this can be done in sql?? or would that just get waayyy tooo complex??
Could you tell the specifics of what SQL you are using? What version and which vendor?

Is i Microsoft SQL Server 2005? 2008? 2000? Do you have SSIS running on the SQL Server?

It is pure guess-work to give an answer if you dont tell us what your run-time environment offers and what privileges you have to configure and reconfigure. For instance you could enable xp_cmdshell in SQL Server and configure a proxy account to use it then you could solve your problem with a few lines of code more or less directly from SQL.





You could actually use a VBScript to download the page from the web and then connect to SQL Server and insert the information into the database just like the attached code.

You will need at table in the database called DELIVERY_STATUS with three columns which are all VARCHAR(50)... or you will have to modify the code accordingly. The connection to the SQL Server is done via the logged on user in Windows (trusted connection) and the script as it is require it to run on the SQL Server machine... But that could be changed easly.

Save the attached code in a file like RetrieveDelivery.vbs and schedule it to run on a regular interval. It could be extended to read a list of tracking numbers in a database and update the delivery status regularly of course.
TrackNum = "1ZV4W8030317198872"
Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
URL = "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & TrackNum
WinHTTP.Open "GET", URL, False
WinHTTP.Send
allHTML = WinHTTP.ResponseText
WinHTTP = Null
posDeliveredLabel = InStr(1, allHTML, "<dt><label>Delivered On:</label></dt>")
If posDeliveredLabel > 0 Then
    posDeliveredDate = Instr(posDeliveredLabel, allHTML , "<dd>")
    posDeliveredDateEnd = Instr(posDeliveredDate, allHTML , "</dd>")
    deliveredDate = Split(Mid(allHTML, posDeliveredDate + 5, posDeliveredDateEnd-posDeliveredDate-5),chr(10))
    deliveredDate(0) = Trim(deliveredDate(0))
    deliveredDate(1) = Trim(deliveredDate(1))
    'MsgBox deliveredDate(0) & " " & deliveredDate(1), , "Delivered on:"
    Set conn = CreateObject("ADODB.connection")
    conn.Open "Driver={SQL Server};Server=.;Database=TEST;Trusted_Connection=True;"
    set rs = CreateObject("ADODB.recordset")
    sql = "INSERT INTO DELIVERY_STATUS VALUES('" & deliveredDate(0) & " " & deliveredDate(1)  & "','" & Now() & "','" & TrackNum & "')"
    'Msgbox SQL
    rs.Open sql, conn		
Else
    'Msgbox "Couldn't find the delivery date"
End If

Open in new window

Avatar of k1ng87

ASKER

woops...sorry about that....i have both a sql 2000 environment and a 2005 environment....which one would be easier to use?? i'm assuming the '05 environment??
Actually - with the VBScript I justed posted it doesnt matter... So if you have little or no experience with SSIS I would think that this small script is good enough for your purposes. If you want to learn SSIS it might be better to include it in a script-task in an SSIS package. You have more facilities to schedule jobs and do a lot of other "integration" stuff - but you will have to spend some time to learn the entire environment.
Avatar of k1ng87

ASKER

solution is getting fairly complex...