k1ng87
asked on
Pulling data from a website into Excel
Can someone point me in the right direction on how to pull data from a website using VBA code? All I know right now is that the website appears to be in HTML format, and I know where the data is in the HTML source, but I don't know how to extract it. I've also never extracted data from a website, so I have no idea where to begin (though I do have moderate knowledge of VBA).
SOLUTION
Then you would probably be best off automating Internet Explorer and using regular expressions to parse the HTML. That probably sounds like gibberish - see https://www.experts-exchange.com/questions/25759667/Get-Table-from-website-into-Excel-VBA.html for a recent example.
If you post your web link then I will tailor this to your needs
Cheers
Dave
ASKER
Here is what I am trying to do... I work for a small company and we are trying to track on-time delivery from UPS... I would like to write VBA code in Excel that goes through a list of tracking numbers, inputs each tracking number into this link... http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=trackingnumbergoeshere ...and pulls the delivered-on date and status... I'm not sure how complex this is or even if it can be done... this is just a way I thought it might be done...
thanks for looking Dave!
This example will loop through the codes in column A of sheet 1 and dump the tracking data to column A of sheet 2.
Did you want to parse out the date and status, or is this OK as is?
Cheers
Dave
Sub GetWeb()
    Dim objIe As Object
    Dim objRegEx As Object
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    ' Tracking numbers sit in column A of the first sheet
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(ws1.Rows.Count, "A").End(xlUp))

    Set objIe = CreateObject("InternetExplorer.Application")
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        ' Wait until IE reports READYSTATE_COMPLETE (4)
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        ' Flatten the form text by stripping newlines, returns and tabs
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innerText
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        ' Append each result to the next empty row of column A on sheet 2
        ws2.Cells(ws2.Rows.Count, "A").End(xlUp).Offset(1, 0) = strTemp
    Next c
    objIe.Quit

    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub
EE---Dynamics-Search-3.xls
This version parses the delivery status and date out to the second sheet.
Dave
Sub GetWeb()
    Dim objIe As Object
    Dim objRegEx As Object
    Dim RegMC As Object
    Dim RegM As Object
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim rng1 As Range
    Dim c As Range
    Dim strTemp As String

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set ws1 = ActiveWorkbook.Sheets(1)
    Set ws2 = ActiveWorkbook.Sheets(2)
    ' Tracking numbers sit in column A of the first sheet
    Set rng1 = ws1.Range(ws1.[a1], ws1.Cells(ws1.Rows.Count, "A").End(xlUp))

    Set objIe = CreateObject("InternetExplorer.Application")
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Global = True

    ws2.UsedRange.ClearContents
    For Each c In rng1
        objIe.Navigate "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & c.Value
        ' Wait until IE reports READYSTATE_COMPLETE (4)
        Do While objIe.ReadyState <> 4
            DoEvents
        Loop
        ' Flatten the form text by stripping newlines, returns and tabs
        objRegEx.Pattern = "[\n\r\t]"
        strTemp = objIe.Document.forms(4).innerText
        strTemp = objRegEx.Replace(strTemp, vbNullString)
        ' Capture the status word and, if present, the "Delivered On" date
        objRegEx.Pattern = "Status: ([A-Za-z]+)(.+?Delivered On:(.+?) Signed)?"
        Set RegMC = objRegEx.Execute(strTemp)
        For Each RegM In RegMC
            ' Write tracking number, status and date to the next empty row
            ws2.Cells(ws2.Rows.Count, "A").End(xlUp).Offset(1, 0) = c.Value
            ws2.Cells(ws2.Rows.Count, "B").End(xlUp).Offset(1, 0) = RegM.SubMatches(0)
            ws2.Cells(ws2.Rows.Count, "C").End(xlUp).Offset(1, 0) = RegM.SubMatches(2)
        Next
    Next c
    objIe.Quit

    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
End Sub
EE---Dynamics-Search-1.xls
If you want to avoid the overhead and problems of Internet Explorer, you can read the web page through the WinHTTPRequest object. Here is some code that reads the W3C home page and extracts the publication date and time of its top article:
Sub GetWebPage()
    Dim WinHTTP As Object
    Dim allHTML As String
    Dim posPublishedDate As Long
    Dim publishedDate As String

    ' Fetch the raw HTML synchronously - no browser needed
    Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
    WinHTTP.Open "GET", "http://www.w3.org", False
    WinHTTP.Send
    allHTML = WinHTTP.ResponseText
    Set WinHTTP = Nothing

    ' Locate the "dtstart published" marker and read the date that follows it
    posPublishedDate = InStr(1, allHTML, "dtstart published")
    publishedDate = Mid(allHTML, posPublishedDate + 26, 25)
    MsgBox "W3C's home page had its main front page article updated on " & _
        Chr(10) & Chr(10) & Chr(9) & publishedDate, , "Retrieved directly from the web:"
    'Application.ActiveSheet.Range("A3").Value = publishedDate
End Sub
ASKER
I think I'm understanding the code now, but I can't seem to figure out how it's pulling the information from IE... can someone please explain this section of the code?
thanks!
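A commented reading of the retrieval section of the code posted above may help (this is an annotation of the existing code, not new logic):

```vba
' objIe is an InternetExplorer automation object; after Navigate, IE loads
' the page and exposes its parsed DOM through objIe.Document.
Do While objIe.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE - wait for the load
    DoEvents                     ' yield so IE (and Excel) stay responsive
Loop
objRegEx.Pattern = "[\n\r\t]"
' Document.forms(4) is the fifth <form> element on the page (zero-based);
' .innerText returns all visible text inside it with the HTML tags stripped.
strTemp = objIe.Document.forms(4).innerText
' The regex then deletes newlines, carriage returns and tabs so the status
' and date can later be matched in one flat string.
strTemp = objRegEx.Replace(strTemp, vbNullString)
```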
ASKER CERTIFIED SOLUTION
SOLUTION
To sum up so far:
Dave's excellent solution for looping through the UPS tracking numbers could be used with any of the three web page retrieval methods:
1. QueryTables (native Excel method) - as teylyn shows
2. Automating Internet Explorer (COM object) - as Dave suggests
3. Using the Windows operating system's WinHTTPRequest COM object (standard on all Windows installations) - as I have shown
All methods have their pros and cons, so which approach to select depends entirely on which requirements need to be satisfied...
For instance, the Internet Explorer solution has the benefit of fully rendering the web page, which can be useful - but it has the disadvantage of requiring more resources. All external stylesheets and all graphics are loaded, so you end up retrieving several times the amount of data you actually need. On the other hand, it is transferable to environments that don't run in Excel or VBA - other programming languages and scripting solutions that support COM automation.
The first solution has the benefits of low overhead and no page rendering by default (although that can also be done). It is also more understandable for those who don't have COM automation knowledge.
The last solution also has very low overhead, and you get very fine-grained control over what the HTTP request should or should not do. Furthermore, it is usable outside of Excel and VBA environments too.
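For reference, a minimal sketch of what option 1 (QueryTables) might look like for the UPS page. The table index in `.WebTables` is an assumption - run Excel's "Import External Data" wizard once to see which table on the page actually holds the status - and teylyn's real solution may differ:

```vba
' Hedged sketch of the QueryTables approach (option 1).
Sub PullViaQueryTable()
    Dim ws As Worksheet
    Set ws = ActiveWorkbook.Sheets(2)
    With ws.QueryTables.Add( _
            Connection:="URL;http://wwwapps.ups.com/WebTracking/processInputRequest?" & _
                        "TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872", _
            Destination:=ws.Range("A1"))
        .WebSelectionType = xlSpecifiedTables
        .WebTables = "1"                 ' assumed: first table on the page
        .WebFormatting = xlWebFormattingNone
        .BackgroundQuery = False
        .Refresh
        .Delete                          ' keep the imported data, drop the query
    End With
End Sub
```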
ASKER
Well... maybe you guys could help me select the appropriate one??
Here is where I wanted to go with this... I wanted to pull tracking numbers on a weekly basis from SQL into Excel, run that macro, and then transfer the on-time metrics back up to the SQL DB...
Well - all of the above solutions will work - but if I were you, I would stick to the solution you understand best. If they all seem hard to understand, select the one you will most likely need to learn anyway for other purposes.
If you don't have any preferences at all, then I would stick to the QueryTables method... That method is more in line with what any Excel VBA programmer should be able to support, whereas COM object automation is for the more advanced VBA programmers...
But if you need code that you can cut & paste most quickly right now, I think Dave has made a very good effort to give you a complete turn-key solution.
Hmm, why would you want to use Excel anyway then? If you pull from SQL and put it back into SQL... Do you have any reporting needs with Excel that must be covered?
ASKER
umm... not really... I'm assuming right now that I can't run something in SQL (SQL Server '00) to do what the VBA code is doing??
There are a number of ways to do this without Excel, even with SQL Server 2000. You could make a SQL Server DTS (Data Transformation Services - included with SQL Server 2000) task that uses a VBScript to drive the WinHTTPRequest COM object, retrieve the HTML and UPS information, and then post the results immediately back to a SQL Server table...
You will have the added benefit of being able to schedule it in DTS as a job, so it can run weekly there, with a lot of options for error reporting and tracking if anything goes wrong...
If going for SQL Server 2000 DTS is a bit much for your current requirement, you could simply use the free Windows version of CURL, a command-line tool that can retrieve, among other things, HTML pages. You could then use AWK (another superb free command-line tool) to extract your information, and schedule this once a week with normal Windows scheduling...
I could provide you an example - but first I have to do some other business... I will post you a sample later this evening (GMT+1) ...
- Petter
Just a short example of CURL retrieving a web page into a local file... I will post more comprehensive code later...
C:\>curl "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872" >html-text.txt
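To sketch the AWK half mentioned above: the snippet below feeds a stand-in fragment of the UPS HTML (the real markup may well differ) through AWK to pull out the trimmed lines between the <dd> tags that follow the "Delivered On:" label:

```shell
# Stand-in HTML fragment piped through AWK; the real UPS markup may differ.
# The AWK program prints the lines between <dd> and </dd>, with leading
# whitespace trimmed.
printf '<dt><label>Delivered On:</label></dt>\n<dd>\n 01/15/2010\n 10:32 A.M.\n</dd>\n' |
awk '/<dd>/ {f=1; next} /<\/dd>/ {f=0} f {gsub(/^[ \t]+/, ""); print}'
# → 01/15/2010
# → 10:32 A.M.
```

In practice you would replace the `printf` with the `curl` command shown above and redirect the result to a file for loading.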
Unsubscribing as I think that the original question, plus follow-ups, have been covered off.
Thanks Petter for your kind comments above re my code.
Regards
Dave
Using CURL is actually a bit more involved - since UPS wants you to register and use their API.
Which is quite sensible - because then you can get the data in a predefined format like CSV or XML, and you will NOT need to parse possibly changing and unreliable HTML code. You will get a much more reliable, low-maintenance and robust solution.
Registering is free, and you get a user ID and a password. These can be used in a POST (via CURL on the command line, or WinHTTPRequest via VBScript) with an upload file full of tracking numbers, and you will get one file back with the status for all tracking numbers. All without any HTML...
I would definitely recommend going in that direction - but if you are in a hurry, just stick to option 1.
I could help you with the VBScript / WinHTTPRequest and scheduling - but then you would need to credit this question first and post a new one, since that has nothing to do with Excel...
Regards
Petter
ASKER
wow... I've been away from this thread for a while now... I got pulled into some other projects and this kind of got left on the back burner...
brettdj... your solution is very helpful... but I think this is going to get more complicated, as I was originally thinking of going from SQL to Excel (which runs the code) and then back to SQL... Is there a way this can be done in SQL?? Or would that just get waayyy tooo complex??
Could you tell us the specifics of the SQL you are using? What version and which vendor?
Is it Microsoft SQL Server 2005? 2008? 2000? Do you have SSIS running on the SQL Server?
It is pure guesswork to give an answer if you don't tell us what your runtime environment offers and what privileges you have to configure and reconfigure it. For instance, you could enable xp_cmdshell in SQL Server and configure a proxy account to use it; then you could solve your problem with a few lines of code more or less directly from SQL.
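To make the xp_cmdshell idea concrete, a hypothetical sketch - the sp_configure steps apply from SQL Server 2005 onwards, where xp_cmdshell is off by default (on SQL Server 2000 it is available out of the box), and parsing/inserting the fetched HTML is left out:

```sql
-- Enable xp_cmdshell (SQL Server 2005+; requires sysadmin).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- Shell out to curl to fetch the tracking page; the output comes back
-- as a rowset that could then be parsed and inserted into a table.
EXEC master..xp_cmdshell
    'curl "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=1ZV4W8030317198872"';
```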
You could actually use a VBScript to download the page from the web, then connect to SQL Server and insert the information into the database, just like the attached code.
You will need a table in the database called DELIVERY_STATUS with three columns, all VARCHAR(50)... or you will have to modify the code accordingly. The connection to the SQL Server is made via the logged-on Windows user (trusted connection), and the script as written must run on the SQL Server machine... but that could be changed easily.
Save the attached code in a file like RetrieveDelivery.vbs and schedule it to run at a regular interval. It could of course be extended to read a list of tracking numbers from a database and update the delivery status regularly.
' Retrieve the UPS tracking page for one tracking number, parse out the
' "Delivered On" date and insert it into SQL Server.
TrackNum = "1ZV4W8030317198872"

Set WinHTTP = CreateObject("WinHTTP.WinHTTPRequest.5.1")
URL = "http://wwwapps.ups.com/WebTracking/processInputRequest?TypeOfInquiryNumber=T&InquiryNumber1=" & TrackNum
WinHTTP.Open "GET", URL, False
WinHTTP.Send
allHTML = WinHTTP.ResponseText
Set WinHTTP = Nothing

' Find the "Delivered On:" label, then the <dd> element that follows it
posDeliveredLabel = InStr(1, allHTML, "<dt><label>Delivered On:</label></dt>")
If posDeliveredLabel > 0 Then
    posDeliveredDate = InStr(posDeliveredLabel, allHTML, "<dd>")
    posDeliveredDateEnd = InStr(posDeliveredDate, allHTML, "</dd>")
    ' The <dd> content holds the date and the time on separate lines
    deliveredDate = Split(Mid(allHTML, posDeliveredDate + 5, posDeliveredDateEnd - posDeliveredDate - 5), Chr(10))
    deliveredDate(0) = Trim(deliveredDate(0))
    deliveredDate(1) = Trim(deliveredDate(1))
    'MsgBox deliveredDate(0) & " " & deliveredDate(1), , "Delivered on:"

    ' Insert the result via ADO using a trusted connection
    Set conn = CreateObject("ADODB.Connection")
    conn.Open "Driver={SQL Server};Server=.;Database=TEST;Trusted_Connection=True;"
    Set rs = CreateObject("ADODB.Recordset")
    sql = "INSERT INTO DELIVERY_STATUS VALUES('" & deliveredDate(0) & " " & deliveredDate(1) & "','" & Now() & "','" & TrackNum & "')"
    'MsgBox sql
    rs.Open sql, conn
Else
    'MsgBox "Couldn't find the delivery date"
End If
ASKER
woops... sorry about that... I have both a SQL 2000 environment and a 2005 environment... which one would be easier to use?? I'm assuming the '05 environment??
Actually, with the VBScript I just posted, it doesn't matter... So if you have little or no experience with SSIS, I would think this small script is good enough for your purposes. If you want to learn SSIS, it might be better to include it in a Script Task in an SSIS package. You get more facilities for scheduling jobs and doing a lot of other "integration" stuff - but you will have to spend some time learning the entire environment.
ASKER
The solution is getting fairly complex...