Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1038
  • Last Modified:

XMLHTTP Returning 404 for valid URL in VBA

I am new to Web Scraping, but am attempting to gather some data from Google's new HotelFinder service to learn how.  When I put the following url in to my browsers, it loads correctly, but in the code below, iXHTTP.status retuns 404.  Any assistance as to why would be greatly appreciated.

Option Explicit

'GOOGLE HOTEL FINDER
'www.google.com/hotelfinder

Const cWAITTIME As String = "0:00:05"
Const NIGHTS As Integer = 1
Const RATING As Integer = 0

Public Function DownloadGHFFile(sURL As String, sPath As String) As Boolean
    'Function that accepts a search URL as sURL and destination file path as sPath
    'Returns TRUE if file is written, FALSE if file is not written
    
    
    DownloadGHFFile = False 'Default as fail
    On Error GoTo exitSub 'Handle error and return false
    
    Dim oXHTTP As Object, oStream As Object, sTime As Date
    Dim iXHTTP As MSXML2.XMLHTTP60
    
    'Create object and get page
    Set iXHTTP = New MSXML2.XMLHTTP60
    With iXHTTP
        .Open "GET", sURL, False
        .send
    End With
    
    sTime = Now 'Mark time for timer
    
    Do Until iXHTTP.Status = 200 'If takes more than wait period, cancel (200 = ready)
        Debug.Print iXHTTP.Status
        If Now > sTime + TimeValue(cWAITTIME) Or iXHTTP.Status = 404 Then GoTo exitSub
        DoEvents
    Loop
    
    'Write to file
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 1 'adTypeBinary
        .Open
        .Write iXHTTP.responseBody
        .SaveToFile sPath, adSaveCreateOverWrite
        .Close
    End With
    
    DownloadGHFFile = True 'Mark as success
    
exitSub:
    Set oXHTTP = Nothing
    Set oStream = Nothing
End Function

Public Function createGHFQuery(City As String, State As String, sDate As Date, Nts As Integer) As String
    'Function to create url in google.com/hotelfinder format
    'Returns string containing url
    
    Dim S As String, s2 As String
    S = "http://www.google.com/hotelfinder/"
    
    S = S & "#search;"

    s2 = "l=" & LCase(City) & ",+" & LCase(State) & ";"
    S = S & s2

    s2 = "d=" & Format(sDate, "YYYY-MM-DD") & ";"
    S = S & s2

    s2 = "n=" & Nts & ";"
    S = S & s2

    s2 = "r=" & RATING
    S = S & s2

    createGHFQuery = S
    
End Function

Sub test()
    'Routine testing above functions
    
    Dim myS As String
    myS = createGHFQuery("Lahaina", "HI", CDate("10/15/2011"), NIGHTS)
    If DownloadGHFFile(myS, "C:\VBA Training\Web Source Files\googTest.txt") Then
        Call MsgBox("Success")
    Else
        Call MsgBox("Failed")
    End If
End Sub

Open in new window


Thanks.

matt
0
MPBasiliere
Asked:
MPBasiliere
  • 3
  • 2
1 Solution
 
Dave BaldwinFixer of ProblemsCommented:
Because you're not running all the javascript and the extra AJAX accesses that the page runs in a browser.  I think they have you pretty well blocked.  If you use something to look at the headers generated by that page in a browser, you will see a lot more activity than just the page load.
0
 
NorieData ProcessorCommented:
Matt

What data are you trying to get from the page?

What file are you trying to download?
0
 
MPBasiliereAuthor Commented:
Thank you for the responses.
Dave: Would posting to the form on the page be an option?  If so, how can I tell the names of the fields?

imnorie: I would like to capture hotel rating, price, etc.  As much for the understanding of how to as for the data itself.  I am putting it in to a file in order to batch multiple web requests, then I cycle through the saved files and get the data I am looking for and delete the file.  If there is a better way, I am always looking to learn.  The approach I have taken works well for the Starwoodhotels.com site, so I figured I could use the same approach elsewhere.

Thanks again.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
NorieData ProcessorCommented:
So you only want to do this with XMLHTTP?

I did come up with some simple code that automates IE to get the data you want.

It's pretty rough but I'll post it anyway.
Option Explicit

Sub GetHotelData()
Dim IE As Object
Dim doc As Object
Dim divHotel As Object
Dim divDetail As Object
Dim colDivs As Object

    Set IE = CreateObject("InternetExplorer.Application")

    With IE

        .navigate createGHFQuery("Lahaina", "HI", CDate("10/15/2011"))

        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop

        ' .Visible = True ' optional


        Set doc = .document

        Do While doc.readyState <> "complete": DoEvents: Loop

        Application.Wait Now() + TimeSerial(0, 0, 5)

        Do
            Set colDivs = doc.getElementsByTagName("DIV")
        Loop Until Not colDivs Is Nothing

        For Each divHotel In colDivs

            If divHotel.classname = "GL2GAIFCI3" Then

                For Each divDetail In divHotel.getElementsByTagName("DIV")

                    Select Case Trim(divDetail.classname)

                        Case "GL2GAIFCG2"
                            Debug.Print "Hotel: " & divDetail.innerText
                        Case "GL2GAIFCL2"
                            Debug.Print "Description: " & divDetail.innerText

                        Case "GL2GAIFCC2"
                            Debug.Print "Stars: " & divDetail.innerText
                        Case "GL2GAIFCK2"
                            Debug.Print "Cost: " & divDetail.innerText
                        Case "GL2GAIFCL1"
                            Debug.Print "Comparison: " & divDetail.innerText

                    End Select

                Next divDetail
                Debug.Print "---------------------------------"
            End If


        Next divHotel

    End With

    Set IE = Nothing

End Sub

Public Function createGHFQuery(City As String, State As String, sDate As Date, Optional Nts = 1, Optional Rtg = 0) As String
    'Function to create url in google.com/hotelfinder format
    'Returns string containing url
    
    Dim S As String, s2 As String
    S = "http://www.google.com/hotelfinder/"
    
    S = S & "#search;"

    s2 = "l=" & LCase(City) & ",+" & LCase(State) & ";"
    S = S & s2

    s2 = "d=" & Format(sDate, "YYYY-MM-DD") & ";"
    S = S & s2

    s2 = "n=" & Nts & ";"
    S = S & s2

    s2 = "r=" & Rtg
    S = S & s2

    createGHFQuery = S
    
End Function

Open in new window


I wrote the code in Excel and normally I would put the output onto an Excel worksheet but I wasn't sure if you were using Excel.:)

So it just goes to the immediate window.(CTRL+G)

PS Hope you don't mind, I made some changes to your function that generates the URL.:)
0
 
MPBasiliereAuthor Commented:
Thank you, imnorie.  I appreciate the help.  It's great to have two approaches; the IE instancing seems to take a little longer, but is effective.  I appreciate the help!
0
 
NorieData ProcessorCommented:
I think the Application.Wait might be slowing things down considerably.

Usually checking the status, ready state will do but I couldn't find anything that would work for this.

Without the Wait the code worked pretty fast but seemed a bit sporadic, it either worked 'perfectly' or not at all.

It might be worth changing the 5 seconds to a shorter period.

In fact I just tried 1 second and it was fine.

There's probably a few other things that can be refine.

0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now