Inet.OpenURL only returns part of the page!!!

Last Modified: 2013-11-13
This is probably a simple question, but it's late and I'm too tired to deal with it. (Plus I've got points to burn.)

I'm trying to retrieve a page using the Inet control; however, it does not return the entire page, only a fraction of it. For example:

page = Inet1.OpenURL("http://www.msn.com/")

This returns only:

<html><head><base href="http://g.msn.com/0US!s5.31472_315529/" /> ....other stuff.... <a href="73.a5539/2??cm=LeftNav8">Tec

And that's precisely where it ends. I've also tried using Winsock, and I've had the most success with it, but for some reason it's tacking random strings of three or four characters onto the beginning of its data chunks. But anyway, I'm rambling...

Thanks for your time, guys!


Commented:
sorry, change:

Private Sub Winsock1_DataArrival()

to

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
hmm.. actually, msn doesn't seem to like connections from Winsock... if you're not trying to reach msn, it should work
msg = "GET http://www.msn.com HTTP/1.0" + VbCrLf <--- your problem, just make it "GET / HTTP/1.0" & vbCrLf & vbCrLf

Commented:
If memory serves me right, that part's correct. CrLf separates each field in the header, and an additional CrLf denotes the end of the header. So with that said, the following should be the problem:

>>msg = msg + "Host: www.msn.com" + vbcrlf
>>msg = msg + vbcrlf + vbcrlf

You'll end up having 3 CrLfs (instead of 2) which the server probably won't accept. Also, after reviewing the differences in protocol 1.0 and 1.1, if the Inet control is implementing 1.0 (older version), it may have issues with keeping a persistent connection during requests.
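To make the CrLf counting concrete, here is a minimal Python sketch (not the VB code from this thread; the function name build_get_request is hypothetical) of a raw HTTP/1.0 request. Every header line ends with one CRLF, and exactly one extra CRLF (an empty line) closes the header block, which is the same as "GET / HTTP/1.0" & vbCrLf & "Host: ..." & vbCrLf & vbCrLf in VB.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request as raw bytes.

    Each header line ends with CRLF, and a single empty line
    (one extra CRLF) marks the end of the header block.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",        # host name only, no "http://" scheme
        "Connection: close",    # ask the server to close when done
    ]
    # CRLF between lines, then CRLF to end the last line plus CRLF for the blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_get_request("www.msn.com"))
```

Building the header block from a list and terminating it in one place makes it hard to end up with the extra (third) CrLf discussed above.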

RFC for HTTP/1.1 if you decide to go the Winsock way:
ftp://ftp.isi.edu/in-notes/rfc2616.txt
zzzzzooc, you need 3 at the end.. I've made many, many programs with Winsock..
and Brian, doing GET http://....the site is the same as doing GET / HTTP/1.0
oh.. Brian.. I was thinking of using proxies to connect.. my mistake, I am sorry :(

Author

Commented:
Thank you all for your time!  I have extensive experience with winsock, but in this particular instance something strange is happening.

What I am actually trying to do is connect to Experts Exchange and retrieve questions, much like QuickPost. What is strange is that when the Winsock1_DataArrival event fires, the snippets of HTML sometimes bring in a small header, like "EC4" or "FD8" or "SC5". I really don't see any pattern other than that there are always three characters, but I don't think this header is always there.

So, instead of using Winsock, this time I decided to go with the Inet control because I thought it would be simpler.  However, that does not appear to be the case.  Unless someone can explain why I would have this 3 character header, I think I'll go with URLDownloadToFile and see how that performs.

FYI, when using Winsock, you usually have to provide the full request header in order for the server to return a reply, and you definitely need the two vbCrLfs after the header. If you'd like to reproduce my scenario with EE, here is my code (requires a WebBrowser control and a Winsock control, default names):

Dim page As String
Dim ret As String

Private Sub Form_Load()
ret = Chr(13) + Chr(10)
Winsock1.Close
' Connect takes a host name and port, not a URL
Winsock1.Connect "www.experts-exchange.com", 80
End Sub

Private Sub Form_Resize()
WebBrowser1.Width = Me.Width - 125
WebBrowser1.Height = Me.Height - 525
End Sub

Private Sub Winsock1_Close()
On Error Resume Next
Winsock1.Close
Open "c:\tempurl.html" For Binary As 1
    Put 1, 1, page
Close #1
WebBrowser1.Navigate ("c:\tempurl.html")
End Sub

Private Sub Winsock1_Connect()
Dim info As String
info = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" + ret + _
    "Accept: */*" + ret + _
    "Accept-Language: en-us" + ret + _
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + ret + _
    "Host: www.experts-exchange.com" + ret + _
    "Connection: Close" + ret + _
    "Cache-Control: no-cache" + ret + ret
Winsock1.SendData info
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
Dim info As String
Winsock1.GetData info
Debug.Print info
page = page + info
End Sub
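DataArrival can fire many times for one page, which is why the code above appends each piece to page and only saves it in Winsock1_Close. A rough Python equivalent of that pattern, assuming a plain blocking socket (read_until_close is a hypothetical helper name), keeps calling recv() until the peer closes the connection:

```python
import socket


def read_until_close(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Collect everything the peer sends until it closes the connection.

    As with Winsock's DataArrival event, each recv() call may return only
    part of the response, so the pieces are appended until recv() returns b"".
    """
    parts = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:          # empty bytes: the peer closed its side
            break
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    # Local demo with a connected socket pair instead of a real web server.
    a, b = socket.socketpair()
    a.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html></html>")
    a.shutdown(socket.SHUT_WR)             # simulate the server closing
    print(read_until_close(b, bufsize=8))  # small buffer forces several recv() calls
    a.close()
    b.close()
```

This only works reliably when the server closes the connection after the response, which is what the "Connection: Close" header in the request asks for.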

Author

Commented:
zzzzzooc, your solution worked fine! Thanks to everyone who participated! I'm still wondering why EE sends me those characters, though...
Yohan, what characters? I looked at your code and I don't get any such characters...

Author

Commented:
Just random things that you might not notice.  Always at the start of the data received.  Ex. "EC4" , "FD8" , "SC5" I just re-ran the code above and the random characters pop up above the page editor box, above the list of TAs, and above the first post by CrazyOne.

Commented:
I don't notice anything..



Option Explicit

Private sPage As String
Private Sub Form_Load()
    Winsock1.Close
    Winsock1.Connect "www.experts-exchange.com", 80
End Sub
Private Sub Winsock1_Close()
    Dim iPos As Long    ' Long, not Integer: pages can exceed 32,767 characters
    iPos = InStr(1, sPage, vbCrLf & vbCrLf)
    If iPos > 0 Then
        Open "c:\temp.html" For Output As 1
            Print #1, Mid(sPage, iPos + Len(vbCrLf & vbCrLf))
        Close 1
        WebBrowser1.Navigate "file://c:\temp.html"
    End If
    Winsock1.Close
End Sub
Private Sub Winsock1_Connect()
    Dim sSend As String
    sSend = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" & vbCrLf
    sSend = sSend & "Host: www.experts-exchange.com" & vbCrLf
    sSend = sSend & "Connection: Close" & vbCrLf
    Winsock1.SendData sSend & vbCrLf
End Sub
Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim sBuff As String
    Winsock1.GetData sBuff
    sPage = sPage & sBuff
End Sub
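The Winsock1_Close handler above strips the response headers by looking for the first vbCrLf & vbCrLf. A small Python sketch of the same split (split_response is a hypothetical name) also collects the headers, because the Transfer-Encoding header is what tells you whether the body that follows is chunked:

```python
from __future__ import annotations


def split_response(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split a raw HTTP response into (headers, body).

    The header block ends at the first empty line (CRLF CRLF); the status
    line is skipped and header names are lower-cased for easy lookup.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker (CRLF CRLF) found")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:                  # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
           b"5\r\nhello\r\n0\r\n\r\n")
    headers, body = split_response(raw)
    print(headers.get("transfer-encoding"), body)
```

If headers.get("transfer-encoding") is "chunked", the body still contains the chunk-size lines and has to be decoded before it is saved as HTML.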

Author

Commented:
Using that exact code you just posted, this is the file that is generated for me:

http://zealgames.tripod.com/temp.html

I noticed at least two problems: at the very top there's "1C2F", and right before the search box there's "9CC"...
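One way to test the "random characters" theory: every string reported in this thread except "SC5" is a valid hexadecimal number, and HTTP/1.1 chunked transfer encoding puts each chunk's length, in hex, on its own line in front of the chunk. This Python sketch (as_chunk_size is a hypothetical helper) converts the reported strings to byte counts; "SC5" does not parse as hex, so it may have been misread when it was reported.

```python
import string

REPORTED = ["EC4", "FD8", "SC5", "1C2F", "9CC"]   # strings quoted in this thread


def as_chunk_size(token: str):
    """Return the token's value as a hexadecimal chunk size, or None if it isn't hex."""
    if token and all(c in string.hexdigits for c in token):
        return int(token, 16)
    return None


if __name__ == "__main__":
    for t in REPORTED:
        print(t, as_chunk_size(t))
```

Running it prints EC4 3780, FD8 4056, SC5 None, 1C2F 7215 and 9CC 2508. These are plausible chunk lengths in bytes, which fits the strings appearing at irregular places inside the page.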

Commented:
I don't get those results. URLDownloadToFile doesn't return the same characters?

Author

Commented:
Nope, URLDownloadToFile works perfectly. Do you think something is wrong with my Winsock control? It's never done this before.

Commented:
If there was something interfering with winsock, it'd affect both URLDownloadToFile and the Winsock control.

Did you use my method of using Output instead of Binary? I recall some characters being converted incorrectly from Putting.

Author

Commented:
OK, this is very interesting.  I thought I might try the code with my firewall turned off (NPF 2004), since it can sometimes mess up pages with its ad removal and popup blocking features, and lo and behold, no more characters!  Don't ask me why winsock would be any different from URLDownloadToFile, but I guess it is!

I have one last question: do you think it would be faster to use Winsock or URLDownloadToFile?

Author

Commented:
Wow, I just tried the Inet control again without the firewall running and it retrieved the entire page... Strange.  I'll have to take a closer look at NPF's settings.  So now I have three options:

Inet, Winsock, or the API?

What do you think?

Commented:
I'd go with URLDownloadToFile as it automatically retrieves the file and saves it to disk without the hassle of doing it yourself. The Inet control hangs a lot from my experience (while attempting to Cancel or because of Timeout durations or other reasons) and the Winsock control is a lot of overhead since you'll need to have multiple procedures to connect, get data, save to disk and check for errors.

Author

Commented:
I agree!  Thanks for everything!

Commented:
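This is the difference between HTTP/1.0 and HTTP/1.1.
An HTTP/1.1 server may send the body with chunked transfer encoding, where each chunk is preceded by a line giving its length in hexadecimal (that is where "1C2F", "9CC" and the like come from). A server must not send chunked encoding to an HTTP/1.0 client, so an HTTP/1.0 response does not contain these lines.

Use "GET / HTTP/1.0" instead of "GET / HTTP/1.1".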
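If you want to stay with HTTP/1.1 instead, the body has to be decoded after the headers are removed. This is a minimal Python sketch of a chunked-body decoder (decode_chunked is a hypothetical name). It assumes the complete body has already been received and ignores chunk extensions and trailers:

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 body sent with 'Transfer-Encoding: chunked'.

    Each chunk is: <size in hex>[;extensions] CRLF <size bytes> CRLF.
    A chunk of size 0 ends the body (any trailers after it are ignored here).
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)                 # end of the size line
        size_field = body[pos:eol].split(b";", 1)[0]   # drop chunk extensions
        size = int(size_field.strip(), 16)
        if size == 0:
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2                         # skip the data and its CRLF
    return bytes(out)


if __name__ == "__main__":
    print(decode_chunked(b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"))
```

The same loop could be written in VB with InStr and Mid plus Val("&H" & sizeText); switching the request line to HTTP/1.0 avoids the problem without any decoding.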

Janis