YohanShminge
asked on
Inet.OpenURL only returns part of the page!!!
This is probably a simple question, but it's late and I'm too tired to deal with it. (Plus I've got points to burn)
I'm trying to retrieve a page using the Inet control, however, it does not return the entire page, only a fraction of it. For example:
page = Inet1.OpenURL("http://www.msn.com/")
This returns only:
<html><head><base href="http://g.msn.com/0US!s5.31472_315529/" /> ....other stuff.... <a href="73.a5539/2??cm=LeftNav8">Tec
And that's precisely where it ends. I've tried using Winsock to do this, and I've had the most success with it, but for some reason it's tacking random strings of three or four characters onto the beginning of its data chunk. But anyway, I'm rambling...
Thanks for your time, guys!
hmm.. actually, msn seems to not like connection from winsock ...if you're not trying to do msn, it should work
msg = "GET http://www.msn.com HTTP/1.0" + vbCrLf <--- your problem, just make it "GET / HTTP/1.0" & vbCrLf & vbCrLf
If memory serves me right, that part's correct. CrLf separates each field in the header and an additional CrLf denotes the end of the header. So with that said, the below should be the problem:
>>msg = msg + "Host: www.msn.com" + vbcrlf
>>msg = msg + vbcrlf + vbcrlf
You'll end up having 3 CrLfs (instead of 2) which the server probably won't accept. Also, after reviewing the differences in protocol 1.0 and 1.1, if the Inet control is implementing 1.0 (older version), it may have issues with keeping a persistent connection during requests.
RFC for HTTP/1.1 if you decide to go the Winsock way:
ftp://ftp.isi.edu/in-notes/rfc2616.txt
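To make the header termination concrete, here is a minimal sketch, assuming a plain HTTP/1.0 GET sent through the Winsock control. BuildGetRequest is a made-up helper name for illustration, not something from the code posted in this thread. Each header line ends in one CrLf, and a single empty line (one more CrLf) closes the header block.

```vb
' Hypothetical helper (not from the thread): builds a minimal HTTP/1.0 GET request.
' Every header line ends with one CrLf; one empty line ends the header block.
Public Function BuildGetRequest(ByVal sHost As String, ByVal sPath As String) As String
    Dim sReq As String
    sReq = "GET " & sPath & " HTTP/1.0" & vbCrLf
    sReq = sReq & "Host: " & sHost & vbCrLf      ' host name only, no "http://" prefix
    sReq = sReq & "Connection: close" & vbCrLf
    BuildGetRequest = sReq & vbCrLf              ' the empty line that terminates the headers
End Function
```

Usage would be something like Winsock1.SendData BuildGetRequest("www.msn.com", "/") inside Winsock1_Connect.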
zzzzzzoc, you need 3 at the end.. i've made many, many programs with winsock..
and Brian, doing GET http://....the site, is the same as doing GET / HTTP/1.0
oh.. brian.. i was thinking of using proxies to connect.. my mistake, i am sorry :(
ASKER
Thank you all for your time! I have extensive experience with winsock, but in this particular instance something strange is happening.
What I am actually trying to do is connect to experts-exchange and retrieve questions, much like QuickPost. What is strange is that when the Winsock1_DataArrival event fires, sometimes the snippets of HTML bring in some sort of little header, like "EC4" or "FD8" or "SC5" - I really don't see any pattern other than there are always three characters, but I don't think this header is always there.
So, instead of using Winsock, this time I decided to go with the Inet control because I thought it would be simpler. However, that does not appear to be the case. Unless someone can explain why I would have this 3 character header, I think I'll go with URLDownloadToFile and see how that performs.
FYI, when using Winsock, you usually have to provide the full request header in order for the server to return a reply, and you definitely need the two vbCrLfs after the header. If you'd like to reproduce my scenario with EE, here is my code (requires Webbrowser control + winsock control, default names):
Dim page As String
Dim ret As String

Private Sub Form_Load()
    Winsock1.Close
    ' Connect takes a bare host name, not a URL
    Winsock1.Connect "www.experts-exchange.com", 80
    ret = Chr(13) + Chr(10)
End Sub

Private Sub Form_Resize()
    WebBrowser1.Width = Me.Width - 125
    WebBrowser1.Height = Me.Height - 525
End Sub

Private Sub Winsock1_Close()
    On Error Resume Next
    Winsock1.Close
    ' Write everything received (headers included) to disk and display it
    Open "c:\tempurl.html" For Binary As #1
    Put #1, 1, page
    Close #1
    WebBrowser1.Navigate "c:\tempurl.html"
End Sub

Private Sub Winsock1_Connect()
    Dim info As String
    ' Full request header, ended by an empty line (ret + ret)
    info = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" + ret + _
        "Accept: */*" + ret + _
        "Accept-Language: en-us" + ret + _
        "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + ret + _
        "Host: www.experts-exchange.com" + ret + _
        "Connection: Close" + ret + _
        "Cache-Control: no-cache" + ret + ret
    Winsock1.SendData info
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim info As String
    ' Append each arriving block to the page buffer
    Winsock1.GetData info
    Debug.Print info
    page = page + info
End Sub
ASKER
zzzzzooc, your solution worked fine! Thanks to everyone who participated! I am still wondering why EE sends me those characters, though...
Yohan, what characters? i looked at your code and i don't get any such characters...
ASKER
Just random things that you might not notice. Always at the start of the data received. Ex. "EC4" , "FD8" , "SC5" I just re-ran the code above and the random characters pop up above the page editor box, above the list of TAs, and above the first post by CrazyOne.
I don't notice anything..
Option Explicit
Private sPage As String

Private Sub Form_Load()
    Winsock1.Close
    ' Connect takes a bare host name, not a URL
    Winsock1.Connect "www.experts-exchange.com", 80
End Sub

Private Sub Winsock1_Close()
    Dim iPos As Long
    ' The first empty line separates the response headers from the body
    iPos = InStr(1, sPage, vbCrLf & vbCrLf)
    If iPos > 0 Then
        Open "c:\temp.html" For Output As #1
        Print #1, Mid(sPage, iPos + Len(vbCrLf & vbCrLf))
        Close #1
        WebBrowser1.Navigate "file://c:\temp.html"
    End If
    Winsock1.Close
End Sub

Private Sub Winsock1_Connect()
    Dim sSend As String
    sSend = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" & vbCrLf
    sSend = sSend & "Host: www.experts-exchange.com" & vbCrLf
    sSend = sSend & "Connection: Close" & vbCrLf
    ' The extra vbCrLf is the empty line that ends the headers
    Winsock1.SendData sSend & vbCrLf
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim sBuff As String
    Winsock1.GetData sBuff
    sPage = sPage & sBuff
End Sub
ASKER
Using that exact code you just posted, this is the file that is generated for me:
http://zealgames.tripod.com/temp.html
I noticed at least two problems, at the very top there's "1C2F" and then, right before the search box, there's "9CC" ...
I don't get those results. URLDownloadToFile doesn't return the same characters?
ASKER
Nope, URLDownloadToFile works perfectly. Do you think something is wrong with my winsock control? It's never done this before.
If there was something interfering with winsock, it'd affect both URLDownloadToFile and the Winsock control.
Did you use my method of using Output instead of Binary? I recall some characters being converted incorrectly from Putting.
ASKER
OK, this is very interesting. I thought I might try the code with my firewall turned off (NPF 2004), since it can sometimes mess up pages with its ad removal and popup blocking features, and lo and behold, no more characters! Don't ask me why winsock would be any different from URLDownloadToFile, but I guess it is!
I have one last question: do you think it would be faster to use Winsock or URLDownloadToFile?
ASKER
Wow, I just tried the Inet control again without the firewall running and it retrieved the entire page... Strange. I'll have to take a closer look at NPF's settings. So now I have three options:
Inet, Winsock, or the API?
What do you think?
I'd go with URLDownloadToFile as it automatically retrieves the file and saves it to disk without the hassle of doing it yourself. The Inet control hangs a lot from my experience (while attempting to Cancel or because of Timeout durations or other reasons) and the Winsock control is a lot of overhead since you'll need to have multiple procedures to connect, get data, save to disk and check for errors.
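For anyone who wants to try the API route, here is a minimal sketch. The Declare is the usual one for urlmon.dll; DownloadPage is a hypothetical wrapper name, and the URL and file path in the usage line are just placeholders.

```vb
' Declaration for the urlmon.dll API (ANSI version).
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

' Hypothetical wrapper: True when the API reports success (0 = S_OK).
Public Function DownloadPage(ByVal sUrl As String, ByVal sFile As String) As Boolean
    DownloadPage = (URLDownloadToFile(0, sUrl, sFile, 0, 0) = 0)
End Function
```

A call might look like: If DownloadPage("http://www.msn.com/", "c:\temp.html") Then WebBrowser1.Navigate "c:\temp.html". Note that the call blocks until the download finishes or fails, since no callback is passed.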
ASKER
I agree! Thanks for everything!
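This is the difference between HTTP protocol 1.0 and 1.1.
A 1.1 response can use chunked transfer encoding, which puts a hexadecimal chunk-size line (not a checksum) in front of each block of data; a 1.0 response doesn't.
Send "GET / HTTP/1.0", not "GET / HTTP/1.1".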
Janis
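The stray strings reported earlier in the thread ("1C2F", "9CC", "EC4", "FD8") are all valid hexadecimal, which is consistent with HTTP/1.1 chunked transfer encoding, where each chunk of the body is preceded by a line holding its size in hex. Assuming that is what is happening here, the following is a rough sketch of how the framing could be stripped if you want to stay on HTTP/1.1. DecodeChunked is a made-up helper name, and it assumes the whole response has already arrived, that sBody is everything after the empty line that ends the response headers, and that each character in the string corresponds to one byte on the wire.

```vb
' Hypothetical helper (not from the thread): removes HTTP/1.1 chunked framing from a body.
' Assumes the full body is present and one character per byte (single-byte text).
Public Function DecodeChunked(ByVal sBody As String) As String
    Dim lPos As Long, lEol As Long, lSize As Long
    Dim sSize As String, sOut As String

    lPos = 1
    Do While lPos <= Len(sBody)
        lEol = InStr(lPos, sBody, vbCrLf)            ' end of the chunk-size line
        If lEol = 0 Then Exit Do                     ' incomplete or malformed data
        sSize = Mid$(sBody, lPos, lEol - lPos)
        If InStr(sSize, ";") > 0 Then sSize = Left$(sSize, InStr(sSize, ";") - 1) ' drop chunk extensions
        sSize = Trim$(sSize)
        If Len(sSize) = 0 Then Exit Do
        lSize = Val("&H" & sSize & "&")              ' size is hexadecimal, e.g. "1C2F"
        If lSize <= 0 Then Exit Do                   ' a zero-size chunk marks the end
        sOut = sOut & Mid$(sBody, lEol + 2, lSize)   ' copy the chunk data
        lPos = lEol + 2 + lSize + 2                  ' skip the data and its trailing CrLf
    Loop
    DecodeChunked = sOut
End Function
```

In Winsock1_Close this would be applied to the part of the buffer after the first vbCrLf & vbCrLf, and only when the response headers contain "Transfer-Encoding: chunked". Sending an HTTP/1.0 request instead avoids the issue altogether, since chunked encoding is an HTTP/1.1 feature.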
Private Sub Winsock1_DataArrival()
to
Private Sub Winsock1_DataArrival(ByVal