Inet.OpenURL only returns part of the page!!!

This is probably a simple question, but it's late and I'm too tired to deal with it.  (Plus I've got points to burn)

I'm trying to retrieve a page using the Inet control, however, it does not return the entire page, only a fraction of it.  For example:

page = Inet1.OpenURL("http://www.msn.com/")

This returns only:

<html><head><base href="http://g.msn.com/0US!s5.31472_315529/" /> ....other stuff.... <a href="73.a5539/2??cm=LeftNav8">Tec

And that's precisely where it ends.  I've tried using Winsock to do this, and I've had the most success with it, but for some reason it's tacking random strings of three or four characters onto the beginning of its data chunks.  But anyway, I'm rambling...

Thanks for your time, guys!

YohanShmingeAsked:
 
zzzzzoocCommented:
Alternative.. and if the below doesn't work, you have connectivity issues.

Form1:
======================
Option Explicit

Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Private Sub Form_Load()
    MsgBox GetFile("http://www.msn.com/")
End Sub
Private Function GetFile(ByVal sURL As String) As String
    Dim sTempFile As String, lRet As Long, iFF As Integer
    sTempFile = "c:\" & Timer
    lRet = URLDownloadToFile(0, sURL, sTempFile, 0, 0) ' download the URL passed in, not a hard-coded one
    If lRet = 0 Then
        If Dir(sTempFile, vbNormal) <> vbNullString Then
            iFF = FreeFile
            Open sTempFile For Binary As iFF
                GetFile = Space(LOF(iFF))
                Get #iFF, 1, GetFile
            Close iFF
            Call Kill(sTempFile)
        End If
    End If
End Function



0
 
learning_t0_pr0gramCommented:
if all you want is the source.. i've got a simple solution.. use Winsock, not Inet.. here's how you would get the source of msn.com through winsock.. Add a Command button and a Multi Line Text Box:

Dim msg As String ' data to be sent through winsock

Private Sub Command1_Click()
Winsock1.Close
Winsock1.Connect "www.msn.com", 80
End Sub

Private Sub Winsock1_Connect()
msg = "GET http://www.msn.com HTTP/1.0" + VbCrLf
msg = msg + "Accept: */*" + vbcrlf
msg = msg + "Accept: text/html" + vbcrlf
msg = msg + "Host: www.msn.com" + vbcrlf
msg = msg + vbcrlf + vbcrlf
Winsock1.SendData msg 'send the html to msn
End Sub

Private Sub Winsock1_DataArrival()
Dim incoming as String
Winsock1.GetData Incoming 'store the data msn sends back in "incoming"
Text1.Text = Text1.Text & incoming
If InStr(1, LCase(Text1.Text), "</html>") > 0 Then  ' this is needed because the data arrives in parts, so closing early would cut off the page
Winsock1.Close
MsgBox "Source Received!"
End If
End Sub
0
 
learning_t0_pr0gramCommented:
sorry, change:

Private Sub Winsock1_DataArrival()

to

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
0
 
learning_t0_pr0gramCommented:
hmm.. actually, msn seems to not like connection from winsock ...if you're not trying to do msn, it should work
0
 
BrianGEFF719Commented:
MSN shouldn't know the difference between Winsock and Inet. Winsock and Inet can do the exact same thing; in fact, there is no way for the server to tell the difference. If you are having a problem with Winsock, it's most likely because you are not setting up the GET request correctly.

try just like this

winsock1.senddata "GET /file.exe HTTP/1.0" & vbcrlf & vbcrlf



that will work just fine.
0
 
BrianGEFF719Commented:
msg = "GET http://www.msn.com HTTP/1.0" + VbCrLf <--- your problem, just make it "GET / HTTP/1.0" & vbcrlf & vbcrlf
0
 
zzzzzoocCommented:
If memory serves me right, that part's correct. A CrLf separates each field in the header, and an additional CrLf denotes the end of the header. So with that said, the below should be the problem:

>>msg = msg + "Host: www.msn.com" + vbcrlf
>>msg = msg + vbcrlf + vbcrlf

You'll end up having 3 CrLfs (instead of 2) which the server probably won't accept. Also, after reviewing the differences in protocol 1.0 and 1.1, if the Inet control is implementing 1.0 (older version), it may have issues with keeping a persistent connection during requests.

RFC for HTTP/1.1 if you decide to go the Winsock way:
ftp://ftp.isi.edu/in-notes/rfc2616.txt
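
To make the CrLf rule above concrete, here is a minimal sketch of a request builder. BuildGetRequest is a hypothetical helper name, not something from the thread, and it assumes a plain HTTP/1.0 GET with no proxy in between. Each header field ends with one CrLf, and one extra CrLf (an empty line) closes the header block.

```vb
' Hypothetical helper (not from the thread): builds a minimal HTTP/1.0 GET request.
Private Function BuildGetRequest(ByVal sHost As String, ByVal sPath As String) As String
    Dim sReq As String
    sReq = "GET " & sPath & " HTTP/1.0" & vbCrLf    ' request line: path only, not the full URL
    sReq = sReq & "Host: " & sHost & vbCrLf         ' one CrLf ends each header field
    sReq = sReq & "Connection: close" & vbCrLf
    BuildGetRequest = sReq & vbCrLf                 ' the empty line ends the header block
End Function
```

It could be called from the Connect event, for example: Winsock1.SendData BuildGetRequest("www.msn.com", "/")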
0
 
learning_t0_pr0gramCommented:
zzzzzzoc, you need 3 at the end.. i've made many, many programs with winsock..
and Brian, doing GET http://....the site, is the same as doing GET / HTTP/1.0
0
 
learning_t0_pr0gramCommented:
oh.. brian.. i was thinking of using proxies to connect.. my mistake, i am sorry  :(
0
 
YohanShmingeAuthor Commented:
Thank you all for your time!  I have extensive experience with winsock, but in this particular instance something strange is happening.

What I am actually trying to do is connect to experts-exchange and retrieve questions, much like QuickPost.  What is strange is that when the Winsock1_DataArrival event fires, sometimes the snippets of HTML bring in some sort of little header, like "EC4" or "FD8" or "SC5" - I really don't see any pattern other than that there are always three characters, and I don't think the header is always present.

So, instead of using Winsock, this time I decided to go with the Inet control because I thought it would be simpler.  However, that does not appear to be the case.  Unless someone can explain why I would have this 3 character header, I think I'll go with URLDownloadToFile and see how that performs.

FYI, when using Winsock, you usually have to provide the full request header in order for the server to return a reply, and you definitely need the two vbCrLfs after the header.  If you'd like to reproduce my scenario with EE, here is my code (requires Webbrowser control + winsock control, default names):

Dim page As String
Dim ret As String

Private Sub Form_Load()
Winsock1.Close
Winsock1.Connect "www.experts-exchange.com", 80
ret = Chr(13) + Chr(10)
End Sub

Private Sub Form_Resize()
WebBrowser1.Width = Me.Width - 125
WebBrowser1.Height = Me.Height - 525
End Sub

Private Sub Winsock1_Close()
On Error Resume Next
Winsock1.Close
Open "c:\tempurl.html" For Binary As 1
    Put 1, 1, page
Close #1
WebBrowser1.Navigate ("c:\tempurl.html")
End Sub

Private Sub Winsock1_Connect()
Dim info As String
info = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" + ret + _
    "Accept: */*" + ret + _
    "Accept-Language: en-us" + ret + _
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + ret + _
    "Host: www.experts-exchange.com" + ret + _
    "Connection: Close" + ret + _
    "Cache-Control: no-cache" + ret + ret
Winsock1.SendData info
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
Dim info As String
Winsock1.GetData info
Debug.Print info
page = page + info
End Sub
0
 
YohanShmingeAuthor Commented:
zzzzzooc, your solution worked fine!  Thanks to everyone who participated!  I am still wondering why EE sends me those characters, though...
0
 
learning_t0_pr0gramCommented:
Yohan, what characters? i looked at your code and i don't get any such characters...
0
 
YohanShmingeAuthor Commented:
Just random things that you might not notice.  Always at the start of the data received, e.g. "EC4", "FD8", "SC5".  I just re-ran the code above and the random characters pop up above the page editor box, above the list of TAs, and above the first post by CrazyOne.
0
 
zzzzzoocCommented:
I don't notice anything..



Option Explicit

Private sPage As String
Private Sub Form_Load()
    Winsock1.Close
    Winsock1.Connect "www.experts-exchange.com", 80
End Sub
Private Sub Winsock1_Close()
    Dim iPos As Integer
    iPos = InStr(1, sPage, vbCrLf & vbCrLf)
    If iPos > 0 Then
        Open "c:\temp.html" For Output As 1
            Print #1, Mid(sPage, iPos + Len(vbCrLf & vbCrLf))
        Close 1
        WebBrowser1.Navigate "file://c:\temp.html"
    End If
    Winsock1.Close
End Sub
Private Sub Winsock1_Connect()
    Dim sSend As String
    sSend = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" & vbCrLf
    sSend = sSend & "Host: www.experts-exchange.com" & vbCrLf
    sSend = sSend & "Connection: Close" & vbCrLf
    Winsock1.SendData sSend & vbCrLf
End Sub
Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim sBuff As String
    Winsock1.GetData sBuff
    sPage = sPage & sBuff
End Sub
0
 
YohanShmingeAuthor Commented:
Using that exact code you just posted, this is the file that is generated for me:

http://zealgames.tripod.com/temp.html

I noticed at least two problems, at the very top there's "1C2F" and then, right before the search box, there's "9CC" ...
0
 
zzzzzoocCommented:
I don't get those results. URLDownloadToFile doesn't return the same characters?
0
 
YohanShmingeAuthor Commented:
nope, URLDownloadToFile works perfectly.  Do you think something is wrong with my winsock control?  It's never done this before.
0
 
zzzzzoocCommented:
If there was something interfering with winsock, it'd affect both URLDownloadToFile and the Winsock control.

Did you use my method of using Output instead of Binary? I recall some characters being converted incorrectly from Putting.
0
 
YohanShmingeAuthor Commented:
OK, this is very interesting.  I thought I might try the code with my firewall turned off (NPF 2004), since it can sometimes mess up pages with its ad removal and popup blocking features, and lo and behold, no more characters!  Don't ask me why winsock would be any different from URLDownloadToFile, but I guess it is!

I have one last question: do you think it would be faster to use Winsock or URLDownloadToFile?
0
 
YohanShmingeAuthor Commented:
Wow, I just tried the Inet control again without the firewall running and it retrieved the entire page... Strange.  I'll have to take a closer look at NPF's settings.  So now I have three options:

Inet, Winsock, or the API?

What do you think?
0
 
zzzzzoocCommented:
I'd go with URLDownloadToFile as it automatically retrieves the file and saves it to disk without the hassle of doing it yourself. The Inet control hangs a lot from my experience (while attempting to Cancel or because of Timeout durations or other reasons) and the Winsock control is a lot of overhead since you'll need to have multiple procedures to connect, get data, save to disk and check for errors.
0
 
YohanShmingeAuthor Commented:
I agree!  Thanks for everything!
0
 
j0hny_Commented:
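This is the difference between HTTP 1.0 and 1.1.
A 1.1 server may send the body with chunked transfer encoding, where each chunk is preceded by its length in hexadecimal (the "1C2F" and "9CC" you saw are chunk sizes, not checksums); 1.0 doesn't.

Send "GET / HTTP/1.0" instead of "GET / HTTP/1.1".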

Janis
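
If the request has to stay on HTTP/1.1, the short hexadecimal strings reported earlier in the thread ("1C2F", "9CC") look like the size lines of chunked transfer encoding, and they could be stripped on the client instead. The following is only a sketch. DecodeChunked is a hypothetical helper, not part of the Winsock control. It assumes sBody holds everything after the blank line that ends the response headers, that the response really was sent with "Transfer-Encoding: chunked", and that the page is single-byte text, so character counts match the byte counts in the size lines.

```vb
' Hypothetical helper: removes chunked transfer encoding from a response body.
Private Function DecodeChunked(ByVal sBody As String) As String
    Dim lPos As Long, lEol As Long, lSize As Long, lSemi As Long
    Dim sSize As String
    lPos = 1
    Do
        lEol = InStr(lPos, sBody, vbCrLf)               ' end of the chunk-size line
        If lEol = 0 Then Exit Do
        sSize = Trim$(Mid$(sBody, lPos, lEol - lPos))
        lSemi = InStr(sSize, ";")                       ' drop any chunk extension after ";"
        If lSemi > 0 Then sSize = Left$(sSize, lSemi - 1)
        lSize = Val("&H" & sSize & "&")                 ' hex -> Long; trailing "&" keeps it positive
        If lSize <= 0 Then Exit Do                      ' a size of 0 marks the last chunk
        DecodeChunked = DecodeChunked & Mid$(sBody, lEol + 2, lSize)
        lPos = lEol + 2 + lSize + 2                     ' skip the chunk data and its trailing CrLf
    Loop While lPos <= Len(sBody)
End Function
```

For example, DecodeChunked("5" & vbCrLf & "Hello" & vbCrLf & "0" & vbCrLf & vbCrLf) returns "Hello". In the Winsock1_Close handler posted above, the text after the first vbCrLf & vbCrLf could be passed through this function before it is written to disk.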
0
Question has a verified solution.
