Solved

Inet.OpenURL only returns part of the page!!!

Posted on 2004-04-02
Last Modified: 2013-11-13
This is probably a simple question, but it's late and I'm too tired to deal with it.  (Plus I've got points to burn.)

I'm trying to retrieve a page using the Inet control; however, it does not return the entire page, only a fraction of it.  For example:

page = Inet1.OpenURL("http://www.msn.com/")

This returns only:

<html><head><base href="http://g.msn.com/0US!s5.31472_315529/" /> ....other stuff.... <a href="73.a5539/2??cm=LeftNav8">Tec

And that's precisely where it ends.  I've tried using Winsock for this, and I've had the most success with it, but for some reason it's tacking random strings of three or four characters onto the beginning of its data chunks.  But anyway, I'm rambling...

Thanks for your time, guys!

Question by:YohanShminge
23 Comments
 
LVL 17

Accepted Solution

by:
zzzzzooc earned 200 total points
ID: 10746234
An alternative. If the code below doesn't work, you have connectivity issues.

Form1:
======================
Option Explicit

Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Sub Form_Load()
    MsgBox GetFile("http://www.msn.com/")
End Sub

' Downloads sURL to a temporary file, returns its contents and deletes the file.
' Returns an empty string if the download fails.
Private Function GetFile(ByVal sURL As String) As String
    Dim sTempFile As String, lRet As Long, iFF As Integer
    sTempFile = "c:\" & Timer                            ' crude, time-based file name
    lRet = URLDownloadToFile(0, sURL, sTempFile, 0, 0)   ' 0 (S_OK) means success
    If lRet = 0 Then
        If Dir(sTempFile, vbNormal) <> vbNullString Then
            iFF = FreeFile
            Open sTempFile For Binary As #iFF
                GetFile = Space(LOF(iFF))                ' buffer the size of the file
                Get #iFF, 1, GetFile                     ' read the whole file at once
            Close #iFF
            Call Kill(sTempFile)                         ' clean up the temp file
        End If
    End If
End Function
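For comparison, the download-to-disk-then-read-back pattern that GetFile uses can be sketched in Python with urllib.request.urlretrieve. This is only an illustrative analogue, not part of the answer; get_file is a hypothetical name, and using a tempfile-generated path instead of a Timer-based name under c:\ is an assumption.

```python
import os
import tempfile
import urllib.request


def get_file(url: str) -> bytes:
    """Download url to a temporary file, read it back, then delete the file.

    Hypothetical analogue of GetFile: fetch to disk, load it all, clean up.
    """
    fd, temp_path = tempfile.mkstemp(suffix=".html")
    os.close(fd)  # urlretrieve opens the path itself
    try:
        urllib.request.urlretrieve(url, temp_path)
        with open(temp_path, "rb") as f:
            return f.read()
    finally:
        os.remove(temp_path)  # always remove the temp file, even on errors
```

As in the VB version, the caller gets the whole page at once; unlike it, a failed download raises an exception instead of returning an empty string.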



 
LVL 4

Assisted Solution

by:learning_t0_pr0gram
learning_t0_pr0gram earned 50 total points
ID: 10746298
If all you want is the source, I've got a simple solution: use Winsock, not Inet. Here's how you would get the source of msn.com through Winsock. Add a Winsock control, a CommandButton and a multiline TextBox:

Dim msg As String ' data to be sent through winsock

Private Sub Command1_Click()
Winsock1.Close
Winsock1.Connect "www.msn.com", 80
End Sub

Private Sub Winsock1_Connect()
msg = "GET http://www.msn.com HTTP/1.0" + VbCrLf
msg = msg + "Accept: */*" + vbcrlf
msg = msg + "Accept: text/html" + vbcrlf
msg = msg + "Host: www.msn.com" + vbcrlf
msg = msg + vbcrlf + vbcrlf
Winsock1.SendData msg 'send the html to msn
End Sub

Private Sub Winsock1_DataArrival()
Dim incoming as String
Winsock1.GetData Incoming 'store the data msn sends back in "incoming"
Text1.Text = Text1.Text & incoming
' Winsock delivers the page in pieces, so only close once the closing tag has arrived
If InStr(1, LCase(Text1.Text), "</html>") Then
Winsock1.Close
MsgBox "Source Received!"
End If
End Sub
 
LVL 4

Expert Comment

by:learning_t0_pr0gram
ID: 10746302
sorry, change:

Private Sub Winsock1_DataArrival()

to

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
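Checking for "</html>" works for a single HTML page, but it closes the socket as soon as the tag shows up, dropping anything sent after it, and it never fires for non-HTML content. With HTTP/1.0 (or "Connection: close"), a sturdier rule is to keep appending data until the server closes the connection, which in the VB code corresponds to the Winsock1_Close event. A minimal Python sketch of that loop, assuming a blocking socket; read_until_close is a hypothetical name:

```python
import socket


def read_until_close(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Accumulate everything the peer sends until it closes the connection.

    TCP delivers data in arbitrary pieces, so each recv() may return only
    part of the page; an empty result means the peer has closed its side.
    """
    parts = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # b"" signals an orderly shutdown by the peer
            break
        parts.append(data)
    return b"".join(parts)
```

For example, after socket.create_connection(("www.example.com", 80)) and sendall() of a request, read_until_close(sock) would return the raw response, headers included.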
 
LVL 4

Expert Comment

by:learning_t0_pr0gram
ID: 10746468
Hmm, actually MSN doesn't seem to like connections from Winsock. If you're not trying to reach MSN, it should work.
 
LVL 19

Assisted Solution

by:BrianGEFF719
BrianGEFF719 earned 50 total points
ID: 10747542
MSN shouldn't know the difference between Winsock and Inet. Both send the same kind of HTTP request over a TCP connection, so the server only sees the request itself, not which control produced it. If you are having a problem with Winsock, it's most likely because you are not setting up the GET packet correctly.

Try it like this:

winsock1.senddata "GET /file.exe HTTP/1.0" & vbcrlf & vbcrlf

That will work just fine.
 
LVL 19

Expert Comment

by:BrianGEFF719
ID: 10747544
msg = "GET http://www.msn.com HTTP/1.0" + VbCrLf <--- your problem; just make it "GET / HTTP/1.0" & vbcrlf & vbcrlf
 
LVL 17

Expert Comment

by:zzzzzooc
ID: 10747814
If memory serves me right, that part's correct. CrLf separates each field in the header, and an additional CrLf denotes the end of the header. So, with that said, the following should be the problem:

>>msg = msg + "Host: www.msn.com" + vbcrlf
>>msg = msg + vbcrlf + vbcrlf

You'll end up with three CrLfs in a row (instead of two), which the server may not accept. Also, after reviewing the differences between HTTP/1.0 and 1.1: if the Inet control implements 1.0 (the older version), it may have trouble keeping a persistent connection across requests.

RFC for HTTP/1.1 if you decide to go the Winsock way:
ftp://ftp.isi.edu/in-notes/rfc2616.txt
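To make the CrLf rule concrete: every header line ends in CRLF, and one extra CRLF (an empty line) ends the header block, so a well-formed request head ends in exactly two consecutive CRLFs. A small Python sketch that builds such a request; build_request and its parameters are illustrative, not from the thread:

```python
from typing import Dict, Optional


def build_request(method: str, target: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  version: str = "HTTP/1.0") -> bytes:
    """Build an HTTP request head that ends with exactly one empty line."""
    lines = [f"{method} {target} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # join() puts a CRLF between lines; the first trailing CRLF ends the last
    # header line and the second one is the empty line that ends the head.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

build_request("GET", "/", "www.msn.com") gives b"GET / HTTP/1.0\r\nHost: www.msn.com\r\n\r\n", i.e. two CRLFs at the end rather than three.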
 
LVL 4

Expert Comment

by:learning_t0_pr0gram
ID: 10748958
zzzzzooc, you need 3 at the end. I've made many, many programs with Winsock.
And Brian, doing GET http://....the site is the same as doing GET / HTTP/1.0
 
LVL 4

Expert Comment

by:learning_t0_pr0gram
ID: 10748974
Oh, Brian, I was thinking of connecting through proxies. My mistake, I'm sorry  :(
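The point settled here is the request target. When the request goes straight to the origin server, the request line normally carries only the path ("GET / HTTP/1.0"); the full-URL form ("GET http://www.msn.com/ HTTP/1.0") is what clients send to a proxy, although RFC 2616 (section 5.1.2) requires HTTP/1.1 servers to accept it as well. A Python sketch of choosing between the two; request_target and via_proxy are illustrative names, not from the thread:

```python
from urllib.parse import urlsplit


def request_target(url: str, via_proxy: bool = False) -> str:
    """Return the request-target that follows the method in the request line."""
    if via_proxy:
        return url  # absolute form: a proxy needs the full URL
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the root
    return f"{path}?{parts.query}" if parts.query else path
```

So request_target("http://www.msn.com") yields "/", matching Brian's suggested "GET / HTTP/1.0".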
 
LVL 11

Author Comment

by:YohanShminge
ID: 10749187
Thank you all for your time!  I have extensive experience with winsock, but in this particular instance something strange is happening.

What I am actually trying to do is connect to Experts Exchange and retrieve questions, much like QuickPost does.  The strange part is that when the Winsock1_DataArrival event fires, the snippets of HTML sometimes carry a little header such as "EC4", "FD8" or "SC5".  I really don't see any pattern, other than that there are always three characters, and I don't think the header appears every time.

So, instead of using Winsock, this time I decided to go with the Inet control because I thought it would be simpler.  However, that does not appear to be the case.  Unless someone can explain why I would have this 3 character header, I think I'll go with URLDownloadToFile and see how that performs.

FYI, when using Winsock you usually have to provide the full request header for the server to return a reply, and you definitely need the two vbCrLfs after the header.  If you'd like to reproduce my scenario with EE, here is my code (requires a WebBrowser control and a Winsock control, default names):

Dim page As String
Dim ret As String

Private Sub Form_Load()
Winsock1.Close
Winsock1.Connect "www.experts-exchange.com", 80
ret = Chr(13) + Chr(10)
End Sub

Private Sub Form_Resize()
WebBrowser1.Width = Me.Width - 125
WebBrowser1.Height = Me.Height - 525
End Sub

Private Sub Winsock1_Close()
On Error Resume Next
Winsock1.Close
Open "c:\tempurl.html" For Binary As 1
    Put 1, 1, page
Close #1
WebBrowser1.Navigate ("c:\tempurl.html")
End Sub

Private Sub Winsock1_Connect()
Dim info As String
info = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" + ret + _
    "Accept: */*" + ret + _
    "Accept-Language: en-us" + ret + _
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" + ret + _
    "Host: www.experts-exchange.com" + ret + _
    "Connection: Close" + ret + _
    "Cache-Control: no-cache" + ret + ret
Winsock1.SendData info
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
Dim info As String
Winsock1.GetData info
Debug.Print info
page = page + info
End Sub
 
LVL 11

Author Comment

by:YohanShminge
ID: 10752054
zzzzzooc, your solution worked fine!  Thanks to everyone who participated!  I'm still wondering why EE sends me those characters, though...
 
LVL 4

Expert Comment

by:learning_t0_pr0gram
ID: 10752605
Yohan, what characters? I looked at your code and I don't get any such characters...
 
LVL 11

Author Comment

by:YohanShminge
ID: 10752628
Just random things that you might not notice.  Always at the start of the data received.  Ex. "EC4" , "FD8" , "SC5" I just re-ran the code above and the random characters pop up above the page editor box, above the list of TAs, and above the first post by CrazyOne.
 
LVL 17

Expert Comment

by:zzzzzooc
ID: 10752753
I don't notice anything..



Option Explicit

Private sPage As String
Private Sub Form_Load()
    Winsock1.Close
    Winsock1.Connect "www.experts-exchange.com", 80
End Sub
Private Sub Winsock1_Close()
    Dim iPos As Long    ' InStr returns a Long
    iPos = InStr(1, sPage, vbCrLf & vbCrLf)
    If iPos > 0 Then
        Open "c:\temp.html" For Output As 1
            Print #1, Mid(sPage, iPos + Len(vbCrLf & vbCrLf))
        Close 1
        WebBrowser1.Navigate "file://c:\temp.html"
    End If
    Winsock1.Close
End Sub
Private Sub Winsock1_Connect()
    Dim sSend As String
    sSend = "GET /Security/Win_Security/Q_20942129.html HTTP/1.1" & vbCrLf
    sSend = sSend & "Host: www.experts-exchange.com" & vbCrLf
    sSend = sSend & "Connection: Close" & vbCrLf
    Winsock1.SendData sSend & vbCrLf
End Sub
Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    Dim sBuff As String
    Winsock1.GetData sBuff
    sPage = sPage & sBuff
End Sub
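Stripping the headers by cutting at the first CRLF CRLF, as Winsock1_Close does above, can be sketched in Python as follows. It also parses the header fields, so you can check, for example, whether the server sent "Transfer-Encoding: chunked". split_response is a hypothetical helper, and decoding the header bytes as ISO-8859-1 is an assumption:

```python
from typing import Dict, Tuple


def split_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP response into (status_line, headers, body).

    The header block ends at the first empty line, i.e. the first CRLF CRLF.
    Header names are lower-cased because HTTP header names are case-insensitive.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker found")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```

Note that this only removes the headers; if the body is chunked, the chunk-size lines are still in it.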
 
LVL 11

Author Comment

by:YohanShminge
ID: 10752793
Using that exact code you just posted, this is the file that is generated for me:

http://zealgames.tripod.com/temp.html

I noticed at least two problems, at the very top there's "1C2F" and then, right before the search box, there's "9CC" ...
 
LVL 17

Expert Comment

by:zzzzzooc
ID: 10752884
I don't get those results. URLDownloadToFile doesn't return the same characters?
 
LVL 11

Author Comment

by:YohanShminge
ID: 10752926
Nope, URLDownloadToFile works perfectly.  Do you think something is wrong with my Winsock control?  It's never done this before.
 
LVL 17

Expert Comment

by:zzzzzooc
ID: 10753009
If there was something interfering with winsock, it'd affect both URLDownloadToFile and the Winsock control.

Did you use my method of using Output instead of Binary? I recall some characters being converted incorrectly from Putting.
 
LVL 11

Author Comment

by:YohanShminge
ID: 10753055
OK, this is very interesting.  I thought I might try the code with my firewall turned off (NPF 2004), since it can sometimes mess up pages with its ad removal and popup blocking features, and lo and behold, no more characters!  Don't ask me why winsock would be any different from URLDownloadToFile, but I guess it is!

I have one last question: do you think it would be faster to use Winsock or URLDownloadToFile?
 
LVL 11

Author Comment

by:YohanShminge
ID: 10753069
Wow, I just tried the Inet control again without the firewall running and it retrieved the entire page... Strange.  I'll have to take a closer look at NPF's settings.  So now I have three options:

Inet, Winsock, or the API?

What do you think?
 
LVL 17

Expert Comment

by:zzzzzooc
ID: 10753119
I'd go with URLDownloadToFile as it automatically retrieves the file and saves it to disk without the hassle of doing it yourself. The Inet control hangs a lot from my experience (while attempting to Cancel or because of Timeout durations or other reasons) and the Winsock control is a lot of overhead since you'll need to have multiple procedures to connect, get data, save to disk and check for errors.
 
LVL 11

Author Comment

by:YohanShminge
ID: 10753131
I agree!  Thanks for everything!
 

Expert Comment

by:j0hny_
ID: 11586972
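This is the difference between HTTP 1.0 and 1.1.
An HTTP/1.1 server may send the body using chunked transfer encoding, where each chunk is preceded by a line giving its size in hexadecimal; those are the characters you are seeing. HTTP/1.0 has no chunked encoding, so the server doesn't send them.

Use "GET / HTTP/1.0", not "GET / HTTP/1.1".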

Janis
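The strings reported in this thread ("EC4", "FD8", "1C2F", "9CC") look like the chunk-size lines of HTTP/1.1 chunked transfer encoding (RFC 2616, section 3.6.1): each chunk of the body is preceded by its length in hexadecimal on a line of its own, and a zero-length chunk ends the body. Besides switching to HTTP/1.0, the body can be decoded. A minimal Python sketch, assuming the response headers have already been stripped; it skips chunk extensions and ignores any trailer headers, and decode_chunked is a hypothetical name:

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (headers already removed).

    Each chunk is: <size in hex>[;extensions] CRLF <data> CRLF.
    A chunk of size 0 ends the body; any trailer headers after it are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop extensions
        size = int(size_field.strip(), 16)  # e.g. b"1C2F" -> 7215
        if size == 0:
            break
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

For instance, decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") returns b"Wikipedia".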