
Downloading a web page...

Does anyone know of an easy way to download a web page? I was doing: Site.text = Inet1.OpenURL(txtURL, 0) and then saving it to a file, but for some reason some web sites weren't complete.  It would grab like a random number of lines of the html code and then think it was done or something.  It didn't do that all the time, but enough that I need to find another method.
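For reference, the whole routine is roughly this (the sub name and output path are just examples; Inet1, txtURL and Site are the real control names):

Private Sub cmdGo_Click()
   ' Fetch the page as a string (icString) and save it to disk
   Dim sPage As String
   sPage = Inet1.OpenURL(txtURL.Text, icString)
   Site.Text = sPage

   Open "c:\page.html" For Output As #1
   Print #1, sPage
   Close #1
End Sub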
Asked by: dokken
1 Solution
 
KJHDI12 commented:
Try this:

Private Sub Command1_Click()

   ' ---------------------------------------------
   ' Tells Inet to connect to site and get file
   ' ---------------------------------------------
   Inet.URL = "http://www.sol.no/"
   Inet.Protocol = icHTTP
   Inet.RemoteHost = "www.sol.no"
   Inet.Execute "http://www.sol.no", "GET index.html"

End Sub

Private Sub Inet_StateChanged(ByVal State As Integer)

   ' ----------------------------------------------------
   ' State 12 (icResponseCompleted) happens when Inet has
   ' downloaded the page into its buffer
   ' ----------------------------------------------------
   If State = 12 Then
      Open "c:\index.html" For Output As #1
      Print #1, Inet.GetChunk(64000)  ' <- Size of page
      Close #1
   End If
   
End Sub


Mr. Fixit
 
dokken (Author) commented:
Looks good, except I should have mentioned that I don't always know what the HTML filename is. Sometimes it's just (using your example) http://www.sol.no/. Is there any way to grab the filename the web server uses as the default in that case?
 
dokken (Author) commented:
I just tried that on one of the servers where I noticed it was not saving the whole file; it screws up just like the way I was doing it. I don't think that Internet Control is going to work.
 
KJHDI12 commented:
Did you increase the Inet.GetChunk(64000) value? 64000 means 64k; increase it to whatever maximum size you want.

When no filename is given, the web server serves its default document, usually index.html or index.htm.
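If you need a name for the local file, you could take whatever follows the last "/" in the URL and fall back to index.html when there is nothing there. A rough sketch (sURL is assumed to hold the full URL; InStrRev needs VB6):

   ' Pick a local filename from the URL, falling back
   ' to "index.html" when the URL ends with "/"
   Dim sName As String
   sName = Mid$(sURL, InStrRev(sURL, "/") + 1)
   If Len(sName) = 0 Then sName = "index.html"
   Open "c:\" & sName For Output As #1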

Mr. Fixit
 
dokken (Author) commented:
Size didn't matter; it would only get 1k... it cuts off near the top of the web page. Maybe the Internet Control doesn't like some web server software. If you want to play around with it, the site I found that reproduces the problem is: http://www.searchenginewatch.com
 
KJHDI12 commented:

hehe... the solution was too easy.

Put multiple "Print #1, Inet.GetChunk(64000)" lines one after the other.

One "Print #1, Inet.GetChunk(64000)" gets 2k, two of them get 4k, and so on.

Mr. Fixit


 
dokken (Author) commented:
That's strange :) but it works... thanks!
