  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 158

Downloading a web page...

Does anyone know of an easy way to download a web page? I was doing: Site.text = Inet1.OpenURL(txtURL, 0) and then saving it to a file, but for some reason some web sites came back incomplete. It would grab a seemingly random number of lines of the HTML code and then think it was done. It didn't happen every time, but often enough that I need to find another method.
Asked by: dokken
1 Solution
 
KJHDI12 Commented:
Try this:

Private Sub Command1_Click()

   ' ---------------------------------------------
   ' Tells Inet to connect to site and get file
   ' ---------------------------------------------
   Inet.URL = "http://www.sol.no/"
   Inet.Protocol = icHTTP
   Inet.RemoteHost = "www.sol.no"
   Inet.Execute "http://www.sol.no", "GET index.html"

End Sub

Private Sub Inet_StateChanged(ByVal State As Integer)

   ' ----------------------------------------------------
   ' State 12 (icResponseCompleted) happens when Inet has
   ' downloaded the page and it is in the buffer
   ' ----------------------------------------------------
   If State = 12 Then
      Open "c:\index.html" For Output As #1
      Print #1, Inet.GetChunk(64000)  ' <- Maximum number of bytes to read
      Close #1
   End If
   
End Sub


Mr. Fixit
 
dokken (Author) Commented:
Looks good, except I should have mentioned that I don't always know what the HTML file is called. Sometimes the URL is just (using your example): http://www.sol.no/  Is there any way to grab the filename the web server uses as its default in that case?
 
dokken (Author) Commented:
I just tried that on one of the servers where I had noticed the whole file was not being saved, and it fails the same way my original approach did. I don't think the Internet Control is going to work.

 
KJHDI12 Commented:
Did you increase the Inet.GetChunk(64000) value? 64000 means roughly 64k. Increase it to whatever maximum size you want.

When no filename is requested, the server returns its default document, which is usually index.html or index.htm.

Mr. Fixit
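
As an illustration of the point about default documents, here is a minimal sketch that requests the site root without naming a file, so the server decides which page to send back. The handler name cmdFetch_Click is hypothetical; Inet is the control name used in the code earlier in this thread, and the plain "GET" operation string is an assumption about how the request would normally be issued.

```vb
Private Sub cmdFetch_Click()
    ' Hypothetical button handler, not from the original posts.
    ' Ask for the site root only. The web server chooses its own
    ' default document, so the client never needs to know its name.
    Inet.Execute "http://www.sol.no/", "GET"
End Sub
```

The response still arrives through Inet_StateChanged, and the local filename you save it under is your own choice.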
 
dokken (Author) Commented:
Size didn't matter, it would only get 1k... it cuts off near the top of the web page.  Maybe the Internet Control doesn't like some web server software.  If you want to play around with it, the site I found that duplicates the problem is: http://www.searchenginewatch.com
 
KJHDI12 Commented:

  hehe.. The solution was too easy..

Put multiple "Print #1, Inet.GetChunk(64000)" lines one after another.

One "Print #1, Inet.GetChunk(64000)" gets 2k. Two "Print #1, Inet.GetChunk(64000)" lines get 4k, etc...

Mr. Fixit
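
Repeating the Print line works because each GetChunk call hands back only part of the buffered response, whatever size you ask for. Rather than guessing how many calls a page needs, a loop can keep reading until GetChunk returns nothing. The following is a minimal sketch under that assumption, reusing the Inet control name and output path from the code above; the variable names and the 1024-byte chunk size are illustrative only.

```vb
Private Sub Inet_StateChanged(ByVal State As Integer)
    Dim strChunk As String
    Dim intFile As Integer

    ' icResponseCompleted is the named constant for state 12
    If State = icResponseCompleted Then
        intFile = FreeFile
        Open "c:\index.html" For Output As #intFile

        ' Read the first piece of the buffered response as text
        strChunk = Inet.GetChunk(1024, icString)

        ' Keep reading until the buffer is empty
        Do While Len(strChunk) > 0
            ' The trailing semicolon stops Print from adding a line break
            Print #intFile, strChunk;
            strChunk = Inet.GetChunk(1024, icString)
        Loop

        Close #intFile
    End If
End Sub
```

With the loop in place the chunk size only affects how many iterations run, not how much of the page is saved.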


 
dokken (Author) Commented:
That's strange :) but it works... thanks!