• Status: Solved

Make sure document is downloaded before parsing out elements

I had a problem that took me a while to track down: my program starts running regex against a document to extract text before the document has completely downloaded. It kept getting only half of the text and dropping the other half until I ran the function again. I used Len on the document HTML and found that the two results were dramatically different in size.

    Dim EventUrl As String = "http://www.whatismyip.com"

    Dim sResult As String
    Dim objResponse As WebResponse
    Dim objRequest As WebRequest = System.Net.HttpWebRequest.Create(EventUrl)

    Dim ProxyAddress As String = "XXX.XXX.XX.X"
    Dim ProxyPort As Integer = 8088

    ' Route the request through the proxy
    Dim oProxy As New WebProxy(ProxyAddress, ProxyPort)
    objRequest.Proxy = oProxy

    objRequest.Method = "GET"
    objRequest.Timeout = 1200000 ' milliseconds, so this is 20 minutes; 20 seconds would be 20000
    objResponse = objRequest.GetResponse()

    ' Note: UTF7 rarely matches what web servers send; UTF8 is the more common choice
    Dim sr As System.IO.StreamReader = New System.IO.StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF7)

    ' ReadToEnd blocks until the whole response body has been read
    sResult = sr.ReadToEnd()

    sr.Close()
    objResponse.Close()

    ' Save the HTML to a temp file so it can be loaded as a document
    Dim sErr As String = String.Empty
    Dim FullTempFilePath As String = "C:\TJYTEMP\MAINSEL" & Rnd(500) & Rnd(350) & "TEMP.html"
    SaveTextToFile(sResult, FullTempFilePath, sErr)

    Debug.Write("FILE WRITE ERROR: " & sErr)

    Dim doc As New HtmlDocument(FullTempFilePath)

    EStatusLabel.Text = "Digesting"



Here's what the above code does: I have to make an HTTP request through a proxy, so I make the request with the proxy and then save the HTML to a file. This is where I think my problem lies, as it must not be waiting for all of the HTML document to come in before saving. Then I use "Dim doc As New HtmlDocument(FullTempFilePath)" to create a doc from the file I just saved. I need to do this so I can use mshtml to parse out the elements in the document. I know this must be an easy fix. 500 points because I need to have the skeleton of this program developed in 2 weeks! :D
Asked:
JPERKS1985
1 Solution
 
Bob Learned Commented:
I don't believe that you need to save it to a file to create an HtmlDocument, but I am not 100% sure about that.

If you do need the file, then you could wait for it to exist with something like this:

While Not System.IO.File.Exists(FullTempFilePath)
   Application.DoEvents()
End While

Bob
 
JPERKS1985 (Author) Commented:
Hmm, I don't think it'll work, because the file will exist, just not in its full form. Is there any way to make it wait until the file is fully written?
 
Bob Learned Commented:
The file will not exist until it is fully written--try it.

Bob
 
JPERKS1985 (Author) Commented:
Didn't work :(. It's as if my code keeps writing over the file it created until it's full? I put MsgBox(Len(Dochtml)) in the part that parses it, and when it finds the remaining elements the length jumps up.
 
JPERKS1985 (Author) Commented:
Can anything be done with objResponse.ContentLength, even though I'm not sure what the full length will be?
 
Bob Learned Commented:
Test:

   Get the response stream into a variable, and check the length of the stream, and see if you are getting the entire response.

Bob
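
A minimal sketch of the test suggested above, assuming .NET 4 or later (for Stream.CopyTo). The module name ResponseLengthCheck and the helpers ReadAllBytes and ReportDownloadSize are hypothetical names chosen for illustration; the proxy setup from the question is left out for brevity. It reads the whole response into memory and prints the declared Content-Length next to the number of bytes actually received:

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ResponseLengthCheck
    ' Hypothetical helper: copies the stream into memory until the end is reached.
    Function ReadAllBytes(source As Stream) As Byte()
        Using buffer As New MemoryStream()
            source.CopyTo(buffer)
            Return buffer.ToArray()
        End Using
    End Function

    ' Hypothetical helper: downloads url and prints declared vs. received size.
    Sub ReportDownloadSize(url As String)
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using body As Stream = response.GetResponseStream()
                Dim bytes As Byte() = ReadAllBytes(body)
                ' ContentLength is -1 when the server sends no Content-Length header
                Console.WriteLine("Declared: {0}, received: {1}", response.ContentLength, bytes.Length)
            End Using
        End Using
    End Sub
End Module
```

If the two numbers match (or the declared length is -1 and the received size is stable across runs), the download itself is probably complete and the truncation is happening later, when the document is loaded and parsed.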
 
JPERKS1985Author Commented:
  While m_document Is Nothing OrElse m_document.readyState <> "complete"
                    Application.DoEvents()
                End While
Is what I used it seems to work good.
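
One caveat with the loop above is that it never exits if the document fails to reach "complete". Here is a hedged sketch of the same polling idea with a timeout. WaitHelper and WaitUntil are hypothetical names, not part of mshtml or Windows Forms; the readiness check and the message pump are passed in as delegates so the helper does not depend on either library:

```vbnet
Imports System
Imports System.Diagnostics

Module WaitHelper
    ' Hypothetical helper: polls isReady until it returns True or the timeout elapses.
    ' pump runs between checks (for example Application.DoEvents in a WinForms app).
    ' Returns True if the condition was met, False if the wait timed out.
    Function WaitUntil(isReady As Func(Of Boolean), timeout As TimeSpan, pump As Action) As Boolean
        Dim clock As Stopwatch = Stopwatch.StartNew()
        While Not isReady()
            If clock.Elapsed > timeout Then Return False
            pump()
        End While
        Return True
    End Function
End Module
```

In the situation from this thread, the call might look like WaitUntil(Function() m_document IsNot Nothing AndAlso m_document.readyState = "complete", TimeSpan.FromSeconds(30), AddressOf Application.DoEvents), where the 30-second limit is an arbitrary assumption. A False result means the document never finished loading and parsing should be skipped or retried.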