Solved

Reading HTML using System.IO.StreamReader

Posted on 2008-10-18
Last Modified: 2008-10-18
Hi Experts,

I am reading into my application many pages of HTML so that I can retrieve (scrape) data from them. My problem is that the data I require lies a thousand lines of HTML into the page. Having to read through these unwanted lines of code each time a data scrape is made is slowing things down. Is it possible to make an HTML page request of the server starting at line 1000 for example?
Question by:DColin
LVL 2

Expert Comment

by:wmestrom
ID: 22747873
You can check whether the server supports the Range header in HTTP requests. If it does, you can specify a byte offset to start from (the offset is in bytes, not lines). The HTTP request would look something like this:

GET /somepage HTTP/1.1
Host: www.xyz.org
Range: bytes=123456-
Accept: */*
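
One way to find out whether a server advertises range support is to send a HEAD request and look at the Accept-Ranges response header. The following is only a minimal sketch: the module and function names are hypothetical, and the URL is the placeholder used elsewhere in this thread. Some servers honor Range without sending Accept-Ranges, so a False result is a hint rather than proof.

```vbnet
Imports System
Imports System.Net

Module RangeProbe
    ' Hypothetical helper (not from the original thread): returns True when
    ' the server advertises byte-range support for the given URL.
    Function AdvertisesByteRanges(ByVal url As String) As Boolean
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD" ' ask for headers only, no body

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            ' "Accept-Ranges: bytes" means ranges are supported; "none" or a
            ' missing header means the server does not advertise support.
            Dim acceptRanges As String = response.Headers("Accept-Ranges")
            If acceptRanges Is Nothing Then Return False
            Return acceptRanges.IndexOf("bytes", StringComparison.OrdinalIgnoreCase) >= 0
        End Using
    End Function

    Sub Main()
        ' Placeholder URL taken from the question's code.
        Console.WriteLine(AdvertisesByteRanges("http://www.abc.com"))
    End Sub
End Module
```
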

Hope this will work for you.

Greets
Willem
 

Author Comment

by:DColin
ID: 22748077
Hi wmestrom:

Do you know how I can use your answer with my existing code? Thanks.
        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.HttpWebResponse
        Dim MyStream As System.IO.StreamReader
 
        MyRequest = System.Net.WebRequest.Create("http://www.abc.com")
        MyResponse = MyRequest.GetResponse()
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())

 
LVL 2

Accepted Solution

by:
wmestrom earned 500 total points
ID: 22748477
This should work. However, many servers ignore the Range header and simply return the whole page. The page I used in this example should honor it, though.
        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.WebResponse
        Dim MyStream As System.IO.StreamReader
 
        ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest to get AddRange
        MyRequest = DirectCast(System.Net.WebRequest.Create("http://www.gnu.org/projects/dotgnu/pnetlib-doc/System/Net/HttpWebRequest.html"), System.Net.HttpWebRequest)
        ' Request everything from byte 10000 to the end of the page ("Range: bytes=10000-")
        MyRequest.AddRange(10000)
        MyResponse = MyRequest.GetResponse()
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())
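
Because a server may ignore the Range header, it can help to check the status code of the response: 206 (Partial Content) means the range was honored, while 200 (OK) means the full page came back and still has to be skipped through locally. The sketch below illustrates that check under some assumptions: the module name, the function name and its parameters are hypothetical, and the URL is the placeholder from the question.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module RangeFetch
    ' Hypothetical helper (not from the original thread): fetches the page
    ' from byteOffset onward and reports whether the range was honored.
    Function ReadFromOffset(ByVal url As String, ByVal byteOffset As Integer, ByRef wasPartial As Boolean) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AddRange(byteOffset) ' sends "Range: bytes=<byteOffset>-"

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            ' 206 Partial Content: range honored. 200 OK: full page was sent.
            wasPartial = (response.StatusCode = HttpStatusCode.PartialContent)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Dim wasPartial As Boolean
        ' Placeholder URL and offset; adjust both for the page being scraped.
        Dim html As String = ReadFromOffset("http://www.abc.com", 10000, wasPartial)
        If Not wasPartial Then
            Console.WriteLine("Server ignored the range; the full page was received.")
        End If
        Console.WriteLine(html.Length)
    End Sub
End Module
```

Note that the offset is counted in bytes, so a partial response will usually begin in the middle of a line or tag. If the offset lies beyond the end of the page, the server may answer with status 416, in which case GetResponse throws a WebException.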
