Solved

Reading HTML using System.IO.StreamReader

Posted on 2008-10-18
Last Modified: 2008-10-18
Hi Experts,

I am reading many HTML pages into my application so that I can retrieve (scrape) data from them. My problem is that the data I need starts about a thousand lines into each page, and reading through all those unwanted lines on every scrape is slowing things down. Is it possible to request a page from the server starting at, for example, line 1000?
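HTTP byte ranges address a page by byte position, not by line number, so "start at line 1000" first has to be translated into a byte offset. One way is to download a sample of the page once and find where the wanted line begins. `ByteOffsetOfLine` is a hypothetical helper. The sketch assumes ASCII or UTF-8 content, where byte 10 is always a line feed. It also assumes the part of the page before your data keeps roughly the same length between requests. For dynamic pages it is safer to start a little before the computed offset.

```vbnet
Imports System

Module LineOffsetSketch
    ' Hypothetical helper: returns the byte offset at which 1-based line
    ' lineNumber starts in a previously downloaded copy of the page,
    ' or -1 if the page has fewer lines than that.
    ' Assumes ASCII/UTF-8, where the line-feed byte (10) only marks a line break.
    Function ByteOffsetOfLine(page As Byte(), lineNumber As Integer) As Long
        If lineNumber < 1 Then Throw New ArgumentOutOfRangeException(NameOf(lineNumber))
        If lineNumber = 1 Then Return 0
        Dim linesSeen As Integer = 1
        For i As Integer = 0 To page.Length - 1
            If page(i) = 10 Then
                ' The byte after each line feed is the start of the next line.
                linesSeen += 1
                If linesSeen = lineNumber Then Return i + 1
            End If
        Next
        Return -1
    End Function
End Module
```

For example, `ByteOffsetOfLine(sample, 1000)` gives a candidate start offset for a range request. It returns -1 if the sample is shorter than 1000 lines.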
Question by:DColin

Expert Comment

by:wmestrom
ID: 22747873
You can check whether the server supports the HTTP Range header. If it does, you can ask for the page starting at a byte offset (ranges are counted in bytes, not lines). The HTTP request would look something like this:

GET /somepage HTTP/1.1
Host: www.xyz.org
Range: bytes=123456-
Accept: */*

Hope this will work for you.

Greets
Willem
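For reference, here is a minimal VB.NET sketch that assembles the raw request shown above as text. `BuildRangeRequest` is a hypothetical helper, not part of the .NET Framework. It uses `Accept: */*` and ends every header line with CRLF, followed by the blank line that terminates the header block. Sending the text over a socket (for instance with `System.Net.Sockets.TcpClient`) and reading the reply is not shown.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RangeRequestSketch
    ' Hypothetical helper: builds the raw HTTP/1.1 text of a byte-range GET.
    ' "bytes=N-" asks for everything from byte offset N to the end of the resource.
    Function BuildRangeRequest(host As String, path As String, startByte As Long) As String
        If startByte < 0 Then Throw New ArgumentOutOfRangeException(NameOf(startByte))
        Dim sb As New StringBuilder()
        sb.Append("GET ").Append(path).Append(" HTTP/1.1").Append(vbCrLf)
        sb.Append("Host: ").Append(host).Append(vbCrLf)
        sb.Append("Range: bytes=").Append(startByte).Append("-").Append(vbCrLf)
        sb.Append("Accept: */*").Append(vbCrLf)
        sb.Append(vbCrLf) ' an empty line ends the header block
        Return sb.ToString()
    End Function
End Module
```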

Author Comment

by:DColin
ID: 22748077
Hi wmestrom:

Do you know how I can use your answer with my existing code? Thanks.
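        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.HttpWebResponse
        Dim MyStream As System.IO.StreamReader
 
        ' WebRequest.Create and GetResponse return base types, so DirectCast is needed with Option Strict On.
        MyRequest = DirectCast(System.Net.WebRequest.Create("http://www.abc.com"), System.Net.HttpWebRequest)
        MyResponse = DirectCast(MyRequest.GetResponse(), System.Net.HttpWebResponse)
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())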

Accepted Solution

by: wmestrom (earned 500 total points)
ID: 22748477
This should work. However, many servers ignore the Range header... The page I used should work, though.
        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.WebResponse
        Dim MyStream As System.IO.StreamReader
 
        MyRequest = DirectCast(System.Net.HttpWebRequest.Create("http://www.gnu.org/projects/dotgnu/pnetlib-doc/System/Net/HttpWebRequest.html"), System.Net.HttpWebRequest)
        ' Ask for everything from byte 10000 up to the end of the page.
        MyRequest.AddRange(10000, Integer.MaxValue)
        ' A server that honours the range replies 206 Partial Content;
        ' one that ignores it replies 200 OK and sends the whole page.
        MyResponse = MyRequest.GetResponse()
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())
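
Because many servers ignore the Range header, it may help to check the status code before trusting the data. The sketch below assumes a positive start offset and uses two hypothetical helpers, `ReadFromOffset` and `SkipBytes`. A reply of 206 Partial Content means the server honoured the range. A reply of 200 OK means the whole page is coming, so the leading bytes are read and discarded on the client. That saves parsing work but not download time. There are two caveats. An offset past the end of the page typically gets 416 Range Not Satisfiable, which `GetResponse` raises as a `WebException`. An offset that lands inside a multi-byte UTF-8 character can garble the first character read.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module RangeFallbackSketch
    ' Hypothetical helper: reads and discards up to count bytes from stream.
    ' Returns the number of bytes actually skipped (fewer if the stream ends early).
    Function SkipBytes(stream As Stream, count As Long) As Long
        Dim buffer(8191) As Byte
        Dim skipped As Long = 0
        While skipped < count
            Dim toRead As Integer = CInt(Math.Min(CLng(buffer.Length), count - skipped))
            Dim n As Integer = stream.Read(buffer, 0, toRead)
            If n = 0 Then Exit While ' end of stream reached
            skipped += n
        End While
        Return skipped
    End Function

    ' Hypothetical helper: returns the page text from startByte (assumed > 0) onward,
    ' whether or not the server honours the Range header.
    Function ReadFromOffset(url As String, startByte As Integer) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AddRange(startByte) ' Range: bytes=startByte-
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using body As Stream = response.GetResponseStream()
                If response.StatusCode <> HttpStatusCode.PartialContent Then
                    ' 200 OK: the server sent the whole page, so skip the prefix here.
                    SkipBytes(body, startByte)
                End If
                Using reader As New StreamReader(body)
                    Return reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Function
End Module
```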

