Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

Reading HTML using System.IO.StreamReader

Posted on 2008-10-18
3
648 Views
Last Modified: 2008-10-18
Hi Experts,

I am reading into my application many pages of HTML so that I can retrieve (scrape) data from them. My problem is that the data I require lies a thousand lines of HTML into the page. Having to read through these unwanted lines of code each time a data scrape is made is slowing things down. Is it possible to make an HTML page request of the server starting at line 1000 for example?
0
Comment
Question by:DColin
  • 2
3 Comments
 
LVL 2

Expert Comment

by:wmestrom
ID: 22747873
You can try if the server supports the range option in the HTTP header. Then you can specify a byte offset. The HTTP request would look something like this:

GET /somepage HTTP/1.1
Host: www.xyz.org
Range: bytes=123456-
Accept: *.*, */*

Hope this will work for you.

Greets
Willem
0
 

Author Comment

by:DColin
ID: 22748077
Hi wmestrom:

Do you know how I can use your answer with my existing code? Thanks.
        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.HttpWebResponse
        Dim MyStream As System.IO.StreamReader
 
        MyRequest = System.Net.WebRequest.Create("http://www.abc.com")
        MyResponse = MyRequest.GetResponse()
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())

Open in new window

0
 
LVL 2

Accepted Solution

by:
wmestrom earned 500 total points
ID: 22748477
This should work. However many servers ignore the range part... The page I used should work though.
        Dim MyRequest As System.Net.HttpWebRequest
        Dim MyResponse As System.Net.WebResponse
        Dim MyStream As System.IO.StreamReader
 
        MyRequest = DirectCast(System.Net.HttpWebRequest.Create("http://www.gnu.org/projects/dotgnu/pnetlib-doc/System/Net/HttpWebRequest.html"), System.Net.HttpWebRequest)
        MyRequest.AddRange(10000, Integer.MaxValue)
        MyResponse = MyRequest.GetResponse()
        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())

Open in new window

0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

A while ago, I was working on a Windows Forms application and I needed a special label control with reflection (glass) effect to show some titles in a stylish way. I've always enjoyed working with graphics, but it's never too clever to re-invent …
Parsing a CSV file is a task that we are confronted with regularly, and although there are a vast number of means to do this, as a newbie, the field can be confusing and the tools can seem complex. A simple solution to parsing a customized CSV fi…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question