Solved

Reading HTML using System.IO.StreamReader

Posted on 2008-10-18
3
642 Views
Last Modified: 2008-10-18
Hi Experts,

I am reading into my application many pages of HTML so that I can retrieve (scrape) data from them. My problem is that the data I require lies a thousand lines of HTML into the page. Having to read through these unwanted lines of code each time a data scrape is made is slowing things down. Is it possible to make an HTML page request of the server starting at line 1000 for example?
0
Comment
Question by:DColin
  • 2
3 Comments
 
LVL 2

Expert Comment

by:wmestrom
ID: 22747873
You can try if the server supports the range option in the HTTP header. Then you can specify a byte offset. The HTTP request would look something like this:

GET /somepage HTTP/1.1
Host: www.xyz.org
Range: bytes=123456-
Accept: *.*, */*

Hope this will work for you.

Greets
Willem
0
 

Author Comment

by:DColin
ID: 22748077
Hi wmestrom:

Do you know how I can use your answer with my existing code? Thanks.
        Dim MyRequest As System.Net.HttpWebRequest

        Dim MyResponse As System.Net.HttpWebResponse

        Dim MyStream As System.IO.StreamReader
 

        MyRequest = System.Net.WebRequest.Create("http://www.abc.com")

        MyResponse = MyRequest.GetResponse()

        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())

Open in new window

0
 
LVL 2

Accepted Solution

by:
wmestrom earned 500 total points
ID: 22748477
This should work. However many servers ignore the range part... The page I used should work though.
        Dim MyRequest As System.Net.HttpWebRequest

        Dim MyResponse As System.Net.WebResponse

        Dim MyStream As System.IO.StreamReader
 

        MyRequest = DirectCast(System.Net.HttpWebRequest.Create("http://www.gnu.org/projects/dotgnu/pnetlib-doc/System/Net/HttpWebRequest.html"), System.Net.HttpWebRequest)

        MyRequest.AddRange(10000, Integer.MaxValue)

        MyResponse = MyRequest.GetResponse()

        MyStream = New System.IO.StreamReader(MyResponse.GetResponseStream())

Open in new window

0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

A while ago, I was working on a Windows Forms application and I needed a special label control with reflection (glass) effect to show some titles in a stylish way. I've always enjoyed working with graphics, but it's never too clever to re-invent …
The ECB site provides FX rates for major currencies since its inception in 1999 in the form of an XML feed. The files have the following format (reducted for brevity) (CODE) There are three files available HERE (http://www.ecb.europa.eu/stats/exch…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now