HTML Screen Scraping

Posted on 2008-11-10
Last Modified: 2012-08-13
Hi all,

Please help me with this problem. I want to write code for an application that can extract the HTML of any web site, say "" (in other words, what you see when you view the source of a page), and then search that HTML for specific data. Any tutorial, code, or tool that can search for that data would help me a lot.

Thanks in advance  
Question by:ASINGH1974
    LVL 7

    Accepted Solution


    using System;
    using System.IO;
    using System.Net;

    namespace WebHelper
    {
        public class webpage
        {
            // Raw HTML returned by the server for the requested address
            public string results;

            public webpage(string address)
            {
                string strResult = "";
                WebRequest objRequest = WebRequest.Create(address);
                // The using blocks close and clean up the response and the StreamReader
                using (WebResponse objResponse = objRequest.GetResponse())
                using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
                {
                    strResult = sr.ReadToEnd();
                }
                this.results = strResult;
            }
        }
    }
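
For the original goal of searching the downloaded source for specific data, a minimal sketch might look like the following. It assumes the HTML is already held in a string (for example the `results` field of the `webpage` class). `HtmlSearch` and `FindSnippets` are hypothetical names introduced only for illustration, and the search is a plain case-insensitive text match, not an HTML-aware one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the accepted solution.
public static class HtmlSearch
{
    // Returns every occurrence of 'term' in 'html', each with up to
    // 'context' characters of surrounding text on either side.
    public static List<string> FindSnippets(string html, string term, int context)
    {
        var snippets = new List<string>();
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(term))
            return snippets;

        int index = html.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            // Clamp the window to the bounds of the document
            int start = Math.Max(0, index - context);
            int end = Math.Min(html.Length, index + term.Length + context);
            snippets.Add(html.Substring(start, end - start));

            // Continue searching after the current match
            index = html.IndexOf(term, index + term.Length, StringComparison.OrdinalIgnoreCase);
        }
        return snippets;
    }
}
```

With the class from the accepted solution this could be called as `HtmlSearch.FindSnippets(new webpage(address).results, "price", 40)`, where `address` and the search term are placeholders.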


    LVL 7

    Expert Comment

    Just be warned about the above: this won't work with web pages that use AJAX, because the result is taken from the initial page load and does not include any data loaded afterwards.
    LVL 107

    Expert Comment

    by:Ray Paseur
    If you have access to PHP, it's very easy.  Best, ~Ray
    $html = file_get_contents('');
    echo htmlentities($html);


    LVL 6

    Expert Comment

    by:Neeraj Soni
    The code from aherps is a good starting point.
    All you need to do is write a custom parser for the HTML and identify your landmark tags in the source. From these tags you can read the inner HTML or text, the attributes, and other values.
    You can even handle AJAX calls by identifying their URLs and attempting to download the partial data from those URLs.
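
The landmark-tag idea above can be sketched with regular expressions. This is only an illustration under simplifying assumptions: attribute values are quoted, the landmark element has an `id`, and it does not contain a nested element with the same tag name. `LandmarkParser` and its methods are hypothetical names, and a dedicated HTML parser is likely to cope better with irregular real-world markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LandmarkParser
{
    private const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Inner HTML of the first <tag ... id="id" ...> element, or null if none is found.
    // Matching stops at the first closing tag, so same-name nesting is not handled.
    public static string InnerHtmlById(string html, string tag, string id)
    {
        string t = Regex.Escape(tag);
        string pattern = "<" + t + "\\b[^>]*\\sid\\s*=\\s*[\"']" + Regex.Escape(id)
                       + "[\"'][^>]*>(.*?)</" + t + "\\s*>";
        Match m = Regex.Match(html, pattern, Options);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Visible text of a fragment: tags stripped, entities decoded, whitespace trimmed.
    public static string ToText(string fragment)
    {
        string withoutTags = Regex.Replace(fragment, "<[^>]+>", "");
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }

    // Value of 'attribute' on the first <tag> that carries it, or null if none is found.
    public static string AttributeValue(string html, string tag, string attribute)
    {
        string pattern = "<" + Regex.Escape(tag) + "\\b[^>]*\\s" + Regex.Escape(attribute)
                       + "\\s*=\\s*[\"']([^\"']*)[\"']";
        Match m = Regex.Match(html, pattern, Options);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For instance, `LandmarkParser.ToText(LandmarkParser.InnerHtmlById(html, "div", "price"))` would return the text of a `div` whose `id` is `price`; the tag name and `id` here are placeholders for whatever landmarks the target page actually uses.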
