HTML Screen Scraping

Hi all,

Please help me out with this problem. I want to write an application that can extract the HTML of any web site, say "http://search.techrepublic.com.com/search/screen-scraper.html" (in other words, what you see when you view the source of a page), and then search that HTML for specific data. Any tutorial, code, or tool that can search for that data would help me a lot.

Thanks in advance  
Asked by: ASINGH1974
 
aherps commented:

using System;
using System.IO;
using System.Net;
 
namespace WebHelper
{
    public class webpage
    {
        // Raw HTML of the page, as returned by the initial request
        public string results;

        public webpage(string address)
        {
            // Create the request for the given URL
            WebRequest objRequest = WebRequest.Create(address);
 
            // The using blocks close and dispose the response and the reader
            using (WebResponse objResponse = objRequest.GetResponse())
            using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
            {
                this.results = sr.ReadToEnd();
            }
        }
    }
}

 
aherps commented:
Just be warned about the above: this won't work for web pages that use AJAX, because the result is taken from the initial page load and does not include any data loaded afterwards.
 
Ray Paseur commented:
If you have access to PHP, it's very easy.  Best, ~Ray
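<?php
// Fetch the raw HTML of the page (needs allow_url_fopen enabled in php.ini)
$html = file_get_contents('http://yoursite.org/page.asp');
// Escape the markup so the browser shows the source instead of rendering it
echo htmlentities($html);
?>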

 
Neeraj Soni (Sr. Architect) commented:
The code from aherps is probably the right starting point.
All you need is to write a custom parser for the HTML and identify your landmark tags in the source. From these tags you can read the inner HTML or text, attributes, and other values.
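To make the landmark-tag idea concrete, here is a minimal sketch in C#. The class and method names (LandmarkParser, Between, Attribute) are made up for illustration and are not part of any library. It assumes the page HTML is already in a string (for example the results field from the code above) and that the landmarks are simple enough for a string search and a regular expression. This is not a full HTML parser, so nested or malformed markup may need something more robust.

```csharp
using System;
using System.Text.RegularExpressions;

namespace WebHelper
{
    // Hypothetical helper: names are illustrative only
    public static class LandmarkParser
    {
        // Returns the text between the first startMark and the next endMark,
        // or null when either landmark is missing.
        public static string Between(string html, string startMark, string endMark)
        {
            int start = html.IndexOf(startMark, StringComparison.OrdinalIgnoreCase);
            if (start < 0) return null;
            start += startMark.Length;

            int end = html.IndexOf(endMark, start, StringComparison.OrdinalIgnoreCase);
            if (end < 0) return null;

            return html.Substring(start, end - start);
        }

        // Returns the quoted value of the named attribute on the first
        // matching tag, or null when no such tag/attribute is found.
        public static string Attribute(string html, string tag, string attribute)
        {
            string pattern = "<" + Regex.Escape(tag) + @"\b[^>]*?\b"
                           + Regex.Escape(attribute) + @"\s*=\s*[""']([^""']*)[""']";
            Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
            return m.Success ? m.Groups[1].Value : null;
        }
    }
}
```

For example, with html set to <title>Hello</title><a class="x" href="/page">go</a>, calling LandmarkParser.Between(html, "<title>", "</title>") returns Hello, and LandmarkParser.Attribute(html, "a", "href") returns /page.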
You can even handle AJAX calls by identifying their URLs and downloading the partial data directly from those URLs.
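A rough sketch of that AJAX approach, using the same WebRequest API as the earlier answer. The names AjaxFetcher, BuildRequest and Download are hypothetical. The endpoint URL is something you would have to find yourself, for instance by watching the browser's network traffic. The X-Requested-With header is only a common convention that some servers check; whether a given site needs it is an assumption.

```csharp
using System;
using System.IO;
using System.Net;

namespace WebHelper
{
    // Hypothetical helper: names are illustrative only
    public static class AjaxFetcher
    {
        // Builds a GET request that resembles a browser XMLHttpRequest call.
        // Nothing is sent over the network until GetResponse() is called.
        public static HttpWebRequest BuildRequest(string ajaxUrl)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(ajaxUrl);
            request.Method = "GET";
            request.Accept = "application/json, text/html";
            // Convention used by many JavaScript libraries; not required by every server
            request.Headers["X-Requested-With"] = "XMLHttpRequest";
            return request;
        }

        // Downloads the partial data (often JSON or an HTML fragment) as a string.
        public static string Download(string ajaxUrl)
        {
            using (WebResponse response = BuildRequest(ajaxUrl).GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The string returned by Download can then be searched the same way as the HTML of the initial page.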
Question has a verified solution.
