HTML Screen Scraping

Hi all,

Please help me with this problem. I want to write an application that can extract the HTML of any website, for example "http://search.techrepublic.com.com/search/screen-scraper.html" (in other words, what you see when you view the source of a page), and then search that HTML for specific data. Any tutorial, code, or tool that can search for that data would help me a lot.

Thanks in advance  
ASINGH1974 Asked:
aherps Commented:

using System.IO;
using System.Net;

namespace WebHelper
{
    public class webpage
    {
        // Raw HTML of the page, as returned by the server for the initial request.
        public string results;

        public webpage(string address)
        {
            WebRequest objRequest = WebRequest.Create(address);

            // The using blocks dispose the response and the reader,
            // so no explicit Close() call is needed.
            using (WebResponse objResponse = objRequest.GetResponse())
            using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
            {
                this.results = sr.ReadToEnd();
            }
        }
    }
}

aherps Commented:
Just be warned about the above: it won't work with web pages that use AJAX, because the result is taken from the initial page load and does not include data the page loads afterwards.
Ray Paseur Commented:
If you have access to PHP, it's very easy.  Best, ~Ray
<?php
$html = file_get_contents('http://yoursite.org/page.asp');
echo htmlentities($html);
?>

Neeraj Soni (Sr. Architect) Commented:
The code from aherps is probably the best starting point.
All you need after that is a custom HTML parser that identifies your landmark tags in the HTML source. From these tags you can read the inner HTML or text, attributes, and other values.
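As a rough illustration of the landmark-tag idea, here is a minimal sketch in Python (not the C# or PHP posted earlier in this thread) that uses the standard library's html.parser module. The class name LandmarkParser, the helper extract_links, the sample HTML, and the choice of <a> tags as the landmark are all assumptions made for the example; a real page would need its own landmarks.

```python
from html.parser import HTMLParser


class LandmarkParser(HTMLParser):
    """Collects the href attribute and inner text of every <a> tag.

    The <a> tag is only a hypothetical landmark chosen for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # list of (href, text) tuples found so far
        self._in_link = False  # True while we are inside an <a> element
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (for example <b>) is collected as well.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False


def extract_links(html):
    """Return (href, text) pairs for every <a> element in the HTML string."""
    parser = LandmarkParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p>See <a href="/docs">the <b>docs</b></a> and <a href="/faq">FAQ</a>.</p>'
print(extract_links(sample))  # [('/docs', 'the docs'), ('/faq', 'FAQ')]
```

The string passed to extract_links would normally be the page source downloaded by code like that posted above; the parsing step itself does not depend on how the HTML was fetched.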
You can even handle AJAX calls by identifying the URLs they request and downloading the partial data directly from those URLs.
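A minimal sketch of that approach in Python (again, not the C# or PHP used earlier in this thread), assuming you have already found the URL of the AJAX call, for example in the browser's network tools, and assuming the endpoint returns UTF-8 encoded JSON. The function name fetch_ajax_json, the opener parameter, and the example URL in the comment are hypothetical; whether a particular site needs the X-Requested-With header is something to check case by case.

```python
import json
from urllib.request import Request, urlopen


def fetch_ajax_json(url, opener=urlopen):
    """Download the URL an AJAX call would request and decode its JSON body.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    # Some servers use this header to tell AJAX calls from normal page
    # loads. Sending it here is an assumption, not a requirement.
    request = Request(url, headers={"X-Requested-With": "XMLHttpRequest"})
    with opener(request) as response:
        # Assumes the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (the URL is a placeholder, not a real endpoint):
# data = fetch_ajax_json("http://example.com/ajax/results?page=2")
```

If the endpoint returns an HTML fragment instead of JSON, the decoded text can be searched with the same parsing technique used for the full page.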