sriramvemaraju2000
asked on
project to keep track of webpages
I want to develop a project in Java, probably on Linux, that downloads web pages and keeps track of which of those pages are changing.
First, how do I download the pages?
Next, how do I find out whether a web page has changed?
For example, if I have www.google.com in my database and I have subscribed to that page, how do I find out that Google has changed?
Which parameters should I look at to tell whether Google has changed when the server periodically downloads it?
Do I need to calculate the MD5 of the page?
Ideas, please.
ASKER CERTIFIED SOLUTION
Begin with a Java open-source web crawler and take it from there. That's what I'd do:
http://java-source.net/open-source/crawlers
:)
ASKER
In particular, how do I get the web pages, and how do I track the modifications?
I think this will be useful for security purposes: I can tell whether my page has been hacked or not.
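One possible way to approach both parts of the question (fetching a page, then detecting a change) is to download the raw bytes and compare a digest of them against the digest stored from the previous download. The sketch below is only an illustration under some assumptions: it needs Java 11 or later for java.net.http.HttpClient, the class name PageTracker and its method names are made up for this example, and it uses SHA-256 rather than the MD5 mentioned in the question, since the stated goal includes spotting tampering.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;

// Hypothetical helper class; the name and methods are assumptions for illustration.
class PageTracker {

    // Download the body of a page as raw bytes (requires Java 11+).
    static byte[] download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Compute the SHA-256 digest of the content as a lowercase hex string.
    static String digestHex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // A page counts as changed if no digest was stored yet,
    // or if the stored digest differs from the digest of the fresh download.
    static boolean hasChanged(String storedDigest, byte[] freshContent) {
        return storedDigest == null || !storedDigest.equals(digestHex(freshContent));
    }
}
```

In use, the periodic job would call download(url), pass the result and the digest saved in the database to hasChanged, and then store the new digest. One caveat: pages that embed per-request content such as timestamps or session tokens will produce a different digest on every fetch, so for those it is probably necessary to strip or normalise the volatile parts before hashing.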