Project to keep track of web pages

I want to develop a project in Java, possibly on Linux, that downloads web pages and keeps track of which of those pages are changing.

First, how do I download the pages?

Next, how do I find out whether a web page has changed?
For example, if I have www.google.com in my database and have subscribed to that page, how do I find out that Google has changed?

Which parameters should I look at to tell whether Google has changed when the server periodically downloads it?

Do I need to calculate the MD5 of the page?

Ideas, please.
Asked by: sriramvemaraju2000
 
CEHJ commented:
Very few pages these days are static. In the rare cases where they are, and where the web server supports it, you can send a conditional request using the If-Modified-Since header:

http://www.feedthebot.com/ifmodified.html
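
A conditional request of this kind might look like the following in Java. This is only a minimal sketch using the JDK's `HttpURLConnection`; the class name `ConditionalGet` and its method names are made up for illustration, and it assumes the server honours the header (many servers ignore it and always return the full page).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalGet {

    /** Interprets a status code: anything other than 304 is treated as "possibly changed". */
    public static boolean isModified(int statusCode) {
        return statusCode != HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /**
     * Asks the server whether the page changed after lastCheckedMillis
     * (milliseconds since the epoch, e.g. the time of the previous download).
     */
    public static boolean modifiedSince(String pageUrl, long lastCheckedMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try {
            conn.setRequestMethod("GET");
            // Sends the If-Modified-Since request header.
            conn.setIfModifiedSince(lastCheckedMillis);
            return isModified(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

A 304 (Not Modified) reply means the server believes the page is unchanged; any other status should be treated as "download and compare".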

Otherwise, yes, a checksumming approach would be one way. If the pages are small enough to hold two copies in memory, you can do a string comparison instead, which will be faster.
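
As a rough illustration of the checksumming idea, the sketch below hashes the downloaded bytes with MD5 (the algorithm mentioned in the question) via the JDK's `MessageDigest` and compares the result with a previously stored hash. The class name `PageFingerprint` and its methods are hypothetical, and where the stored hash lives (database, file) is left open.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PageFingerprint {

    /** Returns the MD5 digest of the content as a lowercase hex string. */
    public static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the freshly downloaded content no longer matches the stored hash. */
    public static boolean hasChanged(String storedHash, byte[] freshContent) {
        return !storedHash.equalsIgnoreCase(md5Hex(freshContent));
    }
}
```

Storing only the hash keeps the database small. Note that any byte-level difference, such as a timestamp or rotating ad in the page, will count as a change.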
 
sriramvemaraju2000 (author) commented:
Can you tell me how to do this in Java? What is the best approach?
In particular, how do I get the web pages, and how do I track the modifications?
I think this will be useful for security purposes: I can tell whether my page has been hacked or not.
 
CEHJ commented:
Begin with an open source Java web crawler and take it from there; that's what I'd do.

http://java-source.net/open-source/crawlers
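
For the narrower question of fetching a single page, a crawler is not strictly required. The following is a minimal sketch using only the JDK; the class name `PageDownloader` and its methods are made up for illustration, and it ignores redirects policy, timeouts, and character encodings, all of which a real crawler would handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

public class PageDownloader {

    /** Reads the whole stream into memory and returns its bytes. */
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Downloads the raw bytes of the page at the given URL. */
    public static byte[] download(String pageUrl) {
        try (InputStream in = new URL(pageUrl).openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned bytes can then be hashed or compared with the previously stored copy to detect a change.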
 
CEHJ commented:
:)
Question has a verified solution.