• Status: Solved

Project to keep track of web pages

I want to develop a project in Java, possibly on Linux, that downloads web pages and keeps track of which of those pages are changing.

First, how do I download the pages?

Next, how do I find out whether a web page has changed?
For example, if I have www.google.com in my database and have subscribed to that page, how do I find out that Google has changed?

Which parameters should I look at to tell whether Google has changed when the server periodically downloads it?

Do I need to calculate the MD5 of the page?

Ideas, please.
Asked by: sriramvemaraju2000
 
CEHJ commented:
Very few pages these days will be static. In the rare cases where they are, and where the web server supports it, you can send a conditional request with the If-Modified-Since header:

http://www.feedthebot.com/ifmodified.html
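
A minimal sketch of such a conditional request in Java, assuming Java 11 or later and its built-in `java.net.http` client. The class name `ConditionalFetch`, its method names and the example URL are made up for illustration and do not come from the linked page. A server that honours `If-Modified-Since` replies with status 304 and no body when the page is unchanged; a server that ignores the header simply sends 200 with the full page.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ConditionalFetch {

    // Build a GET request; add If-Modified-Since only when we have a
    // Last-Modified value saved from an earlier download (hypothetical helper).
    static HttpRequest buildRequest(String url, String lastModified) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (lastModified != null) {
            builder.header("If-Modified-Since", lastModified);
        }
        return builder.build();
    }

    // 304 Not Modified means the server says our stored copy is still current.
    static boolean isNotModified(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://example.com/";
        HttpClient client = HttpClient.newHttpClient();

        // First fetch is unconditional; remember Last-Modified if the server sends it.
        HttpResponse<String> first =
                client.send(buildRequest(url, null), HttpResponse.BodyHandlers.ofString());
        String lastModified = first.headers().firstValue("Last-Modified").orElse(null);
        if (lastModified == null) {
            System.out.println("No Last-Modified header; compare the content instead.");
            return;
        }

        // Second fetch is conditional; the body is discarded because only the status matters.
        HttpResponse<Void> second =
                client.send(buildRequest(url, lastModified), HttpResponse.BodyHandlers.discarding());
        System.out.println(isNotModified(second.statusCode())
                ? "Not modified since " + lastModified
                : "Modified, or header ignored (status " + second.statusCode() + ")");
    }
}
```

In a real tracker the `Last-Modified` value would be stored per URL between runs rather than fetched twice in a row as this demo does.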

Otherwise, yes, a checksumming approach would be one way. If the pages are small enough to hold two copies in memory, you can do a string comparison instead; it'll be faster.
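
The checksumming approach could look roughly like the sketch below, again assuming Java 11 or later. The class `PageFingerprint` and its methods are hypothetical names. SHA-256 is used here as a substitute for the MD5 mentioned in the question; `MessageDigest.getInstance("MD5")` would work the same way. Only the hash needs to be stored between downloads, not the page itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PageFingerprint {

    // Download the raw bytes of a page, following ordinary redirects.
    static byte[] download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
    }

    // Hex-encoded SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // True when the fresh download no longer matches the stored hash.
    static boolean hasChanged(String storedHash, byte[] freshContent) {
        return !sha256Hex(freshContent).equals(storedHash);
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://example.com/";
        byte[] body = download(url);
        System.out.println(url + " -> " + sha256Hex(body));
        // Pass a previously printed hash as the second argument to compare.
        if (args.length > 1) {
            System.out.println(hasChanged(args[1], body) ? "changed" : "unchanged");
        }
    }
}
```

Because a hash changes when even one byte differs, pages that embed timestamps, session tokens or rotating ads will be reported as changed on every download. Hashing only the part of the page you care about is one way around that.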
 
sriramvemaraju2000 (author) commented:
Can you tell me how to do this in Java? What is the best approach?
In particular, how do I get the web pages and how do I track the modifications?
I think this will be useful for security purposes: I could tell whether my page has been hacked.
 
CEHJ commented:
Begin with a Java open-source web crawler and take it from there. That's what I'd do:

http://java-source.net/open-source/crawlers
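
Independently of which crawler does the downloading, the tracking side can be prototyped in a few lines. The sketch below keeps the last body seen for each URL in memory and uses the plain string comparison suggested earlier in the thread. `ChangeTracker`, `Result` and `record` are hypothetical names, and a real project would persist the state in a database rather than a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

final class ChangeTracker {

    enum Result { FIRST_SEEN, UNCHANGED, CHANGED }

    // URL -> body from the most recent download (in memory only).
    private final Map<String, String> lastSeen = new HashMap<>();

    // Store the new body and report how it compares with the previous one.
    Result record(String url, String body) {
        String previous = lastSeen.put(url, body);
        if (previous == null) {
            return Result.FIRST_SEEN;
        }
        return previous.equals(body) ? Result.UNCHANGED : Result.CHANGED;
    }

    public static void main(String[] args) {
        ChangeTracker tracker = new ChangeTracker();
        // Stand-in bodies; in practice these come from the periodic downloads.
        System.out.println(tracker.record("http://example.com/", "<html>v1</html>"));
        System.out.println(tracker.record("http://example.com/", "<html>v1</html>"));
        System.out.println(tracker.record("http://example.com/", "<html>v2</html>"));
    }
}
```

The periodic part could be driven by a `java.util.concurrent.ScheduledExecutorService` or an external scheduler such as cron on Linux; which fits better depends on how the rest of the project is deployed.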
 
CEHJ commented:
:)