Can anyone recommend an open source Java program that crawls the web? So far the best I have found is Heritrix (http://crawler.archive.org/
). However, it stores the list of URLs it has already visited, and the queue of URLs it has encountered but not yet visited, in RAM rather than on disk. This is a major limitation, since a broad crawl (one that follows every link encountered) will consume many gigabytes of memory. I am therefore seeking a freely available web crawler, written in Java, that uses a bounded amount of RAM regardless of the size of the crawl.
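For the visited-URL check in particular, one common bounded-RAM technique (not specific to Heritrix or any other crawler; class and method names below are my own illustration) is a Bloom filter: a fixed-size bit array whose memory footprint is chosen up front and never grows with the number of URLs, at the cost of a small, tunable false-positive rate (it never yields false negatives). A minimal sketch in plain Java:

```java
import java.nio.charset.StandardCharsets;

// Sketch of a bounded-RAM "have I already visited this URL?" check.
// The bit array size is fixed at construction, so memory use does not
// grow with the crawl; some unvisited URLs may be falsely reported as
// visited (false positives), but visited URLs are never missed.
public class UrlBloomFilter {
    private final long[] bits;
    private final int numBits;
    private final int numHashes;

    public UrlBloomFilter(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new long[(numBits + 63) / 64];
    }

    // Simple FNV-1a hash; two different seeds give two base hashes.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Derive the i-th bit position from two base hashes
    // (the double-hashing scheme of Kirsch and Mitzenmacher).
    private int position(byte[] data, int i) {
        int h1 = fnv1a(data, 0x811c9dc5);
        int h2 = fnv1a(data, 0x01000193);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            int p = position(data, i);
            bits[p >>> 6] |= 1L << (p & 63);
        }
    }

    public boolean mightContain(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            int p = position(data, i);
            if ((bits[p >>> 6] & (1L << (p & 63))) == 0) {
                return false; // definitely never added
            }
        }
        return true; // probably added (small false-positive chance)
    }

    public static void main(String[] args) {
        UrlBloomFilter seen = new UrlBloomFilter(1 << 20, 5); // ~128 KB, fixed
        seen.add("http://example.com/");
        System.out.println(seen.mightContain("http://example.com/")); // true
        System.out.println(seen.mightContain("http://example.org/never-added"));
    }
}
```

This only addresses the visited set; the not-yet-visited queue is a separate problem, which disk-backed frontiers in crawlers typically solve by spooling the queue to files or an embedded database rather than holding it in memory.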