Posted on 2010-11-19
I came across fetch.com, which offers a pretty good solution, but the price is high and its products are geared mostly toward enterprises. Is it possible to achieve what they offer with wget?
Here's the functionality I need:
I'm looking for a script where I can specify a list of domains (20-50K) and have all site content downloaded into a single zipped file. I have limited space (1 TB), so I want only the text for each site and want to exclude images, Flash, other site files, etc., so the downloads are quick. The final output can be in any format.
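A rough sketch of the text-only part in Python, using just the standard library instead of wget, might look like the following. It is only an illustration under some assumptions: the names `html_to_text`, `fetch_text`, `archive` and the output file `sites.txt.gz` are made up here, only each domain's front page is fetched (a real crawl would have to follow links), and plain `http://` is assumed for every domain.

```python
import gzip
import urllib.request
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        words = data.split()
        if not self._skip and words:
            self.parts.append(" ".join(words))


def html_to_text(html):
    """Return the text content of an HTML string on a single line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(domain, timeout=10):
    """Fetch the front page only (assumption: http:// works for the domain)."""
    url = "http://" + domain + "/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
    return html_to_text(raw.decode("utf-8", errors="replace"))


def archive(domains, out_path="sites.txt.gz"):
    """Write one 'domain<TAB>text' line per domain into a gzip file."""
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for domain in domains:
            try:
                text = fetch_text(domain)
            except Exception as exc:  # unreachable host, timeout, bad response
                text = "ERROR: %s" % exc
            out.write(domain + "\t" + text + "\n")
```

Calling `archive(["unix.org"])` would produce a compressed file with one line per domain. Since images and other assets are never requested, only the HTML itself counts against the disk quota.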
I'm also looking for a script that crawls the URLs of domains I specify for specific keywords; if there is a match, the URL is written to a central file.
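The keyword check could be sketched in Python roughly as below. This is a guess at the behaviour, not a finished crawler: `matches`, `fetch`, `scan` and `write_matches` are hypothetical names, the match is a simple case-insensitive substring test on the raw page, and the list of URLs to check is assumed to exist already.

```python
import urllib.request


def matches(text, keywords):
    """Return the keywords that occur in text (case-insensitive substring)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def fetch(url, timeout=10):
    """Download a page and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scan(urls, keywords, fetcher=fetch):
    """Return (url, found_keywords) pairs for every URL with a match."""
    hits = []
    for url in urls:
        try:
            page = fetcher(url)
        except Exception:  # skip URLs that cannot be fetched
            continue
        found = matches(page, keywords)
        if found:
            hits.append((url, found))
    return hits


def write_matches(hits, out_path="matches.txt"):
    """Append one 'url<TAB>kw1,kw2' line per hit to the central file."""
    with open(out_path, "a", encoding="utf-8") as out:
        for url, found in hits:
            out.write(url + "\t" + ",".join(found) + "\n")
```

The `fetcher` parameter is there so the matching logic can be tried on canned pages before pointing it at real sites.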
Lastly, I have a file with 100k domains and I want to append the most recent site title to each one to create a directory. Is there a way to fetch this information from search engines? A line in the directory would look like this:
unix.org The UNIX System, UNIX System
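If search engines turn out not to be an option, one alternative is to read the `<title>` straight from each site's front page. The Python sketch below shows that swapped-in approach, not a search-engine lookup; `extract_title`, `directory_line` and `fetch_front_page` are hypothetical names, and plain `http://` is assumed for every domain.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Captures the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def directory_line(domain, html):
    """Format one directory entry as 'domain title'."""
    return (domain + " " + extract_title(html)).strip()


def fetch_front_page(domain, timeout=10):
    """Download the front page (assumption: http:// works for the domain)."""
    url = "http://" + domain + "/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Looping over the domain file and writing `directory_line(domain, fetch_front_page(domain))` for each entry would give the current title as served by the site itself, which may differ from what a search engine has cached.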
Thank you very much in advance.