faithless1

asked:

Wget

Hi,

I came across fetch.com, which offers a pretty good solution, but the price is high and its offerings are geared mostly toward enterprises. Is it possible to achieve what they offer with wget?

Here's the functionality I need:

I'm looking for a script where I can specify a list of domains (20-50K) and have all site content downloaded into a single zipped file. I have limited space (1 TB), so I want only the text from each site and want to exclude images, Flash, and other site files so the downloads are quick. The final output can be in any format.
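
Something like this rough, untested sketch is what I'm picturing (domains.txt, the sites/ directory, and sites.tar.gz are just placeholder names I made up):

#!/bin/sh
# domains.txt: one domain per line
while read domain; do
    # recursive fetch, two levels deep, rejecting common image/flash/binary
    # suffixes so that mostly text/HTML is kept
    wget --recursive --level=2 --no-parent \
         --reject "jpg,jpeg,png,gif,swf,flv,mp3,avi,zip,exe,css,js" \
         --adjust-extension --wait=1 --timeout=30 --tries=2 \
         --directory-prefix="sites/$domain" "http://$domain/"
done < domains.txt
# pack everything into a single compressed archive once the crawl finishes
tar czf sites.tar.gz sites/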

I'm also looking for a script that crawls the URLs of domains I specify for specific keywords; whenever there is a match, the matching URLs should be written to a central file.
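
Again, just a rough, untested sketch of the idea (urls.txt, matches.txt, and the keyword are placeholders):

#!/bin/sh
KEYWORD="unix"
# urls.txt: one URL per line
while read url; do
    # fetch the page to stdout and test it for the keyword
    if wget -q -O - --timeout=15 --tries=1 "$url" | grep -qi "$KEYWORD"; then
        echo "$url" >> matches.txt
    fi
done < urls.txt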

Lastly, I have a file with 100k domains, and I want to append each site's most recent title to create a directory. Is there a way to fetch this information from search engines? A rough sketch of what I have in mind follows the example below.

Example input:

unix.org
etc..

Output
unix.org            The UNIX System, UNIX System
etc..
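
Rather than going through a search engine, would pulling each homepage directly be reasonable? Another untested sketch (domains.txt and titles.txt are placeholder names):

#!/bin/sh
# domains.txt: one domain per line; titles.txt: tab-separated domain/title pairs
while read domain; do
    # fetch the homepage, join it onto one line, and cut out the <title> text
    title=$(wget -q -O - --timeout=15 --tries=1 "http://$domain/" \
            | tr -d '\r\n' \
            | grep -io '<title>[^<]*</title>' \
            | head -1 \
            | sed 's/<[^>]*>//g')
    printf '%s\t%s\n' "$domain" "$title" >> titles.txt
done < domains.txt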



Thank you very much in advance.

Best,
ASKER CERTIFIED SOLUTION
wls3

This solution is only available to members.
faithless1

ASKER

Thanks, writing to a directory works as well.

Thanks again,
Tom