Solved

Wget

Posted on 2010-11-19
Last Modified: 2012-08-13
Hi,

I came across fetch.com, which offers a pretty good solution, but the price is high and its offerings are geared mostly toward enterprises. Is it possible to achieve what they offer with wget?

Here's the functionality I need:

I'm looking for a script where I can specify a list of domains (20-50K) and have all site content downloaded into a single zipped file. I have limited space (1 TB), so I want only the text for each site and want to exclude images, Flash, site files, etc. so the downloads are quick. The final output can be in any format.
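For the text-only download described above, one possible starting point is to let wget recurse through each site while rejecting common binary and asset suffixes. The following is a minimal sketch, not a tested crawler for 20-50K domains: it assumes wget is on the PATH, and the names `REJECTED_SUFFIXES`, `normalize_urls`, `build_wget_command`, `urls.txt`, and `sites` are hypothetical. The suffix list is only a guess at what "images, Flash, site files" covers and would need tuning.

```python
import subprocess

# Hypothetical list of suffixes to skip; adjust to taste.
REJECTED_SUFFIXES = ["jpg", "jpeg", "png", "gif", "ico", "swf", "css", "js", "zip", "pdf"]


def normalize_urls(domains):
    """Turn bare domains such as 'unix.org' into URLs, skipping blank lines."""
    urls = []
    for raw in domains:
        domain = raw.strip()
        if not domain:
            continue
        urls.append(domain if "://" in domain else f"http://{domain}")
    return urls


def build_wget_command(url_file, out_dir, depth=2):
    """Build a wget argument list for a shallow, asset-free recursive download."""
    return [
        "wget",
        "--recursive",                                # follow links within each site
        f"--level={depth}",                           # limit recursion depth
        f"--reject={','.join(REJECTED_SUFFIXES)}",    # skip images, Flash, etc.
        f"--input-file={url_file}",                   # one URL per line
        f"--directory-prefix={out_dir}",              # where the downloads are written
        "--timeout=15",
        "--tries=2",
    ]


def run_wget(url_file, out_dir, depth=2):
    """Run wget; a non-zero exit code is returned rather than raised."""
    return subprocess.run(build_wget_command(url_file, out_dir, depth), check=False).returncode
```

To use it, write the output of `normalize_urls(...)` to `urls.txt`, one URL per line, and call `run_wget("urls.txt", "sites")`. The saved pages are still HTML, so stripping markup down to plain text would be a separate step.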

I'm also looking for a script to crawl the URLs of domains I specify for specific keywords; if there is a match, the URL is written to a central file.
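The keyword check itself can be sketched independently of how the pages are fetched. The example below assumes the page text is already available as a mapping from URL to text (for instance, read back from downloaded files); `matching_urls`, `append_matches`, and the file name passed to it are hypothetical names used only for illustration.

```python
import re


def matching_urls(pages, keywords):
    """Return the URLs whose text contains any keyword (case-insensitive).

    `pages` maps URL -> page text. Keywords are matched literally, not as regexes.
    """
    escaped = [re.escape(k) for k in keywords if k]
    if not escaped:
        return []  # no keywords means no matches, rather than matching everything
    pattern = re.compile("|".join(escaped), re.IGNORECASE)
    return [url for url, text in pages.items() if pattern.search(text)]


def append_matches(urls, central_file):
    """Append matching URLs to the central file, one per line."""
    with open(central_file, "a", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
```

Matching is by substring, so a keyword such as "unix" also matches "unixlike"; word-boundary matching would need a different pattern.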

Lastly, I have a file with 100K domains, and I want to append the most recent site title to each one to create a directory. Is there a way to fetch this information from search engines?

Example.

unix.org
etc..

Output
unix.org            The UNIX System, UNIX System
etc..
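As an alternative to querying search engines, the title can be read from each site's own home page. This is a rough sketch under that assumption, using only the Python standard library; `TitleParser`, `extract_title`, `fetch_title`, and `format_line` are hypothetical names, and the fetch assumes each domain answers over plain `http://` with a UTF-8-decodable page.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def fetch_title(domain, timeout=10):
    """Fetch the first 64 KiB of the home page and extract its title."""
    with urllib.request.urlopen(f"http://{domain}", timeout=timeout) as resp:
        html = resp.read(65536).decode("utf-8", errors="replace")
    return extract_title(html)


def format_line(domain, title):
    """One directory line: domain, a tab, then the title."""
    return f"{domain}\t{title}"
```

Calling `format_line(d, fetch_title(d))` for each domain in the input file would produce lines in the shape of the example output above. For 100K domains, error handling and rate limiting would have to be added.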



Thank you very much in advance.

Best,
Question by:faithless1
2 Comments
 

Accepted Solution

by:
wls3 earned 500 total points
ID: 34178171
As far as I know, wget (on Windows) creates a separate folder for each domain it scans. That makes your requirement of a single zip file somewhat difficult without additional scripting.
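The additional scripting mentioned above can be fairly small. As a sketch, assuming wget has written one folder per domain under a common download directory, Python's `shutil.make_archive` can pack the whole tree into one zip file. The function name `zip_download_root` and the paths in the demo are hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_download_root(download_root, archive_base):
    """Pack every per-domain folder under download_root into archive_base + '.zip'.

    Returns the path of the archive that was created.
    """
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(download_root))


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for real wget output.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "sites"
        (root / "unix.org").mkdir(parents=True)
        (root / "unix.org" / "index.html").write_text("<title>demo</title>")
        archive = zip_download_root(root, Path(tmp) / "all_sites")
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

The archive is built only after all downloads finish, so the uncompressed folders and the zip file briefly exist side by side; with limited disk space that may matter.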

Author Comment

by:faithless1
ID: 34178679
Thanks, writing to a directory works as well.

Thanks again,
Tom

Question has a verified solution.
