Solved

Wget

Posted on 2010-11-19
2
471 Views
Last Modified: 2012-08-13
Hi,

I came across fetch.com which offers a pretty good solution but the price is high and solutions are geared mostly for enterprises. Is it possible to achieve what they offer with wget?

Here's the functionality i need:

I'm looking for a script where I can specify a list of domains (20-50K)  and have all site content downloaded to a main zipped file with all data. I have limited space (1tb) so I want only the text for each site and want to exclude images, flash, site files etc so the downloads are quick. Final output can be in any format.

I'm looking for a script to crawl URLs for specific keywords for domains I specify and if there is a match, the URLs will be written to a central file.

Lastly, I have a file with 100k domains and i want to append most recent site titles to create a directory. Is there a way to fetch this information from search engines?

Example.

unix.org
etc..

Output
unix.org            The UNIX System, UNIX System
etc..



Thank you very much in advance.

Best,
0
Comment
Question by:faithless1
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 10

Accepted Solution

by:
wls3 earned 500 total points
ID: 34178171
As far as I know, wget (on windows) outputs a folder for each domain scanned.  This makes your requirement regarding a single zip somewhat difficult without additional scripting.
0
 

Author Comment

by:faithless1
ID: 34178679
Thanks, writing to a directory works as well.

Thanks again,
Tom
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

How to remove superseded packages in windows w60 or w61 installation media (.wim) or online system to prevent unnecessary space. w60 means Windows Vista or Windows Server 2008. w61 means Windows 7 or Windows Server 2008 R2. There are various …
This article discusses four methods for overlaying images in a container on a web page
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question