Hi,
I want to download urls recursively,
starting from : http://code.google.com/api
but I want to download only those URLs that
match this pattern:
http://code.google.com/api
I tried wget -r -D http://code.google.com/api
but it downloads only index.html and stops.
I tried a few other options, but they didn't work as intended either.
If wget is not user-friendly enough, you can try HTTrack:
http://www.httrack.com/
It does the same job but with a more user-friendly interface.
Hi Topio,
I want to download all urls matching this pattern : http://code.google.com/api
I used the following options :
URL -> http://code.google.com/api
Set Options -> Scan Rules -> Include Links -> Criterion -> Folder names containing : String: flash
Limits -> Max mirroring depth : 5
Limits -> Max external depth : 3
I got the following error :
--------------------------
WinHTTrack Website Copier
--------------------------
* * MIRROR ERROR! * *
HTTrack has detected that the current mirror is empty. If it was an update, the previous mirror has been restored.
Reason: the first page(s) either could not be found, or a connection problem occured.
=> Ensure that the website still exists, and/or check your proxy settings! <=
--------------------------
OK
--------------------------
Hi kukno,
I tried this :
wget --include-directories=flas
in order to download all the urls matching http://code.google.com/api
but, only `code.google.com/apis/maps
Sorry for the wrong links...
here are the correct ones :
http://code.google.com/api
http://code.google.com/api
hm.. if you use a wildcard in the option, it will download a lot more:
wget --include-directories=*fla
However, the download is then no longer limited to the path /apis/maps/documentation/. I don't think wget can do what you need. If you are not limited to Windows as a platform, you could try pavuk.
http://www.pavuk.org/man.h
pavuk supports regular expressions in URLs and also recursive downloads.
Regards
Kurt
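
The limitation described above can be sketched in Python. This is only a model of the matching behaviour, not wget's actual implementation: a glob on the directory alone accepts any directory containing the substring, while the asker needs a fixed path prefix and a substring at the same time. The prefix `/apis/maps/` and the example.com URLs are hypothetical, chosen for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def directory_of(url):
    """Return the directory part of the URL path (everything before the last '/')."""
    return urlsplit(url).path.rsplit("/", 1)[0]


def glob_only(url, pattern):
    """Accept a URL if its directory matches a shell-style glob."""
    return fnmatchcase(directory_of(url), pattern)


def prefix_and_glob(url, prefix, pattern):
    """Accept a URL only if its path starts with prefix AND its directory matches the glob."""
    return urlsplit(url).path.startswith(prefix) and glob_only(url, pattern)


# Hypothetical URLs for illustration only.
wanted = "http://example.com/apis/maps/flash/intro.html"
stray = "http://example.com/other/flash/intro.html"

print(glob_only(wanted, "*fla*"), glob_only(stray, "*fla*"))
print(prefix_and_glob(wanted, "/apis/maps/", "*fla*"),
      prefix_and_glob(stray, "/apis/maps/", "*fla*"))
```

The glob-only check lets the stray URL through, which matches the "downloads a lot more" observation; the combined check is the kind of rule a regular-expression based tool can express.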
Hm, no Linux. OK, here is another alternative: w3mir. It's Perl-based and not restricted to Linux. I tried it on Windows and it works as expected.
http://www.langfeldt.net/w
Download w3mir, unpack it, and read the file INSTALL.w32. Basically, these are the steps to "install" it on Windows:
get and install winzip from http://www.winzip.com/
get and install ActivePerl (now Build 509) from http://www.activeperl.com/
get nmake.exe from ftp://ftp.microsoft.com/So
After installing the tools above, run this in the unpacked w3mir directory:
perl makefile.pl
nmake
nmake install
After that, w3mir is installed in the default path of your Perl installation. You can check it with:
w3mir -h
Here is a sample config file for your problem, w3mir.cfg:
# Retrieve all of janl's home pages:
Options: recurse
#
# This is the two argument form of URL:. It fetches the first into the second
URL: http://code.google.com/api
Fetch-RE: m/flash/
cd: d:\mirror
Then run w3mir like this:
mkdir d:\mirror
w3mir -cfgfile w3mir.cfg
Regards
Kurt
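
For readers unfamiliar with the `Fetch-RE` rule in the config above, here is a rough Python equivalent of that kind of filter. It is a sketch of the idea (keep only URLs that the regular expression matches somewhere), not w3mir's actual code, and the example.com links are made up for illustration.

```python
import re

# Same unanchored pattern as the Perl expression m/flash/ in the config.
FETCH_RE = re.compile(r"flash")


def should_fetch(url):
    """Return True if the pattern occurs anywhere in the URL."""
    return FETCH_RE.search(url) is not None


def filter_links(links):
    """Keep only the links that pass the fetch rule, preserving order."""
    return [url for url in links if should_fetch(url)]


# Hypothetical links, as a crawler might extract them from a page.
links = [
    "http://example.com/apis/maps/flash/intro.html",
    "http://example.com/apis/maps/js/intro.html",
    "http://example.com/apis/flash-tools/index.html",
]
print(filter_links(links))
```

Because the pattern is unanchored, it also accepts URLs such as the third one, so a stricter expression may be needed if only one directory is wanted.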
by: kukno, posted on 2008-10-10 at 13:52:10, ID: 22690699
Hi,
/wget
There is an option "-I" or "--include-directories".
From the man page: http://linux.die.net/man/1
-I list
--include-directories=list
Specify a comma-separated list of directories you wish to follow when downloading. Elements of list may contain wildcards.
Sample: wget --include-directories *test*,*test2* -r http://www....
Regards
Kurt
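
If the command is launched from a script, one way to build it is sketched below in Python. It uses only the `-r` and `--include-directories` options quoted above; the helper name `build_wget_args` and the example.com URL are assumptions for illustration. Passing the arguments as a list means the wildcards reach wget as written instead of being expanded by a shell first.

```python
import shlex


def build_wget_args(start_url, include_dirs):
    """Return an argument list for a recursive wget limited to include_dirs."""
    return [
        "wget",
        "-r",
        # wget expects the directory patterns as one comma-separated value.
        "--include-directories=" + ",".join(include_dirs),
        start_url,
    ]


# Hypothetical start URL; the patterns are the ones from the sample above.
args = build_wget_args("http://www.example.com/", ["*test*", "*test2*"])

# Show the command in a form that is safe to paste into a POSIX shell.
print(shlex.join(args))

# To actually run it (requires wget to be installed):
# import subprocess; subprocess.run(args, check=True)
```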