Solved

Use WGET to mirror a 'weird' website

Posted on 2006-11-29
Last Modified: 2008-02-01
Hi all,

I have been trying out dozens of wget commands to download a complete mirror of

http://www. a c i h c d .us (without the spaces of course)

For some reason I can only get the index page and can't get any of the other pages. I think the URLs might be rewritten on the website, perhaps with mod_rewrite, and wget can't follow them.

Anyway, I really need to migrate the whole thing to another server ASAP, including every bit and piece of the site, as my friend has lost his admin access to the existing server.

What's the correct command to copy this site in its entirety? I'm running Ubuntu Server with KDE.

Thanks for your help!
Question by:NIPPLES
8 Comments
 
LVL 10

Expert Comment

by:ssvl
ID: 18035856
Use wget -r.
 
LVL 3

Author Comment

by:NIPPLES
ID: 18039381
Hi ssvl,

I have tried it, but it only downloads the first page and won't follow the links, which are all plain HTML.

Any other ideas?

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18040129
Hi,

Go to: http://www.die.net/doc/linux/man/man1/wget.1.html

There is a section there called Recursive Retrieval Options.

wget -r www.website.com 

That should work fine. You should also try:

wget -m www.website.com
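
For reference, here is a minimal sketch of a fuller mirror command. Per the wget manual, -m (--mirror) is shorthand for -r -N -l inf --no-remove-listing; the extra flags below are ones commonly paired with it for offline copies. The helper name mirror_site and the example URL are placeholders, not something from this thread, and the command can only follow links that appear as ordinary URLs in the HTML.

```shell
#!/bin/sh
# mirror_site URL
# Hypothetical helper: mirrors URL into the current directory with GNU wget.
mirror_site() {
    # --mirror            recursion with timestamping and unlimited depth
    # --page-requisites   also fetch images, CSS and other inline assets
    # --convert-links     rewrite links so the local copy can be browsed offline
    # --no-parent         never ascend above the starting directory
    wget --mirror --page-requisites --convert-links --no-parent "$1"
}

# Usage (placeholder URL):
#   mirror_site http://www.example.com/
```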

 
LVL 3

Author Comment

by:NIPPLES
ID: 18040718
Hi xDamox,

I checked it out and tried it, but I always end up with only one page plus its images, etc. The site is powered by the Joomla CMS, which is rewriting the URLs. I'm looking for either a command that can properly mirror all the pages or a Debian-packaged program that can do it (if wget can't).

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18041061
Hi,

Did you try the mirror argument with wget?

wget -m www.website.com
 
LVL 3

Author Comment

by:NIPPLES
ID: 18041327
Hi

Yep, I tried it and it doesn't work. It normally would, but did you take a look at the site I'm trying to mirror? It can't get more than one page :(
 
LVL 24

Accepted Solution

by:slyong (earned 500 total points)
ID: 18043648
Hi Nipples (what a nick),

If you refer to http://www.gnu.org/software/wget/faq.html#3.6:

Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code.

The website that you are trying to mirror uses JavaScript links, so you won't be able to use wget to follow them.
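
One quick way to check a page for this is to save it and count how many links are javascript: pseudo-links versus ordinary ones. The sketch below assumes the links are written as double-quoted href="javascript:..." attributes (links wired up through onclick handlers or single quotes would not be counted), and it relies on grep -o, which GNU grep on Ubuntu provides. The function name count_links is made up for illustration.

```shell
#!/bin/sh
# count_links FILE
# Prints how many double-quoted href attributes in FILE are javascript:
# pseudo-links and how many are ordinary links that wget could follow.
count_links() {
    # Every double-quoted href attribute, one match per output line.
    total=$(grep -oi 'href="[^"]*"' "$1" | wc -l | tr -d ' ')
    # Only those whose value starts with javascript:
    js=$(grep -oi 'href="javascript:[^"]*"' "$1" | wc -l | tr -d ' ')
    echo "javascript: $js plain: $((total - js))"
}

# Usage: save the index page first, then inspect it.
#   wget -O index.html http://www.example.com/
#   count_links index.html
```

If nearly every link lands in the javascript count, recursive retrieval has nothing to follow, which matches getting only the index page.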
 
LVL 3

Author Comment

by:NIPPLES
ID: 18043926
Hi slyong,

Wow, yeah, you are right. I never noticed the links are JavaScript; I should have read the source more carefully. I will have to save each page manually, which is going to be really fun. Thanks for taking a look and spotting that!

Ah, I'm embarrassed...
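
A possible shortcut for the manual route, sketched under the assumption that the real page URLs can be read out of the page source by hand and listed one per line in a text file: wget can fetch a list of URLs with -i (--input-file), so each page does not have to be saved through the browser. The helper name fetch_list and the file name urls.txt are placeholders.

```shell
#!/bin/sh
# fetch_list FILE
# Hypothetical helper: downloads every URL listed in FILE (one per line),
# together with the images and stylesheets each page needs.
fetch_list() {
    # --page-requisites   fetch inline assets for each listed page
    # --convert-links     rewrite links in the saved pages for local browsing
    # --input-file        read the URLs to download from FILE
    wget --page-requisites --convert-links --input-file="$1"
}

# Usage (urls.txt is a hand-made list of page URLs):
#   fetch_list urls.txt
```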