Solved

Use WGET to mirror a 'weird' website

Posted on 2006-11-29
Last Modified: 2008-02-01
Hi all,

I have been trying out dozens of wget commands to download a complete mirror of

http://www. a c i h c d .us (without the spaces of course)

For some reason I can only get the index page and can't get any other pages. I think the URLs might be rewritten on the website, perhaps with mod_rewrite, so that wget can't follow them.

Anyway, I really need to migrate the whole thing to another server ASAP, including every bit and piece of the site, as my friend has lost his admin access to the existing server.

What's the correct command to get this site copied entirely? I'm running Ubuntu server with KDE.

Thanks for your help!
Question by:NIPPLES
8 Comments
 
LVL 10

Expert Comment

by:ssvl
ID: 18035856
Use wget -r.
 
LVL 3

Author Comment

by:NIPPLES
ID: 18039381
Hi ssvl,

I have tried it, but it only downloads the first page and won't follow the links, which are all plain HTML.

Any other ideas?

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18040129
Hi,

Go to: http://www.die.net/doc/linux/man/man1/wget.1.html

There is a section there called Recursive Retrieval Options.

wget -r www.website.com

That should work fine. You should also try:

wget -m www.website.com
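
For reference, the wget manual describes -m (--mirror) as currently equivalent to -r -N -l inf --no-remove-listing. A minimal sketch of a fuller mirroring command follows; the extra options are standard wget flags, but whether they suit this particular site is an assumption, and mirror_command is a hypothetical helper that only prints the command rather than running it.

```shell
#!/bin/sh
# Build (but do not run) a wget mirroring command for the given URL.
# mirror_command is a hypothetical helper name used only for illustration.
mirror_command() {
    # --mirror           currently equivalent to -r -N -l inf --no-remove-listing
    # --convert-links    rewrites links in saved pages so they work locally
    # --page-requisites  also fetches images, CSS and similar files
    # --no-parent        never ascends above the starting directory
    printf 'wget --mirror --convert-links --page-requisites --no-parent %s\n' "$1"
}

# www.website.com is the placeholder host used in this comment.
mirror_command "http://www.website.com/"
```

Printing the command first makes it easy to review the options before letting wget loose on a live server.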
 
LVL 3

Author Comment

by:NIPPLES
ID: 18040718
Hi xDamox,

I checked it out and tried it, but I always end up with only one page plus its images etc. The CMS is powered by Joomla and it's rewriting the URLs. I'm looking for either a command that can properly mirror all the pages or a Debian-packaged program that can do it (if wget can't).

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18041061
Hi,

Did you try the mirror argument with wget?

wget -m www.website.com
 
LVL 3

Author Comment

by:NIPPLES
ID: 18041327
Hi,

Yep, I tried it and it doesn't work. It normally would, but did you take a look at the site I'm trying to mirror? It can't get more than one page :(
 
LVL 24

Accepted Solution

by:slyong (earned 500 total points)
ID: 18043648
Hi Nipples (what a nick),

If you refer to http://www.gnu.org/software/wget/faq.html#3.6:

Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code.

The website that you are trying to mirror uses JavaScript links, so wget's recursive retrieval cannot follow them.
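
As a possible alternative to saving every page by hand, here is a rough sketch of pulling the page addresses out of the saved source and handing them to wget as a list. It assumes the pages appear in the source as quoted Joomla-style strings beginning with index.php? (inside href attributes or JavaScript calls), which has not been verified for this site; extract_js_links and the example.com URLs are hypothetical names for illustration.

```shell
#!/bin/sh
# extract_js_links FILE BASE_URL
# Print one absolute URL for every quoted "index.php?..." string in FILE,
# whether it sits in an href attribute or inside JavaScript code.
# Assumption: the site's links have this shape; adjust the pattern if not.
extract_js_links() {
    grep -oE "[\"']index\.php\?[^\"']*[\"']" "$1" |
        tr -d "\"'" |                               # strip the surrounding quotes
        sed -e 's/&amp;/\&/g' -e "s|^|$2|" |        # decode &amp; and prepend the base URL
        sort -u                                     # drop duplicate links
}

# Hypothetical usage (example.com stands in for the real site):
#   wget -O index.html http://www.example.com/
#   extract_js_links index.html http://www.example.com/ > urls.txt
#   wget --page-requisites --convert-links -i urls.txt
```

The -i option makes wget read its download list from a file, so no recursion (and no JavaScript support) is needed. Pages linked only from deeper pages would need the same extraction run on those pages as well.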
 
LVL 3

Author Comment

by:NIPPLES
ID: 18043926
Hi slyong,

Wow, yeah, you are right. I never noticed the links are JavaScript. I should have read the source more carefully. I will have to save each page manually, which is going to be really fun. Thanks for taking a look and spotting that!

Ah, I'm embarrassed...