Solved

Use WGET to mirror a 'weird' website

Posted on 2006-11-29
Last Modified: 2008-02-01
Hi all,

I have been trying out dozens of wget commands to download a complete mirror of

http://www. a c i h c d .us (without the spaces of course)

For some reason I can only get the index page and can't get any other pages. I think the URLs might be rewritten on the website, perhaps with mod_rewrite, and wget can't follow them?

Anyway, I really need to migrate the whole thing to another server ASAP, including every bit and piece of the site, as my friend has lost his admin to the existing server.

What's the correct command to get this site copied entirely? I'm running Ubuntu Server with KDE.

Thanks for your help!
Question by:NIPPLES
8 Comments
 
LVL 10

Expert Comment

by:ssvl
ID: 18035856
Use wget -r
 
LVL 3

Author Comment

by:NIPPLES
ID: 18039381
Hi ssvl,

I have tried it and it will only download the first page, but won't follow the links, which are all plain HTML.

Any other ideas?

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18040129
Hi,

Go to: http://www.die.net/doc/linux/man/man1/wget.1.html

There is a section called Recursive Retrieval Options.

wget -r www.website.com 

That should work fine. You should also try:

wget -m www.website.com
 
LVL 3

Author Comment

by:NIPPLES
ID: 18040718
Hi xDamox,

I checked it out and tried it out - I always end up getting only 1 page + images etc. The CMS is powered by Joomla and it's rewriting the URLs. I'm looking for either a command that can properly mirror all the pages or a Debian-packaged program that can do it (if wget can't).

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18041061
Hi,

Did you try the mirror argument with wget?

wget -m www.website.com
 
LVL 3

Author Comment

by:NIPPLES
ID: 18041327
Hi

Yep, I tried it and it won't work. It normally would, but did you take a look at the site I'm trying to mirror? It can't get more than 1 page :(
 
LVL 24

Accepted Solution

by:
slyong earned 125 total points
ID: 18043648
Hi Nipples (what a nick),

If you refer to http://www.gnu.org/software/wget/faq.html#3.6:

Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code.

The website that you are trying to mirror or download is using JavaScript links.  So you won't be able to use wget.
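To check a page for this yourself, here is a rough Python sketch that sorts the links in a saved page into ones a non-JavaScript crawler such as wget could follow and ones that only work through JavaScript. The sample HTML and the names `LinkClassifier` and `classify_links` are made up for illustration and are not taken from the site in the question; real pages may wire up their links differently.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sort <a> tags into plain links and JavaScript-driven links."""

    def __init__(self):
        super().__init__()
        self.plain = []     # hrefs a non-JavaScript crawler could follow
        self.scripted = []  # links that only work through JavaScript

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = (attr.get("href") or "").strip()
        onclick = attr.get("onclick")
        if href.lower().startswith("javascript:"):
            self.scripted.append(href)
        elif href and not href.startswith("#"):
            self.plain.append(href)
        elif onclick:
            # No usable href, so the navigation happens in the handler.
            self.scripted.append(onclick)


def classify_links(html):
    """Return (plain, scripted) link lists for an HTML string."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.plain, parser.scripted


# Hypothetical sample page, for illustration only.
SAMPLE = """
<a href="about.html">About</a>
<a href="javascript:go('news.html')">News</a>
<a href="#" onclick="location.href='contact.html'">Contact</a>
"""

if __name__ == "__main__":
    plain, scripted = classify_links(SAMPLE)
    print("followable:", plain)
    print("needs JavaScript:", scripted)
```

If nearly everything ends up in the second list, that would explain why wget -r and wget -m stop after the index page.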
 
LVL 3

Author Comment

by:NIPPLES
ID: 18043926
Hi slyong,

Wow, yeah, you are right - I never noticed the links are JavaScript - I should have read the source better. I will have to save each page manually, which is going to be really fun. Thanks for taking a look and spotting that!

Ah, I'm embarrassed...
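A possible shortcut to saving every page by hand, sketched in Python under some loud assumptions: the page URLs appear as single-quoted string literals inside `href="javascript:..."` or `onclick="..."` attributes, and those attributes are double-quoted. The sample HTML, the base URL, and the function name `extract_script_urls` are hypothetical, and the filter that keeps only literals containing "." or "/" is a crude heuristic that may miss or mis-pick URLs on a real site.

```python
import re
from urllib.parse import urljoin

# JavaScript held in href="javascript:..." or onclick="..." attributes.
# Assumption: attribute values are double-quoted.
SCRIPT_ATTR = re.compile(r'(?:href="javascript:|onclick=")([^"]*)"', re.IGNORECASE)
# Assumption: URLs inside the script are single-quoted string literals.
QUOTED = re.compile(r"'([^']+)'")


def extract_script_urls(html, base_url):
    """Return absolute URLs found as string literals in link scripts."""
    urls = []
    for script in SCRIPT_ATTR.findall(html):
        for literal in QUOTED.findall(script):
            # Heuristic: skip literals such as '_blank' that do not look like paths.
            if "." not in literal and "/" not in literal:
                continue
            url = urljoin(base_url, literal)
            if url not in urls:
                urls.append(url)
    return urls


# Hypothetical sample page and base URL, for illustration only.
SAMPLE = """
<a href="about.html">About</a>
<a href="javascript:go('news.html')">News</a>
<a href="#" onclick="window.open('/contact.html', '_blank')">Contact</a>
"""

if __name__ == "__main__":
    for url in extract_script_urls(SAMPLE, "http://example.com/site/index.html"):
        print(url)
```

Redirecting the output into a file (say urls.txt) would give a list that wget can read with -i, for example wget -i urls.txt -p -k to also fetch page requisites and convert links for local viewing. Pages fetched this way may contain further JavaScript links, so the extraction would likely need repeating on each new page.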


Question has a verified solution.
