Solved

Use WGET to mirror a 'weird' website

Posted on 2006-11-29
Last Modified: 2008-02-01
Hi all,

I have been trying out dozens of wget commands to download a complete mirror of

http://www. a c i h c d .us (without the spaces of course)

For some reason I can only get the index page and can't get any other pages. I think it might be that the URLs are rewritten on the website, perhaps with mod_rewrite, and wget can't follow them?

Anyway, I really need to migrate the whole thing to another server ASAP, including every bit and piece of the site, as my friend has lost his admin to the existing server.

What's the correct command to get this site copied entirely? I'm running Ubuntu Server with KDE.

Thanks for your help!
Question by:NIPPLES
8 Comments
 
LVL 10

Expert Comment

by:ssvl
ID: 18035856
use wget -r
 
LVL 3

Author Comment

by:NIPPLES
ID: 18039381
Hi ssvl,

I have tried it and it only downloads the first page; it won't follow the links, which are all plain HTML.

Any other ideas?

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18040129
Hi,

Go to: http://www.die.net/doc/linux/man/man1/wget.1.html

There is a section there called Recursive Retrieval Options.

wget -r www.website.com 

That should work fine. You should also try:

wget -m www.website.com
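
For reference, a slightly fuller mirroring invocation might look like the sketch below. It assumes GNU wget is installed; `mirror_site` is a hypothetical helper name and www.example.com is a placeholder host. wget only follows links that appear as ordinary URLs in the HTML, so this is not guaranteed to work on every site.

```shell
#!/bin/sh
# mirror_site URL: hypothetical helper wrapping a common wget mirroring invocation.
mirror_site() {
    # --mirror           recursion with time-stamping and infinite depth (same as -m)
    # --convert-links    rewrite links in the saved pages so they work locally
    # --page-requisites  also fetch the images, CSS, etc. that each page needs
    # --no-parent        never ascend above the starting directory
    wget --mirror --convert-links --page-requisites --no-parent "$1"
}

# Example (placeholder host):
# mirror_site "http://www.example.com/"
```

The function is only defined here, not called, so nothing is downloaded until you run it with a real URL.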

 
LVL 3

Author Comment

by:NIPPLES
ID: 18040718
Hi xDamox,

I checked it out and tried it, but I always end up getting only one page plus images, etc. The CMS is Joomla and it's rewriting the URLs. I'm looking for either a command that can properly mirror all the pages or a Debian-packaged program that can do it (if wget can't).

Thanks!
 
LVL 16

Expert Comment

by:xDamox
ID: 18041061
Hi,

Did you try the mirror argument with wget?

wget -m www.website.com
 
LVL 3

Author Comment

by:NIPPLES
ID: 18041327
Hi

Yep, I tried it and it doesn't work. It normally would, but did you take a look at the site I'm trying to mirror? It can't get more than one page :(
 
LVL 24

Accepted Solution

by:slyong (earned 125 total points)
ID: 18043648
Hi Nipples (what a nick),

If you refer to http://www.gnu.org/software/wget/faq.html#3.6, it says:

"Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code."

The website that you are trying to mirror uses JavaScript links, so wget cannot follow them and you won't be able to use it to mirror the site.
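
One quick way to check for this on a saved page is sketched below. It is only a rough check: `count_js_links` is a hypothetical helper name, and it looks only for `href` attributes that start with `javascript:`. Links built in other ways (for example via `onclick` handlers) would not be counted.

```shell
#!/bin/sh
# count_js_links FILE: print how many href attributes in FILE use a javascript: URL.
count_js_links() {
    # -o prints each match on its own line, -i ignores case, -E enables the optional quote.
    # "|| true" keeps the pipeline from failing when there are no matches.
    { grep -oiE "href=[\"']?javascript:" "$1" || true; } | wc -l | tr -d ' '
}

# Example (placeholder host): save the index page, then count the JavaScript links in it.
# wget -O index.html "http://www.example.com/"
# count_js_links index.html
```

If the count is close to the number of navigation links on the page, wget's recursive retrieval has very little it can follow.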
 
LVL 3

Author Comment

by:NIPPLES
ID: 18043926
Hi slyong,

Wow, yeah, you are right. I never noticed the links are JavaScript; I should have read the source better. I will have to save each page manually, which is going to be really fun. Thanks for taking a look and spotting that!

Ah, I'm embarrassed...