Solved

Use WGET to mirror a 'weird' website

Posted on 2006-11-29
1,114 Views
Last Modified: 2008-02-01
Hi all,

I have been trying out dozens of wget commands to download a complete mirror of

http://www. a c i h c d .us (without the spaces of course)

For some reason I can only get the index page and can't get any other pages.  I think the URLs might be rewritten on the website, perhaps with mod_rewrite, so that wget can't follow them?

Anyway, I really need to migrate the whole thing to another server ASAP, including every bit and piece of the site, as my friend has lost his admin to the existing server.

What's the correct command to get this site copied entirely?  I'm running Ubuntu Server with KDE.

Thanks for your help!

Question by: NIPPLES

Expert Comment by: ssvl
Use wget -r

Author Comment by: NIPPLES
Hi ssvl,

I have tried it, but it only downloads the first page and won't follow the links, which are all plain HTML.

Any other ideas?

Thanks!

Expert Comment by: xDamox
Hi,

Go to: http://www.die.net/doc/linux/man/man1/wget.1.html

There is a section called Recursive Retrieval Options.

wget -r www.website.com

That should work fine. You should also try:

wget -m www.website.com
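
For reference, -m is shorthand for recursion with timestamping and unlimited depth. A minimal sketch of a fuller mirror invocation, spelled out with long options, might look like the following. The www.example.com URL and the one-second wait are placeholders, not values from this thread, and the script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: a typical wget mirror command, built up with long options.
# SITE_URL is a placeholder (assumption), not the site from the question.
SITE_URL="http://www.example.com/"

# --mirror          : recursion + timestamping, unlimited depth (same as -m)
# --page-requisites : also fetch the images and stylesheets each page needs
# --convert-links   : rewrite links in the saved pages so they work locally
# --no-parent       : never climb above the starting directory
# --wait=1          : pause one second between requests (arbitrary choice)
MIRROR_CMD="wget --mirror --page-requisites --convert-links --no-parent --wait=1 $SITE_URL"

# Print the command for review; nothing is downloaded at this point.
echo "$MIRROR_CMD"
```

Once the printed command looks right, it can be pasted into a terminal and run as-is.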

Author Comment by: NIPPLES
Hi xDamox,

I checked it out and tried it. I always end up getting only 1 page plus images etc.  The CMS is powered by Joomla and it's rewriting the URLs.  I'm looking for either a command that can properly mirror all the pages or a Debian-packaged program that can do it (if wget can't).

Thanks!

Expert Comment by: xDamox
Hi,

Did you try the mirror argument with wget?

wget -m www.website.com

Author Comment by: NIPPLES
Hi

Yep, I tried it and it won't work.  It normally would, but did you take a look at the site I'm trying to mirror?  It can't get more than 1 page :(

Accepted Solution by: slyong (earned 125 total points)
Hi Nipples (what a nick),

If you refer to http://www.gnu.org/software/wget/faq.html#3.6:

Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code.

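The website that you are trying to mirror uses JavaScript links. Wget only follows links it finds in the HTML itself, so recursive retrieval has nothing to follow and you won't be able to mirror this site with wget.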
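
One way to check for this before reaching for wget is to count the links in a saved copy of the page. The sketch below does that with grep. The HTML is a made-up sample (the loadPage function is hypothetical, not taken from the site in question), so the patterns may need adjusting for real markup.

```shell
#!/bin/sh
# Sketch: count plain links vs. JavaScript-driven links in a saved page.
# The HTML below is an invented sample, not the markup of the real site.
PAGE=$(mktemp)
cat > "$PAGE" <<'EOF'
<html><body>
<a href="about.html">About</a>
<a href="javascript:loadPage('news')">News</a>
<a href="#" onclick="loadPage('contact'); return false;">Contact</a>
</body></html>
EOF

# Links wget can follow: href values that are neither javascript: nor "#".
PLAIN=$(grep -o 'href="[^"]*"' "$PAGE" | grep -c -v -e 'href="javascript:' -e 'href="#')

# Lines with links wget cannot follow: javascript: URLs or onclick handlers.
SCRIPTED=$(grep -c -e 'href="javascript:' -e 'onclick=' "$PAGE")

echo "plain links: $PLAIN, scripted links: $SCRIPTED"
rm -f "$PAGE"
```

If the scripted count dominates, as it does in this sample, a recursive wget run will stop at the first page.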

Author Comment by: NIPPLES
Hi slyong,

Wow, yeah, you are right. I never noticed the links are JavaScript; I should have read the source better.  I will have to save each page manually, which is going to be really fun.  Thanks for taking a look and spotting that!

Ah, I'm embarrassed...