Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

wget saving web page help

Posted on 2011-03-23
4
102 Views
Last Modified: 2016-05-10
Hello,

I need a bit of help with wget.

When a user submits a URL I want to use wget to create an archive/backup of that specific page.  I want to include all the contents of the page i.e. css, images, js etc...

I have the following code and it's working about 90% of what I need.

exec("wget -e robots=off --limit-rate=250k -F -P /home/USERNAME/public_html/results/". $rnd1 ."/". $rnd2 ."/"." -p -k -E ". $site_url ."");

Open in new window


The problem with this code is if a user submits a URL like this:

http://techcrunch.com/2011/03/22/digital-textbook-startup-inkling-nabs-multi-million-dollar-investment-from-mcgraw-hill-and-pearson/

The backup will be structured this way:

[ techcrunch.com - Folder ] / [ 2011 - Folder ] / [ 03 - Folder ] / [ 22 - Folder ] / [ digital-textbook-startup-inkling-nabs-multi-million-dollar-investment-from-mcgraw-hill-and-pearson - Folder ]

Techcrunch File
The html will load all the images from main site (techcrunch.com)

However if the user submits a URL like this:

http://blog.joerogan.net/archives/2889

The backup will contain all the images, css, etc...

Joerogan File


I hope this makes sense.  If not I will try to clarify.
0
Comment
Question by:jambla
4 Comments
 
LVL 9

Accepted Solution

by:
absx earned 500 total points
ID: 35198540
Hi,

There's just too many features in wget to ever get the command correct manually. I would suggest playing around with a tool like wgetGUI (http://www.jensroesner.de/wgetgui/), until you have a set of options that does exactly what you need, and then picking these arguments for the script.
0
 

Author Comment

by:jambla
ID: 35202147
Hello absx,

Thanks for the link, I will have a look to see if it can help me out.


Any one else have any suggestions?
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Part of the Global Positioning System A geocode (https://developers.google.com/maps/documentation/geocoding/) is the major subset of a GPS coordinate (http://en.wikipedia.org/wiki/Global_Positioning_System), the other parts being the altitude and t…
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Learn how to navigate the file tree with the shell. Use pwd to print the current working directory: Use ls to list a directory's contents: Use cd to change to a new directory: Use wildcards instead of typing out long directory names: Use ../ to move…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question