Solved

wget saving web page help

Posted on 2011-03-23
Last Modified: 2016-05-10
Hello,

I need a bit of help with wget.

When a user submits a URL, I want to use wget to create an archive/backup of that specific page, including everything the page needs to render: CSS, images, JS, etc.

I have the following code, and it does about 90% of what I need.

exec("wget -e robots=off --limit-rate=250k -F -P /home/USERNAME/public_html/results/" . $rnd1 . "/" . $rnd2 . "/ -p -k -E " . escapeshellarg($site_url));

The problem with this code is that if a user submits a URL like this:

http://techcrunch.com/2011/03/22/digital-textbook-startup-inkling-nabs-multi-million-dollar-investment-from-mcgraw-hill-and-pearson/

The backup will be structured this way:

[ techcrunch.com - Folder ] / [ 2011 - Folder ] / [ 03 - Folder ] / [ 22 - Folder ] / [ digital-textbook-startup-inkling-nabs-multi-million-dollar-investment-from-mcgraw-hill-and-pearson - Folder ]

Techcrunch File
The saved HTML still loads all of its images from the live site (techcrunch.com) rather than from the backup.

However if the user submits a URL like this:

http://blog.joerogan.net/archives/2889

The backup contains all the images, CSS, etc.

Joerogan File


I hope this makes sense. If not, I will try to clarify.
Question by:jambla

Accepted Solution

by:absx (earned 500 total points)
ID: 35198540
Hi,

There are just too many features in wget to ever get the command right by hand. I would suggest experimenting with a tool like wgetGUI (http://www.jensroesner.de/wgetgui/) until you have a set of options that does exactly what you need, and then using those arguments in the script.

Author Comment

by:jambla
ID: 35202147
Hello absx,

Thanks for the link, I will have a look to see if it can help me out.


Does anyone else have any suggestions?
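
One direction that may be worth trying, offered as a sketch rather than a verified fix: the GNU wget manual's example for saving a single page with all of its requisites uses -H (--span-hosts) alongside -p, -k and -E. The command in the question has no -H, and without it wget does not fetch images or stylesheets served from a different host than the page itself. Whether that is what happens with the techcrunch.com URL is an assumption and is not confirmed anywhere in this thread. The function name archive_page, the DRY_RUN switch, and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch only: archive one page plus its requisites, including requisites
# hosted on other hosts. archive_page and DRY_RUN are illustrative names,
# not part of wget or of the original script.

archive_page() {
    dest=$1   # directory to save into (passed to -P)
    url=$2    # page to archive

    # -p  download page requisites (images, CSS, JS)
    # -k  convert links so the saved page works locally
    # -E  add an .html extension where appropriate
    # -H  allow requisites to come from other hosts
    # --  end of options, so a URL starting with "-" is not read as a flag
    set -- -e robots=off --limit-rate=250k -p -k -E -H -P "$dest" -- "$url"

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "wget $*"
    else
        wget "$@"
    fi
}

# Example (hypothetical path):
#   archive_page /home/USERNAME/public_html/results/a/b "http://blog.joerogan.net/archives/2889"
```

Setting DRY_RUN=1 before calling the function prints the command without touching the network, which makes it easy to compare option sets before wiring one into the PHP exec() call.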