Solved

Convert dynamic URLs with wget to .html files

Posted on 2011-03-22
Last Modified: 2012-05-11
Hello,

I'm having a problem when I use wget to download/archive a webpage.

Downloading "http://example.com" works fine, and downloading "http://example.com/page.html" is OK too.

My problem is when I have a URL something like this:

"http://example.com/page.php?id=99"
OR
"http://example.com/index.html?hpt=T1"

These download fine, but when I browse to the saved files, the browser shows the raw HTML source instead of the rendered page.

So my question is: how can I force all pages to be saved as .htm or .html files?

Here is my code:

<?php

$site = 'http://example.com/index.php?id=680';

// Two random directory names keep each download in its own folder.
$rnd1 = rand(100, 9999);
$rnd2 = rand(100, 9999);

// Create both directory levels in one call (third argument = recursive).
$dir = "/home/USER/public_html/results/" . $rnd1 . "/" . $rnd2 . "/";
mkdir($dir, 0777, true);

// Quote the path and the URL so the shell does not interpret "?" or "&".
exec("wget -e robots=off --limit-rate=250k -F -P " . escapeshellarg($dir) . " -p -k " . escapeshellarg($site));

?>

Thanks for the help!
Question by:jambla
 
LVL 5

Expert Comment

by:tsmgeek
ID: 35194479
I'm guessing the problem is that the saved files don't actually end in .html; instead, the query parameters are concatenated onto the end of the filename. You need to change this, or append .html to the end of every file.

Personally, I would use curl to get the page and then save it into a file that I name myself.
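
The curl approach suggested above could look roughly like the sketch below. It is only an illustration, not code from this thread: the helper names url_to_filename and save_page are hypothetical, and the naming scheme (host, path and query flattened with underscores) is an assumption. It saves only the HTML of a single page; CSS, images and JS are not fetched.

```php
<?php
// Hypothetical helper: turn a URL into a flat filename that always ends in .html.
function url_to_filename(string $url): string
{
    $parts = parse_url($url);
    $name = ($parts['host'] ?? 'page') . ($parts['path'] ?? '');
    if (isset($parts['query'])) {
        $name .= '_' . $parts['query'];
    }
    // Replace anything that is not filename-safe ("/", "?", "=", "&", ...) with "_".
    $name = trim(preg_replace('/[^A-Za-z0-9._-]+/', '_', $name), '_');
    // Append .html unless the name already ends in .htm or .html.
    if (!preg_match('/\.html?$/i', $name)) {
        $name .= '.html';
    }
    return $name;
}

// Hypothetical helper: fetch one page with cURL and save it under our own name.
function save_page(string $url, string $dir): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }
    $path = rtrim($dir, '/') . '/' . url_to_filename($url);
    if (file_put_contents($path, $html) === false) {
        throw new RuntimeException('Could not write ' . $path);
    }
    return $path;
}

echo url_to_filename('http://example.com/page.php?id=99'), PHP_EOL;
// prints "example.com_page.php_id_99.html"
```

Calling save_page('http://example.com/page.php?id=99', '/some/dir') would then write the page to a file with that name, assuming the cURL extension is installed and the directory exists.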
 

Author Comment

by:jambla
ID: 35196152
Hello tsmgeek,

Thanks for your response.

"I'm guessing the problem is that the saved files don't actually end in .html; instead, the query parameters are concatenated onto the end of the filename. You need to change this, or append .html to the end of every file."

Yeah, I'm pretty sure that's the problem, and it is the main point of my question: how do I do this?

"Personally, I would use curl to get the page and then save it into a file that I name myself."

Yeah, I prefer cURL too. My big problem with curl is that I was only able to save the HTML; I was not able to save the CSS, images, JS, etc. I am not tied to wget, so if you know how to do what I need using curl or any other web language (except .asp/.net), then I'm OK with that.

 

Accepted Solution

by:jambla (earned 0 total points)
ID: 35197493
I managed to find the answer. Adding -E to my wget command forces pages without an .html extension to be saved with one.
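
For anyone applying this fix, here is a minimal sketch of the command with -E added. In wget, -E is the short form of --html-extension (named --adjust-extension in newer releases), and it appends .html to downloaded HTML pages whose names do not already end in it. The helper name build_wget_command and the example directory are hypothetical; the other options are the ones from the question, minus -F.

```php
<?php
// Hypothetical helper: builds the wget command line used in the question, plus -E.
function build_wget_command(string $url, string $targetDir): string
{
    $options = [
        '-e robots=off',
        '--limit-rate=250k',
        '-E', // append .html to HTML pages whose names lack it
        '-p', // also download page requisites (CSS, images, JS)
        '-k', // convert links so the local copy is browsable
        '-P ' . escapeshellarg($targetDir),
    ];
    // escapeshellarg() keeps the shell from interpreting "?" and "&" in the URL.
    return 'wget ' . implode(' ', $options) . ' ' . escapeshellarg($url);
}

// Example directory is an assumption; substitute your own path.
$cmd = build_wget_command(
    'http://example.com/page.php?id=99',
    '/home/USER/public_html/results/1234/5678/'
);
echo $cmd, PHP_EOL;

// Uncomment to actually run the download:
// exec($cmd, $output, $status);
```

With -E, a URL such as page.php?id=99 should end up on disk with .html appended to its name, which lets the web server and browser treat it as an HTML page.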
 

Author Closing Comment

by:jambla
ID: 35230010
I found my own solution.