?
Solved

Convert dynamic URL's with wget to .html files

Posted on 2011-03-22
4
Medium Priority
?
666 Views
Last Modified: 2012-05-11
Hello,

I'm have a problem when I use wget to download/archive a webpage.

If I download "http://example.com" it works fine also if I download "http://example.com/page.html" that's OK too.

My problem is when I have a URL something like this:

"http://example.com/page.php?id=99"
OR
"http://example.com/index.html?hpt=T1"

These download fine but when I browse to them the page that shows is the HTML code not the browser rendered version.

So the question is how can I force all pages to become .htm or .html files

Here is my code:

<?php

$site = 'http://example.com/index.php?id=680';

$rnd1 = rand(100, 9999);
$rnd2 = rand(100, 9999);

mkdir("/home/USER/public_html/results/". $rnd1 . "/", 0777);
mkdir("/home/USER/public_html/results/". $rnd1 . "/". $rnd2 ."/", 0777);

exec("wget -e robots=off --limit-rate=250k -F -P /home/USESR/public_html/results/". $rnd1 ."/". $rnd2 ."/"." -p -k ". $site ."");

?> 

Open in new window



Thanks for the help!
0
Comment
Question by:jambla
  • 3
4 Comments
 
LVL 5

Expert Comment

by:tsmgeek
ID: 35194479
im guessing the problem you are having is the files do not actualy have .html on the end but instead its got the query params concatinated on the end, you need to change this or append .html to the end of every file

personaly i would use curl to get the page then save it into a file that i name myself
0
 

Author Comment

by:jambla
ID: 35196152
Hello tsmgeek,

Thanks for your response.

im guessing the problem you are having is the files do not actualy have .html on the end but instead its got the query params concatinated on the end, you need to change this or append .html to the end of every file

Yeah, I'm pretty sure that's the problem.  Which is the main point of my questions; how do I do this?

personaly i would use curl to get the page then save it into a file that i name myself

Yeah, I prefer cURL also, my big problem with curl is I was only able to save the html but I was not able to save the css, images, js etc...  I am not partial to using wget so if you know how to do what I need using curl or any other web language (except .asp/.net) than I'm ok with that.

0
 

Accepted Solution

by:
jambla earned 0 total points
ID: 35197493
I managed to find the answer.  Using a -E in my wget statement will force a non-html extension to be one.
0
 

Author Closing Comment

by:jambla
ID: 35230010
I found my own solution.
0

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In real business world data are crucial and sometimes data are shared among different information systems. Hence, an agreeable file transfer protocol need to be established.
We live in a world of interfaces like the one in the title picture. VBA also allows to use interfaces which offers a lot of possibilities. This article describes how to use interfaces in VBA and how to work around their bugs.
With the power of JIRA, there's an unlimited number of ways you can customize it, use it and benefit from it. With that in mind, there's bound to be things that I wasn't able to cover in this course. With this summary we'll look at some places to go…
Starting up a Project
Suggested Courses
Course of the Month16 days, 15 hours left to enroll

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question