Website File Size Checking Script?

johnny99
Can someone help me with a script which will simply get each page of a website via HTTP, by following (internal) links and give a report on the total size of each page?

This would need to be HTML file size, image file size, (including background images as well) and total.

Modules OK.

Commented:
I would probably use wget and do a recursive fetch of the whole website. Then get the size of all the data you downloaded.

But I guess you don't want to download the whole website. You will still have to download all the web pages, in order to get the links to other pages and images, but not the GIFs and other images themselves.
For those you can use the HTTP HEAD method to get the size of each file without fetching its body (as long as the server sends a Content-Length header).

Have a look at HTML::LinkExtor. It's a subclass of HTML::Parser, and its documentation has a simple example of how to get the link attributes of every tag out of the HTML.
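
A rough sketch of how those pieces might fit together for a single page, assuming LWP::UserAgent and HTML::LinkExtor (plus URI, which they rely on) are installed. The sub names extract_images and page_report are made up for illustration, and background images set through CSS are not picked up, only img src and background attributes:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Return the absolute URLs of images referenced in $html:
# <img src=...> plus any background="..." attribute (body, table, td, ...).
sub extract_images {
    my ($html, $base) = @_;
    my %seen;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        for my $name (keys %attr) {
            next unless ($tag eq 'img' && $name eq 'src') || $name eq 'background';
            $seen{"$attr{$name}"} = 1;    # URI object, already made absolute
        }
    }, $base);
    $parser->parse($html);
    $parser->eof;
    return sort keys %seen;
}

# GET one page, HEAD each of its images, and return the sizes in bytes.
sub page_report {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return unless $res->is_success;
    my $html_size = length $res->content;    # raw bytes as received
    my $html      = $res->decoded_content;
    $html = '' unless defined $html;
    my $image_size = 0;
    for my $img (extract_images($html, $res->base)) {
        my $head = $ua->head($img);
        next unless $head->is_success;
        # Some servers omit Content-Length; count those images as 0.
        $image_size += $head->header('Content-Length') || 0;
    }
    return { html => $html_size, images => $image_size,
             total => $html_size + $image_size };
}

# Only runs when a URL is given on the command line.
if (@ARGV) {
    my $report = page_report(LWP::UserAgent->new, $ARGV[0])
        or die "Could not fetch $ARGV[0]\n";
    print "$_: $report->{$_}\n" for qw(html images total);
}
```

Run it as, say, perl pagesize.pl http://www.foobah.com/ (the script name is arbitrary). To cover the whole site you would collect the a href links in the same callback, keep the ones on the same host, and call page_report on each, storing the results in a hash keyed by URL.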

Author

Commented:
Maybe I didn't mention I was being lazy.

Give me some code to do the basics (let's say, create a hash with the page URL as the key and the size as the value) and I'll handle all the rest.

More points available if you think I'm being stingy.
Commented:
Well, if you want to be really lazy, you'd wget the whole thing. It would just cost a bit of bandwidth.

But, your server will need wget installed, not to mention the free space.

Something like this:
#!/usr/bin/perl
use strict;
use warnings;

my $website = 'www.foobah.com';

# this will store all the files in the www.foobah.com/ directory, under the current directory
system('wget', '-r', "http://$website/") == 0
    or warn "wget exited with status $?\n";
my $size = `du -sh $website`;
print "Size is $size\n";
# system('rm', '-rf', $website);  # uncomment this to delete all the files...

#end
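
If you want per-file numbers rather than the single total du gives you, the downloaded tree can be walked with File::Find (a core module). A minimal sketch, assuming wget has already written the mirror into a local directory; the sub name sizes_under is made up, and the keys are local file paths rather than URLs:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return a hash mapping each regular file under $dir to its size in bytes.
sub sizes_under {
    my ($dir) = @_;
    my %size;
    find({
        no_chdir => 1,    # keep $_ as the full path of each entry
        wanted   => sub { $size{$_} = -s $_ if -f $_ },
    }, $dir);
    return %size;
}

# Only runs when a directory is given on the command line.
if (@ARGV) {
    my %size  = sizes_under($ARGV[0]);
    my $total = 0;
    for my $file (sort keys %size) {
        printf "%10d  %s\n", $size{$file}, $file;
        $total += $size{$file};
    }
    printf "%10d  total\n", $total;
}
```

Called as perl sizes.pl www.foobah.com (again, the script name is arbitrary), it prints one line per file and a grand total. Grouping images under the page that uses them would still need the HTML parsed.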
Nothing has happened on this question in the past 10 months.
It's time for cleanup!

I will leave a recommendation in the Cleanup topic area that
the answer by snything be accepted.

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
