johnny99

asked on

Website File Size Checking Script?

Can someone help me with a script that simply fetches each page of a website via HTTP by following its internal links and reports the total size of each page?

This would need to cover the HTML file size, the image file sizes (including background images), and the total.

Modules OK.
snything

I would probably use wget and do a recursive fetch of the whole website. Then get the size of all the data you downloaded.

But I guess you don't want to have to download the whole website. You will still have to download all the web pages (though not the GIFs and other assets) in order to collect the links to the other files and images.
You can use the HEAD method to get the size of those files.
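
For instance, LWP::UserAgent can issue the HEAD request and read the Content-Length header without pulling down the body. A rough sketch (the image URL is just a placeholder):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# HEAD request only, so the image bytes themselves are never downloaded.
my $ua  = LWP::UserAgent->new;
my $res = $ua->head('http://www.example.com/images/logo.gif');

if ($res->is_success) {
    my $size = $res->header('Content-Length');
    print defined $size ? "$size bytes\n" : "server sent no Content-Length\n";
}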

Have a look at HTML::LinkExtor; it's a subclass of HTML::Parser, and its documentation has a simple example of how to get the link tags' details out of the HTML.
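
Something along these lines (an untested sketch; the page URL is a placeholder) pulls every link-carrying attribute (a href, img src, body background and so on) out of a fetched page:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Fetch one page, then let HTML::LinkExtor hand us every link-carrying tag.
my $base = 'http://www.example.com/';
my $ua   = LWP::UserAgent->new;
my $html = $ua->get($base)->decoded_content;

my @links;
my $parser = HTML::LinkExtor->new(
    sub {
        my ($tag, %attr) = @_;
        push @links, values %attr;   # the attribute values are the URLs
    }
);
$parser->parse($html);
$parser->eof;

# Resolve relative links against the page URL.
print URI->new_abs($_, $base), "\n" for @links;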
johnny99

ASKER

Maybe I didn't mention I was being lazy.

Give me some code to do the basics (let's say, create a hash with the page URL as the key and the size as the value) and I'll handle all the rest.

More points available if you think I'm being stingy.
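
For what it's worth, here is a minimal sketch of those basics, putting LWP::UserAgent and HTML::LinkExtor together; the start URL is made up, and this is only an illustration of the approach discussed above, not the accepted solution:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Illustrative sketch: crawl same-host pages starting from a placeholder URL
# and record byte sizes in %size, keyed by URL.
my $start = 'http://www.example.com/';
my $host  = URI->new($start)->host;

my $ua    = LWP::UserAgent->new;
my %size;
my %seen  = ($start => 1);
my @queue = ($start);

while (my $url = shift @queue) {
    my $res = $ua->get($url);
    next unless $res->is_success;

    $size{$url} = length $res->content;       # HTML bytes actually fetched

    HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        for my $link (values %attr) {
            my $abs = URI->new_abs($link, $url);
            next unless $abs->scheme && $abs->scheme =~ /^https?$/;
            next unless $abs->host eq $host;  # stay on the same site

            if ($tag eq 'a') {                # internal page: queue it
                push @queue, "$abs" unless $seen{"$abs"}++;
            }
            elsif (!exists $size{"$abs"}) {   # image etc.: HEAD for its size
                my $head = $ua->head("$abs");
                $size{"$abs"} = $head->header('Content-Length') || 0;
            }
        }
    })->parse($res->decoded_content)->eof;
}

printf "%10d  %s\n", $size{$_}, $_ for sort keys %size;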
ASKER CERTIFIED SOLUTION
snything

(The accepted solution text is only available to Experts Exchange members.)
Nothing has happened on this question in the past 10 months.
It's time for cleanup!

I will leave a recommendation in the Cleanup topic area that the answer by snything be accepted.

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer