WestCoast_BC asked:

Is there a tool to automatically create screenshots of all pages on a website

Does anyone know of a tool that will recursively and automatically go to every page on a website and create a screenshot of the page?

David Favor

There are several ways to accomplish something like what you describe.

Note: Whether this is legal or not is determined by the licensing terms of each site.

1) You can mirror an entire Website's content using...

wget --no-verbose --mirror --convert-links --continue --level=5 --adjust-extension --no-parent --page-requisites \
     --wait=5 --random-wait --limit-rate=200k --include-directories=editorial-guidelines --user-agent="$ua" "$URL"


This will produce a browsable local copy of the site, not screenshots.

2) To produce an actual screenshot of each page, you'd have to write code to build a tree of all the site's URLs, to some depth, then use a tool like http://PhantomJS.org to take the actual screenshots.

You must use PhantomJS or a similar tool that runs all of a site's Javascript + CSS, as most sites these days require both to render a page correctly, the way a visitor sees it.

WestCoast_BC (ASKER)

I would like to take screenshots of my own website, so licensing should not be an issue.
Then you'll likely take option #2.

There are many tools that take a single screen snapshot. The PhantomJS example directory contains one such tool.
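
As a rough illustration of driving such a single-page tool over a list of URLs, here is a minimal Python sketch. It assumes the `rasterize.js` script from the PhantomJS examples directory is available locally and is called as `phantomjs rasterize.js URL output-file`; the script path, the output naming scheme, and the function names are illustrative assumptions, not part of PhantomJS.

```python
import hashlib
import re
import subprocess
from pathlib import Path
from urllib.parse import urlsplit


def output_name(url: str) -> str:
    """Derive a filesystem-safe PNG name from a URL (hypothetical naming scheme)."""
    parts = urlsplit(url)
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", (parts.netloc + parts.path).strip("/")) or "index"
    # A short hash of the full URL keeps names distinct when only the query string differs.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}.png"


def rasterize_command(url: str, out_dir: Path, script: str = "rasterize.js") -> list[str]:
    """Build the argument list for: phantomjs rasterize.js URL output.png"""
    return ["phantomjs", script, url, str(out_dir / output_name(url))]


def capture_all(urls, out_dir: Path, timeout: int = 60) -> None:
    """Run the screenshot tool once per URL, continuing past pages that fail or hang."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        try:
            # check=False: a non-zero exit for one page does not abort the run.
            subprocess.run(rasterize_command(url, out_dir), check=False, timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"timed out: {url}")
```

Calling `capture_all(["https://example.com/"], Path("shots"))` would write one PNG per URL into `shots/`, provided `phantomjs` is on the PATH. Any other headless renderer could be substituted by changing `rasterize_command`.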

Problems you'll have to resolve through coding:

1) Building a recursive tree of the entire site.

2) Walking this tree pruning duplicate pages.

For example, many sites organize the same content several ways (categories, tags, table of contents), all of which must be walked and then pruned of duplicates, so that each page is visited and captured only once.

Pruning duplicates will dramatically speed up your tool.

3) You'll have to determine the depth of link searching on each page.

In other words, each page will contain links to other pages which also have links to other pages.

You must decide how many levels of links to follow, or else your tool may keep following links indefinitely.

4) Once you determine your pages, you'll now have to determine your exact screen shot method.

For example, imagine a site like Medium with massively long content. Capturing the entire page as a single .png produces a massively long image that is hard to read, and because it is an image, the text component is lost.

You'll have to determine the screen shot file type + whether to take a single screen shot of the entire page or attempt to split pages into multiple files, based on page length.

5) No tool I've ever come across does all this, which is why most people use wget --mirror then process local copies of files however they like.
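
Problems 1 to 4 above can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not a finished tool: `get_links` is a hypothetical callable you would supply (it fetches a page and returns its `href` values), the normalization rules in `normalize` are one possible choice (dropping a trailing slash is not safe on every server), and `tile_boxes` only computes where a long page would be cut, leaving the actual image capture to whatever screenshot tool you use.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonical form for duplicate detection: no fragment, lower-case host, no trailing slash."""
    p = urlsplit(url)
    path = p.path.rstrip("/") or "/"
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), path, p.query, ""))


def crawl(start: str, get_links, max_depth: int = 3) -> list[str]:
    """Breadth-first walk from `start`: same host only, each page once, at most max_depth links deep."""
    start = normalize(start)
    host = urlsplit(start).netloc
    seen = {start}                # pruning: every URL is queued at most once
    queue = deque([(start, 0)])
    pages = []
    while queue:
        url, depth = queue.popleft()
        pages.append(url)
        if depth == max_depth:    # depth limit: do not follow links from this page
            continue
        for href in get_links(url):
            link = normalize(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


def tile_boxes(page_height: int, tile_height: int) -> list[tuple[int, int]]:
    """Split a tall page into (top, bottom) pixel ranges of at most tile_height each."""
    return [(top, min(top + tile_height, page_height))
            for top in range(0, page_height, tile_height)]


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real HTTP fetches.
    site = {
        "https://example.com/": ["/about", "/tags/x", "/post/1#comments"],
        "https://example.com/tags/x": ["/post/1", "https://other.example/"],
        "https://example.com/post/1": ["/post/2"],
    }
    for page in crawl("https://example.com/", lambda u: site.get(u, []), max_depth=2):
        print(page)
    print(tile_boxes(2500, 1000))
```

The `seen` set is what stops the walk from looping when pages link back to each other, so `max_depth` mainly bounds how much of a large site gets visited. The list returned by `crawl` is what you would hand to the screenshot step.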
Hi WC_BC,

Depends on what you mean by "screenshot". If you're OK with an HTML file as a "screenshot", HTTrack (free, GPL) works very well:
https://www.httrack.com/

It downloads an entire site, recursively building all pages.

If you don't think of HTML as a "screenshot", but are OK with PDF as a "screenshot", you could still use HTTrack as the first step, then use wkhtmltopdf (open source, LGPLv3) to go from HTML to PDF:
https://wkhtmltopdf.org/

If you don't think of PDF as a "screenshot", but are OK with PNG as a "screenshot", you could still use HTTrack as the first step, wkhtmltopdf as the second step, then Xpdf's PDFtoPNG (open source, GPLv2, GPLv3) to go from PDF to PNG:
https://www.xpdfreader.com/download.html

For that last one, these three five-minute EE video Micro Tutorials should be helpful:
Xpdf - Command Line Utility for PDF Files
Xpdf - PDFtoPNG - Command Line Utility to Convert a Multi-page PDF File into Separate PNG Files
xpdfrc - Configuration File for All Xpdf Utilities
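
For illustration only, the HTML to PDF to PNG chain could be scripted along these lines in Python. The sketch assumes HTTrack has already written the mirror to a local folder and that `wkhtmltopdf` and Xpdf's `pdftopng` are on the PATH; the function names, the flattened output naming, and the 150 dpi default are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def conversion_commands(html_file: Path, mirror_dir: Path, out_dir: Path,
                        dpi: int = 150) -> list[list[str]]:
    """Return the two commands that turn one mirrored HTML file into PNG pages."""
    # Flatten the relative path so index.html files in different folders do not collide.
    name = "_".join(html_file.relative_to(mirror_dir).with_suffix("").parts)
    pdf = out_dir / f"{name}.pdf"
    png_root = out_dir / name    # pdftopng appends a page number and .png to this root
    return [
        ["wkhtmltopdf", str(html_file), str(pdf)],
        ["pdftopng", "-r", str(dpi), str(pdf), str(png_root)],
    ]


def convert_mirror(mirror_dir: Path, out_dir: Path) -> None:
    """Convert every .html file under mirror_dir, stopping at the first failing command."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(mirror_dir.rglob("*.html")):
        for cmd in conversion_commands(html_file, mirror_dir, out_dir):
            subprocess.run(cmd, check=True)
```

A call such as `convert_mirror(Path("mirror"), Path("out"))` would leave one PDF and one set of PNG pages per HTML file in `out/`. Depending on the wkhtmltopdf version, extra options may be needed for it to load local images and stylesheets.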

Regards, Joe
I once had a free tool that would automatically go through an entire website and create screenshots of every page as image files. Unfortunately, I can no longer find the tool. Thank you everyone for your suggestions.