HTML to image

Hello Experts,

Thumbalizr.com and Thumbshots.org provide thumbshots for DMOZ websites. I think this feature is very cool.

I wonder if Perl can do this: visit websites automatically and save the homepage of each destination website as an image.

I think I first need to have Perl visit those sites; I know that can be done with LWP.

Next, I wonder whether I should have Perl save the homepage locally and then convert the HTML page into an image.

I have no idea how to achieve this second step. I searched the Internet and did not find any useful information about converting HTML to an image on Linux.

I know there are some programs that can convert HTML to PDF on Linux. Can anybody give me a hint on how to convert HTML to an image (GIF, PNG, JPEG) on Linux?

Thanks a lot!
Asked by: jeffrey1101
 
muff commented:
I think CutyCapt will do what you want:

http://cutycapt.sourceforge.net/
 
jeffrey1101 (author) commented:
CutyCapt is for Win32 systems. I am looking for tools for Linux.
 
Tobias commented:
Hello,

CutyCapt is a Qt program, so it can be compiled on Linux too.
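If you want an easy way to script it, here is a rough sketch of driving CutyCapt from Python on a Linux box without a display. It assumes CutyCapt has been built and is on your PATH, and that `xvfb-run` is installed to supply a virtual X server. The `--url` and `--out` options are the ones shown on the CutyCapt page; the function names, the binary name and the 1024x768 screen size are my own assumptions, so adjust them to your setup.

```python
import shutil
import subprocess


def cutycapt_command(url, out_path, use_xvfb=True, binary="CutyCapt"):
    """Build the argument list for one capture (hypothetical helper).

    binary is an assumption: some distributions install it as "cutycapt".
    """
    cmd = [binary, "--url=" + url, "--out=" + out_path]
    if use_xvfb:
        # CutyCapt needs an X server; xvfb-run provides a virtual one.
        # The 24-bit screen depth is an assumed, reasonable setting.
        cmd = ["xvfb-run", "--auto-servernum",
               "--server-args=-screen 0 1024x768x24"] + cmd
    return cmd


def capture(url, out_path, **kwargs):
    """Run CutyCapt once; the image format follows out_path's extension."""
    cmd = cutycapt_command(url, out_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " was not found on PATH")
    subprocess.run(cmd, check=True)
```

Calling `capture("http://www.example.org/", "example.png")` in a loop over a list of URLs would then give you one image per site.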

Another idea, though not the best one, is to take a screenshot of the website and save it as an image.

Best Regards
 
jeffrey1101 (author) commented:
How do big websites accomplish this kind of task? Is there any commercial software that converts HTML to an image on Linux?

I am looking for an easy way to script it...
 
muff commented:
Browsershots simply has an array of PCs (virtual, I would have thought, but from what you can see of the task bar they may just be donated machines) that take screenshots. Very clumsy.

But yes, other "big" websites would use something similar to CutyCapt rather than the method you describe, because the page needs to be rendered as it would appear in a browser. CutyCapt uses the WebKit engine, which is also used by the Safari and Chrome browsers, so its output should look the same as the page does in a WebKit-based browser.

Converting HTML into any sort of output takes a rendering engine, and some HTML-to-PDF converters produce results very different from what you would see in a browser.

The other options are a tool based on Gecko, the engine behind Mozilla products, or somehow borrowing the engine in IE.

But CutyCapt is already written, and you can take advantage of it. Alternatively, you can use

http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python-and-qtwebkit/

This is a Python script; it still needs Qt, though.
 
jeffrey1101 (author) commented:
No, I still don't like the CutyCapt idea or its code. How can I write a WebKit engine myself?
 
muff commented:
You cannot simply write a WebKit engine; that is on the order of writing a browser from scratch.

Use the link provided (http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python-and-qtwebkit/) to write a Python script that does the screen captures using the QtWebKit bindings for Python.
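To give an idea of the shape of such a script, here is a minimal sketch of the usual QtWebKit approach. It is not the code from the linked post. It assumes PyQt4 with the QtWebKit module is installed and that an X display is available (a virtual one such as Xvfb will do). The `output_name` helper and the function names are hypothetical, added only for illustration.

```python
import sys
from urllib.parse import urlparse


def output_name(url):
    """Derive a PNG file name from the URL's host (hypothetical helper)."""
    return urlparse(url).netloc + ".png"


def capture(url, out_path):
    """Render url off-screen with QtWebKit and save the page as an image."""
    # Imported here so the module still loads where PyQt4 is not installed.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication, QImage, QPainter
    from PyQt4.QtWebKit import QWebPage

    app = QApplication(sys.argv)
    page = QWebPage()

    def on_load_finished(ok):
        if ok:
            frame = page.mainFrame()
            # Grow the viewport to the full page so nothing is cut off.
            page.setViewportSize(frame.contentsSize())
            image = QImage(page.viewportSize(), QImage.Format_ARGB32)
            painter = QPainter(image)
            frame.render(painter)
            painter.end()
            image.save(out_path)
        app.quit()

    page.loadFinished.connect(on_load_finished)
    page.mainFrame().load(QUrl(url))
    app.exec_()  # runs until on_load_finished calls app.quit()


if __name__ == "__main__" and len(sys.argv) > 1:
    capture(sys.argv[1], output_name(sys.argv[1]))
```

The page is painted straight into a QImage, so no browser window is ever shown; the rendering engine does all the work, which is why there is no need to write one yourself.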

Question has a verified solution.
