Solved

Convert PDF to IMAGE with perl/pythjon

Posted on 2013-11-24
10
1,686 Views
Last Modified: 2013-11-25
OK
We've no more time to spend searching for a solution because bug is too old now :(.

SO let's start with a perl or python script. Please help me to write a script  according to specs below :
- choice of perl or python will be done according to easiest way to install and configure perl/python for a web server and restrict to some directory only on a VirtualHost apache conf.

- script will be called from an GET HTTP call.
- arguments of script are:
  * docroot relative path of source pdf file
  * docroot relative path of a destination lowres folder
  * docroot relative path of a destination thumbnail folder

e.g. http://my.project.com/foo/convertpdf.pl?src=data/upload/ftp/blah/myfile.pdf&lowres=/data/upload/_LOWRES/ftp/blah&thumb=/data/upload/_THUMB/ftp/blah

- script will produces 2 jpg images from 1st page of the pdf:
  * lowres file will be jpeg of 1st pdf page with max size = 220x220 pixels. w/h ratio of pdf page is preserved of course.
  * thumb file will be jpeg of 1st pdf page with max size = 48x48 pixels. w/h ratio of pdf page is preserved of course.


Could you help me to solve this ?
0
Comment
Question by:pelnisgproup
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
10 Comments
 
LVL 17

Expert Comment

by:Dushan De Silva
ID: 39673680
0
 

Author Comment

by:pelnisgproup
ID: 39673688
@Dushan911 I am not asking for tool.I am asking about perl scrtip or python.

Thanks
0
 
LVL 29

Expert Comment

by:pepr
ID: 39673819
Dushan is right. You need something faster than pure Python/Perl implementation of the task.

I suggest to use the matured ImageMagick (http://www.imagemagick.org/) set of utilities. The one you want is named convert. You can use the PythonMagick (http://www.imagemagick.org/download/python/) wrapper.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:pelnisgproup
ID: 39673839
Yes, just for information , we need

1)script will be called from an GET HTTP call ( check a specs above).
2) we not working on window platform.


and the tool suggested by @Dushan911 is .

Free PDF to Image Converter works on Windows XP, Windows Vista, Windows 7 and Windows 8, both 32-bit and 64-bit versions.

Thanks
0
 
LVL 29

Expert Comment

by:pepr
ID: 39673864
OK. Use the ImageMagick. It is implemented for your platform. Here is how the conversion can be done:
import PythonMagick

img = PythonMagick.Image('a.pdf')    # use your full path here
img.write('original.jpg')

imgLowRes = PythonMagick.Image(img)  # make a copy
imgLowRes.resize('220x')
imgLowRes.write('lowRes.jpg')

img.resize('48x')
img.write('thumb.jpg')

Open in new window

0
 

Author Comment

by:pelnisgproup
ID: 39673880
@pepr

Why the produce images are in black.?

Thanks
0
 
LVL 29

Expert Comment

by:pepr
ID: 39673886
Try to use img.sample(...) instead of the resize.

Also, img = PythonMagick.Image('a.pdf[0]') -- notice the zero in the brackets--will extract the first page.
0
 

Author Comment

by:pelnisgproup
ID: 39673922
Hello pepr,

- script will produces 2 jpg images from 1st page of the pdf:
  * lowres file will be jpeg of 1st pdf page with max size = 220x220 pixels. w/h ratio of pdf page is preserved of course.
  * thumb file will be jpeg of 1st pdf page with max size = 48x48 pixels. w/h ratio of pdf page is preserved of course.

Could you improve that

#!c:/Python27/python.exe -u
import PythonMagick
img = PythonMagick.Image('Hello.pdf')    # use your full path here
img.write('original.jpeg')

imgLowRes = PythonMagick.Image(img)  # make a copy
imgLowRes.sample('220x')
imgLowRes.write('lowRes.jpeg')

img.sample('48x')
img.write('thumb.jpeg')

Open in new window


From that script still produces images in bad quality.

Any idea?

Thanks
0
 
LVL 29

Accepted Solution

by:
pepr earned 500 total points
ID: 39673965
For the bad quality first. (I did not notice.) PDF is basically a vector format. This way the image object needs to render it to pixels first, It probably uses some default (rather low) resolution. Try to create the empty image first, set the wanted density (DPI as string), and only then load the PDF file into:
import PythonMagick

pdfname = 'a.pdf'             # use your full path here

img = PythonMagick.Image()    # empty object first
img.density('300')            # set the density for reading (DPI); must be as a string
img.read(pdfname + '[0]')     # read the first page of PDF
img.write('original.jpg')

imgLowRes = PythonMagick.Image(img)  # make a copy
imgLowRes.sample('x220')
imgLowRes.write('lowRes.jpg')

Open in new window

If I understand the first part of the question... Do you know whether the PDF files are always potraits? I.e., should always the height be converted to 220?

Warning: The bigger density you use, the longer it takes to render the PDF. Good compromise should be found.
0
 
LVL 29

Expert Comment

by:pepr
ID: 39674325
Did you solve all the problems? You may like more:
#!python3

import os
import PythonMagick


def smallAndThumbnail(fname, smallpath, thumbnailpath):

    name, ext = os.path.splitext(os.path.basename(fname))
    smallfname = os.path.join(smallpath, name + '.jpg')
    thumbfname = os.path.join(thumbnailpath, name + '.jpg')
 
    img = PythonMagick.Image()    # empty object first
    img.density('300')            # set the density for readeing; must be as a string
    img.read(fname + '[0]')       # read the first page of PDF

    # Background color.
    bgcolor = 'white'             # can have the form '#ffffff'
    
    # The small image first.
    geometry = '220x220'
    imgResult = PythonMagick.Image(geometry, bgcolor)  # init -- the square of the colour
    imgCopy = PythonMagick.Image(img)                  # make a copy
    imgCopy.sample(geometry)
    imgResult.composite(imgCopy, 
                        PythonMagick.GravityType.CenterGravity, 
                        PythonMagick.CompositeOperator.CopyCompositeOp)
    imgResult.write(smallfname)

    # The thumbnail.
    geometry = '48x48'
    imgResult = PythonMagick.Image(geometry, bgcolor)  # init -- the square of the colour
    imgCopy = PythonMagick.Image(img)                  # make a copy
    imgCopy.sample(geometry)
    imgResult.composite(imgCopy, 
                        PythonMagick.GravityType.CenterGravity, 
                        PythonMagick.CompositeOperator.CopyCompositeOp)
    imgResult.write(thumbfname)


if __name__ == '__main__':

    smallAndThumbnail('./pdf/a.pdf', './small', './thumbnail')

Open in new window

0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08…
Strings in Python are the set of characters that, once defined, cannot be changed by any other method like replace. Even if we use the replace method it still does not modify the original string that we use, but just copies the string and then modif…
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question