sibleypark

asked on

Compress pdf files through a print driver

I have a table in MS SQL with a binary column. Over time, people have uploaded files into it that are mostly PDFs (with embedded images) and JPGs; however, the file sizes of some of them are huge. I am assuming that is because they were scanned at too high a resolution. I am trying to develop a one-size-fits-all solution to shrink the files. I do not know the embedded image formats in the PDFs, and JPGs will not compress, so general compression tools will not work.

My idea is to develop a script to go through the 8000 files and use an API to print them at a standard 100 dpi (with something like CutePDF) to PDF format, and then re-import the resulting PDF into the SQL table. Does anyone have any experience with this type of approach? Is it a feasible approach, or is there a better way to do this?
HainKurt

There should be a free component called print2pdf.

Check that one; you can print all of the large files to PDF and reimport them.
sibleypark (ASKER)

Do you know if it has an API that I can reference in code?
Re-frying (printing a PDF to a PDF printer) is usually a bad idea - you may change a lot more than just the resolution of the images in your PDF. PDFs can for example contain interactive form elements which would be lost when you print them. Also, sometimes these form elements are actually used for "real" PDF content (they are static and read-only) because it's easier to add a label with a button than with true PDF content. Depending on your printer driver, such elements may not show up correctly.

The best approach is to use a tool that is aware of the PDF structure and only changes your images by downsampling them and adding them again in the same position as the old ones. Adobe Acrobat can of course do that (for a hefty price tag), so can the Apago PDFEnhancer. I am unfortunately not aware of any free tool.

However, you can write your own if you know Java or C# by using the free iText (or iTextSharp) PDF library.
I have used iText in the past through the API, and this is an approach that may work. I am just not sure how to automate the image resampling for the 6000 PDFs. The reason I am looking at printing the PDFs is that I know these are all just scanned files in PDF format, so there are no PDF properties that I would lose. I will take a closer look at iText though. Thanks for the idea.
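As a rough illustration of the automation step asked about here, the following Python sketch walks over (id, blob) pairs, picks out the PDFs by their leading bytes, and keeps a shrunken copy only when it is actually smaller. The names `sniff_format`, `shrink_rows`, and the `shrink_pdf` callable are hypothetical; reading from and writing back to the SQL table is left out, and the check assumes the file signature sits at the very start of the data.

```python
from typing import Callable, Iterable, Iterator, Tuple


def sniff_format(blob: bytes) -> str:
    """Guess the file type from its leading bytes (assumes no junk before the signature)."""
    if blob.startswith(b"%PDF-"):
        return "pdf"
    if blob.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


def shrink_rows(
    rows: Iterable[Tuple[int, bytes]],
    shrink_pdf: Callable[[bytes], bytes],
) -> Iterator[Tuple[int, bytes]]:
    """Yield (row_id, new_blob) only for PDFs whose shrunken copy is smaller."""
    for row_id, blob in rows:
        if sniff_format(blob) != "pdf":
            continue  # JPEGs and unknown types are left untouched in this sketch
        smaller = shrink_pdf(blob)
        if len(smaller) < len(blob):
            yield row_id, smaller
```

Passing the shrinking step in as a callable keeps the loop independent of whichever tool ends up doing the downsampling, and comparing sizes avoids replacing a file with a larger one.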
ASKER CERTIFIED SOLUTION
Karl Heinz Kremer
This solution is only available to members.
Yes, it looks like by using Ghostscript I will be able to create a PostScript file and then convert it back to PDF. Thanks for the help.
You don't have to go to PostScript first; you can go directly to PDF with Ghostscript.
Do you have an example of the syntax?  I see that using the API you can use the following:
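    ' Builds the Ghostscript argument list and passes it to CallGS,
    ' a helper that is defined elsewhere (not shown here).
    Private Function ConvertFile() As Boolean
        Dim astrArgs(10) As String 'Indices 0 through 10
        astrArgs(0) = "ps2pdf" 'The First Parameter is Ignored
        astrArgs(1) = "-dNOPAUSE" 'Do not pause between pages
        astrArgs(2) = "-dBATCH" 'Exit when processing is finished
        astrArgs(3) = "-dSAFER" 'Restrict file operations
        astrArgs(4) = "-r300" 'Resolution in dpi
        astrArgs(5) = "-sDEVICE=pdfwrite" 'Write PDF output
        astrArgs(6) = "-sOutputFile=c:\out.pdf"
        astrArgs(7) = "-c"
        astrArgs(8) = ".setpdfwrite"
        astrArgs(9) = "-f"
        astrArgs(10) = "c:\gs\gs7.04\examples\colorcir.ps" 'Input file
        Return CallGS(astrArgs)
    End Function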
to convert PS to PDF, but I don't see a PDF-to-PDF option.
You don't have to do anything special - Ghostscript accepts PDF files as input (just like PostScript files). Just replace the input filename with the name of a PDF file.
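For illustration, a PDF-to-PDF call with the images downsampled to 100 dpi might be assembled as in this Python sketch. It is not taken from the thread: the function names, the file names, and the `gs` executable name are assumptions (on Windows the console executable is typically `gswin64c` or `gswin32c`), and the image options used are Ghostscript's standard pdfwrite downsampling parameters.

```python
import subprocess
from typing import List


def build_gs_args(src: str, dst: str, dpi: int = 100, gs_exe: str = "gs") -> List[str]:
    """Build a Ghostscript command line that rewrites src as dst with downsampled images."""
    return [
        gs_exe,
        "-dNOPAUSE",
        "-dBATCH",
        "-dSAFER",
        "-sDEVICE=pdfwrite",
        # Ask pdfwrite to downsample each image class to the target resolution.
        "-dDownsampleColorImages=true",
        "-dDownsampleGrayImages=true",
        "-dDownsampleMonoImages=true",
        f"-dColorImageResolution={dpi}",
        f"-dGrayImageResolution={dpi}",
        f"-dMonoImageResolution={dpi}",
        f"-sOutputFile={dst}",
        src,  # the input is a PDF here instead of a PostScript file
    ]


def shrink_pdf_file(src: str, dst: str, dpi: int = 100) -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_args(src, dst, dpi), check=True)
```

The argument list can also be passed to the Ghostscript API in the same order as in the VB example above, with the first element ignored.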
My thanks to khkremer for mentioning Apago PDF Enhancer. That led me to the Apago website where I also found PDF Shrink which looks like a great tool for reducing PDF files.