Compress pdf files through a print driver

I have a table in MS SQL with a binary column. Over time, people have uploaded files into it that are mostly PDFs (with embedded images) and JPGs; however, some of the files are huge. I am assuming that is because they were scanned at too high a resolution. I am trying to develop a one-size-fits-all solution to shrink the files. I do not know the embedded image formats in the PDFs, and JPGs will not compress any further, so general compression tools will not work.

My idea is to write a script that goes through the 8000 files, uses an API to print each one to PDF at a standard 100 dpi (with something like CutePDF), and then re-imports the resulting PDF into the SQL table. Does anyone have experience with this type of approach? Is it feasible, or is there a better way to do this?
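To put rough numbers on why the scan resolution matters: assuming a US Letter page (8.5 × 11 in) scanned in 24-bit color, and taking 300 dpi as an illustrative "too high" setting (the question does not say what resolution was actually used), the raw pixel counts compare as follows.

```latex
\begin{aligned}
300\,\text{dpi}: &\quad (8.5 \cdot 300) \times (11 \cdot 300) = 2550 \times 3300 = 8\,415\,000 \text{ pixels} \\
100\,\text{dpi}: &\quad (8.5 \cdot 100) \times (11 \cdot 100) = 850 \times 1100 = 935\,000 \text{ pixels} \\
\text{ratio}: &\quad \frac{8\,415\,000}{935\,000} = 9 = \left(\frac{300}{100}\right)^{2}
\end{aligned}
```

At 3 bytes per pixel, that is about 25.2 MB versus 2.8 MB of uncompressed image data per page. Compressed sizes will not shrink by exactly the same factor, but they tend to drop as the pixel count drops.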
Asked by: sibleypark
 
HainKurt (Sr. System Analyst) commented:
There should be a free component called print2pdf.

Check that one out: you can print all of the large files to PDF and re-import them.
 
sibleypark (Author) commented:
Do you know if it has an API that I can reference in code?
 
Karl Heinz Kremer commented:
Re-frying (printing a PDF to a PDF printer) is usually a bad idea: you may change a lot more than just the resolution of the images in your PDF. For example, PDFs can contain interactive form elements, which would be lost when you print them. Also, such form elements are sometimes used for "real" PDF content (static and read-only), because it's easier to add a label with a button field than with true PDF content. Depending on your printer driver, these elements may not show up correctly.

The best approach is to use a tool that is aware of the PDF structure and changes only your images, by downsampling them and putting them back in the same position as the originals. Adobe Acrobat can of course do that (for a hefty price tag), and so can Apago PDFEnhancer. Unfortunately, I am not aware of any free tool.

However, you can write your own if you know Java or C# by using the free iText (or iTextSharp) PDF library.
 
sibleypark (Author) commented:
I have used iText through its API in the past, and this approach may work. I am just not sure how to automate the image resampling for the 6000 PDFs. The reason I am looking at printing the PDFs is that I know they are all just scanned files in PDF format, so there are no PDF features that I would lose. I will take a closer look at iText, though. Thanks for the idea.
 
Karl Heinz Kremer commented:
That's why I said "usually" :) If you know that you would only process image-only PDFs, re-printing them is certainly an option.

If you want to go the iText route, I would write a Java program that would interact with the database directly, and get one image at a time, process it, and store it back into the DB. Your question is tagged as ASP.NET, so I assume that C# would be more appropriate for you, but the idea would be the same.
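A minimal VB.NET sketch of that one-file-at-a-time loop, assuming ADO.NET's `System.Data.SqlClient` and a hypothetical table `dbo.Documents` with an `Id INT` key and a `FileData VARBINARY(MAX)` column (the real table and column names are not given in the thread). The shrinking step itself is passed in as a function, so it could be iText, Ghostscript, or anything else. Because the column holds both PDFs and JPGs, the sketch checks for the `%PDF-` signature and skips everything else, and it only writes a result back when it is actually smaller.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient
Imports System.Text

Module ShrinkDocuments

    ' True when the bytes begin with the PDF signature "%PDF-".
    ' JPGs, NULL columns and anything else return False and are skipped.
    Public Function IsPdf(fileBytes As Byte()) As Boolean
        Dim signature As Byte() = Encoding.ASCII.GetBytes("%PDF-")
        If fileBytes Is Nothing OrElse fileBytes.Length < signature.Length Then Return False
        For i As Integer = 0 To signature.Length - 1
            If fileBytes(i) <> signature(i) Then Return False
        Next
        Return True
    End Function

    ' Processes one row at a time. Table and column names are hypothetical:
    ' dbo.Documents(Id INT, FileData VARBINARY(MAX)).
    Public Sub ShrinkAll(connectionString As String, shrink As Func(Of Byte(), Byte()))
        Dim ids As New List(Of Integer)()
        Using conn As New SqlConnection(connectionString)
            conn.Open()

            ' Read only the keys up front so just one file is held in memory at a time.
            Using cmd As New SqlCommand("SELECT Id FROM dbo.Documents", conn)
                Using reader As SqlDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        ids.Add(reader.GetInt32(0))
                    End While
                End Using
            End Using

            For Each id As Integer In ids
                Dim fileBytes As Byte() = Nothing
                Using cmd As New SqlCommand("SELECT FileData FROM dbo.Documents WHERE Id = @id", conn)
                    cmd.Parameters.Add("@id", SqlDbType.Int).Value = id
                    ' A NULL column comes back as DBNull, which TryCast turns into Nothing.
                    fileBytes = TryCast(cmd.ExecuteScalar(), Byte())
                End Using
                If Not IsPdf(fileBytes) Then Continue For

                Dim smaller As Byte() = shrink(fileBytes)
                ' Keep the original unless the new version is actually smaller.
                If smaller Is Nothing OrElse smaller.Length >= fileBytes.Length Then Continue For

                Using cmd As New SqlCommand("UPDATE dbo.Documents SET FileData = @data WHERE Id = @id", conn)
                    ' Size -1 means VARBINARY(MAX).
                    cmd.Parameters.Add("@data", SqlDbType.VarBinary, -1).Value = smaller
                    cmd.Parameters.Add("@id", SqlDbType.Int).Value = id
                    cmd.ExecuteNonQuery()
                End Using
            Next
        End Using
    End Sub

End Module
```

You would call it as `ShrinkDocuments.ShrinkAll(connectionString, AddressOf MyShrink)`, where `MyShrink` is a hypothetical `Function(data As Byte()) As Byte()` that does the actual PDF processing.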

As you may know, most free PDF printer drivers use Ghostscript as their backend. You can use Ghostscript directly to process your PDF files and downsample their images, without going through a printer driver.
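As a sketch of what downsampling with Ghostscript directly could look like, the VB.NET function below builds an argument list for the pdfwrite device. The switches `-dDownsampleColorImages`, `-dColorImageResolution` and their Gray/Mono counterparts are standard pdfwrite image parameters; the module and function names, the PDF 1.4 compatibility level, and using one target dpi for all three image types are assumptions for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module GhostscriptArgs

    ' Builds Ghostscript switches that re-write inputPdf to outputPdf through the
    ' pdfwrite device, downsampling color, gray and mono images to targetDpi.
    ' NOPAUSE/BATCH make it run unattended; SAFER restricts file access.
    ' -sOutputFile must come before the input file, so the input goes last.
    Public Function BuildDownsampleArgs(inputPdf As String, outputPdf As String,
                                        targetDpi As Integer) As String()
        Dim dpi As String = targetDpi.ToString(CultureInfo.InvariantCulture)
        Dim switches As New List(Of String) From {
            "-dNOPAUSE", "-dBATCH", "-dSAFER",
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dDownsampleColorImages=true", "-dColorImageResolution=" & dpi,
            "-dDownsampleGrayImages=true", "-dGrayImageResolution=" & dpi,
            "-dDownsampleMonoImages=true", "-dMonoImageResolution=" & dpi,
            "-sOutputFile=" & outputPdf,
            inputPdf
        }
        Return switches.ToArray()
    End Function

End Module
```

If you pass the array to the DLL through a wrapper like `CallGS` in the question, remember that the first element is ignored there, so prepend a dummy entry. For black-and-white text scans you may want a higher mono resolution than for color, and note that pdfwrite only downsamples images that exceed the target by a threshold factor (the `-dColorImageDownsampleThreshold` family of parameters).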
 
sibleypark (Author) commented:
Yes, it looks like with Ghostscript I will be able to create a PostScript file and then convert it back to PDF. Thanks for the help.
 
Karl Heinz Kremer commented:
You don't have to go to PostScript first; you can go directly to PDF with Ghostscript.
 
sibleypark (Author) commented:
Do you have an example of the syntax?  I see that using the API you can use the following:
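    Private Function ConvertFile() As Boolean
        ' Dim astrArgs(10) allocates 11 elements, indexes 0 through 10
        Dim astrArgs(10) As String
        astrArgs(0) = "ps2pdf" 'The First Parameter is Ignored
        astrArgs(1) = "-dNOPAUSE"              ' don't pause between pages
        astrArgs(2) = "-dBATCH"                ' exit when processing is done
        astrArgs(3) = "-dSAFER"                ' restrict file operations
        astrArgs(4) = "-r300"                  ' device resolution in dpi
        astrArgs(5) = "-sDEVICE=pdfwrite"      ' write PDF output
        astrArgs(6) = "-sOutputFile=c:\out.pdf"
        astrArgs(7) = "-c"                     ' run the PostScript code that follows...
        astrArgs(8) = ".setpdfwrite"           ' ...a pdfwrite setup operator used with older Ghostscript releases
        astrArgs(9) = "-f"                     ' ends the -c code; the next argument is the input file
        astrArgs(10) = "c:\gs\gs7.04\examples\colorcir.ps"
        ' CallGS is a helper from the same sample (not shown) that passes the arguments to the Ghostscript DLL
        Return CallGS(astrArgs)
    End Function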
to convert PS to PDF, but I don't see a PDF-to-PDF option.
 
Karl Heinz Kremer commented:
You don't have to do anything special - Ghostscript accepts PDF files as input (just like PostScript files). Just replace the input filename with the name of a PDF file.
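Following that advice, here is a minimal sketch of a PDF-to-PDF pass that runs the Ghostscript console executable rather than the DLL: it writes the stored bytes to a temporary `.pdf` file, calls Ghostscript with that PDF as the input file, and reads the result back. It assumes Ghostscript is installed and that you pass the executable name or full path (`gswin64c.exe` or `gswin32c.exe` on Windows, `gs` elsewhere); the module and function names are hypothetical, and only color and gray images are downsampled here.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Globalization
Imports System.IO

Module PdfToPdf

    ' Wraps an argument in double quotes if it contains a space
    ' (enough for temp-file paths; embedded quotes are not handled).
    Public Function Quote(arg As String) As String
        Return If(arg.Contains(" "), """" & arg & """", arg)
    End Function

    ' Runs a PDF through Ghostscript's pdfwrite device and returns the new bytes.
    ' gsExe: e.g. "gswin64c.exe" or "gswin32c.exe" on Windows, "gs" elsewhere.
    Public Function ShrinkPdf(gsExe As String, pdf As Byte(), dpi As Integer) As Byte()
        Dim tempDir As String = Path.GetTempPath()
        Dim inPath As String = Path.Combine(tempDir, Guid.NewGuid().ToString("N") & ".pdf")
        Dim outPath As String = Path.Combine(tempDir, Guid.NewGuid().ToString("N") & ".pdf")
        Try
            File.WriteAllBytes(inPath, pdf)
            Dim res As String = dpi.ToString(CultureInfo.InvariantCulture)
            ' Same kind of switches as for PostScript input; the input file is simply a PDF.
            Dim switches As String() = {
                "-dNOPAUSE", "-dBATCH", "-dSAFER", "-sDEVICE=pdfwrite",
                "-dDownsampleColorImages=true", "-dColorImageResolution=" & res,
                "-dDownsampleGrayImages=true", "-dGrayImageResolution=" & res,
                "-sOutputFile=" & Quote(outPath),
                Quote(inPath)
            }
            Dim psi As New ProcessStartInfo(gsExe, String.Join(" ", switches)) With {
                .UseShellExecute = False,
                .CreateNoWindow = True
            }
            Using proc As Process = Process.Start(psi)
                proc.WaitForExit()
                If proc.ExitCode <> 0 Then
                    Throw New InvalidOperationException(
                        "Ghostscript failed with exit code " & proc.ExitCode.ToString(CultureInfo.InvariantCulture))
                End If
            End Using
            Return File.ReadAllBytes(outPath)
        Finally
            ' File.Delete does not throw if the file does not exist.
            File.Delete(inPath)
            File.Delete(outPath)
        End Try
    End Function

End Module
```

For example, `Dim smaller As Byte() = PdfToPdf.ShrinkPdf("gswin64c.exe", original, 100)` would return the re-written PDF, which you could then compare against the original size before storing it.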
 
Peter Bye (Retired) commented:
My thanks to khkremer for mentioning Apago PDF Enhancer. That led me to the Apago website where I also found PDF Shrink which looks like a great tool for reducing PDF files.
