
Using open source tools to place annotation text as a stamp on a PDF document

Purpose
To explain how to place a textual stamp on a PDF document.  This is commonly referred to as an annotation, or possibly a watermark, although a watermark generally differs in that it is somewhat translucent.  Watermarks may be text or graphics, but this article focuses on annotating with text.

This concept can be used for a number of purposes, such as stamping scanned documents from a scanner, adding page numbers to PDF documents, applying serialized numbers to PDF documents, and much more.

Overview
I needed a way to stamp PDF files with a simple text string and initially only found expensive commercial solutions.  The solution I eventually worked out uses open source software and is very flexible, so I thought I’d share how I did it.  It can be implemented on a number of OS platforms and from a number of scripting/programming languages.  This solution is demonstrated using Windows and command-line tools.  The general process is to first generate a text stamp image, and then overlay it on one or more pages of a PDF.  It works great and is quick!

The following partial screenshot shows a small 12 point character annotation in the lower right corner of a PDF document.
[Figure: partial screenshot showing the annotation text]
The Tools
For this solution, I use three open source tools: GhostScript, ImageMagick, and PDFTK.

CAUTION:  ImageMagick has many advanced capabilities, including the ability to watermark or stamp PDF files directly.  However, by itself it is EXTREMELY slow when manipulating PDF files, especially when the PDF has multiple pages.  It is, on the other hand, able to modify or even create image files very quickly.  This is why I combine the two tools: ImageMagick creates the stamp and PDFTK overlays it on the PDF.

For my example, I’m using the Windows binaries that are command-line-based.  Here’s how I prepared the environment:

First, you need some open source utilities:
1.      Start with GhostScript (http://www.ghostscript.com/), but that is just the foundation.  You don’t work with GhostScript directly; you use it indirectly through the next utility.  At the time of writing, the current version of GhostScript is 9.0.
2.      Then you need ImageMagick from http://www.imagemagick.org.  They have many variants depending on what scripting/programming language you want to use.  
3.      You also need PDFTK from http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/.  They also have many variants depending on your OS.  

Second, I recommend setting system-level environment variables to make it easier to use these programs.  Open System Properties, select the “Advanced” tab, and click [Environment Variables].  In the “System variables” section, click [New…] to add the following environment variables, each pointing to the path where the corresponding utility is installed:

GS_PROG=C:\Program Files\GhostScript\gs9.0\bin\gswin32c.exe
MAGICK_HOME=C:\Program Files\ImageMagick
PDFTK_HOME=C:\Program Files\PDFTK
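If the tools are driven from a script rather than typed by hand, the same environment variables can be read at run time.  The following is a minimal Python sketch, assuming the MAGICK_HOME and PDFTK_HOME variables described here.  The helper name tool_path is hypothetical, and ntpath is used so that the result is a Windows-style path on any platform.

```python
import ntpath
import os


def tool_path(env_var, exe_name, environ=None):
    """Return the full path of exe_name inside the folder named by env_var."""
    environ = os.environ if environ is None else environ
    home = environ.get(env_var)
    if not home:
        raise KeyError(f"environment variable {env_var} is not set")
    # ntpath joins with backslashes, matching the Windows paths in this article.
    return ntpath.join(home, exe_name)


if __name__ == "__main__":
    # Example values only; they mirror the paths suggested in this section.
    sample_env = {
        "MAGICK_HOME": r"C:\Program Files\ImageMagick",
        "PDFTK_HOME": r"C:\Program Files\PDFTK",
    }
    print(tool_path("MAGICK_HOME", "convert.exe", sample_env))
    print(tool_path("PDFTK_HOME", "PDFTK.exe", sample_env))
```

Failing early with a clear error when a variable is missing is a design choice of this sketch; it tends to be easier to diagnose than a later "file not found" from the shell.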

The Process
Once you have those installed, tested, and functional, you can execute the command-line programs with arguments.  (In my case, my application creates a DOS batch file with commands like the examples here and then shells out to execute the batch file.)
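As one way an application might generate such a batch file, the sketch below turns a list of commands into batch-file text.  It is written in Python purely as an illustration; the function name and the sample command are hypothetical, and subprocess.list2cmdline is used to put quotes around arguments that contain spaces.

```python
import subprocess


def batch_text(commands):
    """Render a list of argument lists as the text of a Windows batch file."""
    lines = ["@echo off"]
    for argv in commands:
        # list2cmdline quotes any argument containing spaces, following the
        # Microsoft C runtime rules used by most Windows console programs.
        lines.append(subprocess.list2cmdline(argv))
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    commands = [
        [r"C:\Program Files\PDFTK\PDFTK.exe", "SamplePDF.pdf",
         "cat", "1", "output", "FirstPage.pdf"],
    ]
    print(batch_text(commands))
```

Note that this does not escape characters that cmd.exe itself treats specially, such as % or &, so it is only suitable for plain file names and options.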

First, the creation of the watermark image

 
 {This is supposed to be one line, but split into multiple lines to make it easier to read}

convert.exe
          -colorspace RGB
          -size 2550x3500 xc:transparent
          -fill red
          -font Arial
          -pointsize 10
          -gravity SouthEast
          -annotate +100+30 " This is an annotation "
          stamp.pdf




===Explanation of some of the options===

Full details about the options can be found here (http://www.imagemagick.org/script/command-line-options.php).  For simplicity, only the less obvious options I used are described below.

The size setting (2550x3500) generates a page-sized image to overlay on my 8.5”x11” PDFs.  (At 300 pixels per inch, 8.5”x11” works out to 2550x3300, so this image is slightly taller than the page; adjust the size if you have a landscape PDF or another page size.)  The option “xc:transparent” makes the resulting image transparent.  Matching the stamp to the size of your PDF makes placement easier, but it is not required; with a differently sized stamp you would have more work to do when placing it on the PDF file.  A page-sized stamp also makes the annotate option easy to use as part of automation.
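For other page sizes, the pixel dimensions are the page size in inches multiplied by the resolution.  The sketch below assumes 300 pixels per inch, which is suggested by the 2550-pixel width but is not stated explicitly in this article; the function names are illustrative only.

```python
def page_pixels(width_in, height_in, dpi=300):
    """Convert a page size in inches to whole pixel dimensions."""
    return round(width_in * dpi), round(height_in * dpi)


def size_option(width_in, height_in, dpi=300):
    """Format the dimensions as a WIDTHxHEIGHT string for the -size option."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return f"{width_px}x{height_px}"


if __name__ == "__main__":
    print(size_option(8.5, 11))   # portrait letter page
    print(size_option(11, 8.5))   # landscape letter page
```

Under that assumption, a portrait 8.5"x11" page comes out as 2550x3300 and a landscape one as 3300x2550.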

The “gravity” option is an easy one to use for placing the text based on the size of the image.  This works well with any sized image, and I chose to use this option specifically to simplify placement of the text.  You can use NorthWest (top left corner), North (top center), NorthEast (top right), etc.

The option “annotate” has two parameters: an offset and the actual text to be inserted.  The offset is in the format +/-[horizontal]+/-[vertical] (no spaces between the offset values, but there is a space after them).  So in this example:
 -annotate +100+30 " This is an annotation "


This places the text 100 pixels horizontally and 30 pixels vertically in from the edges of the image, toward the center of the page.  Which edges are used depends on the gravity selection; with SouthEast, the offsets are measured from the right and bottom edges.
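To make the relationship between gravity and offset concrete, here is a simplified Python model of where the text ends up.  It is only an approximation of the behavior described here, not ImageMagick's actual layout code: it covers the four corner gravities, and the function name anchor_point is hypothetical.

```python
def anchor_point(gravity, dx, dy, width, height):
    """Return the (x, y) pixel that the corner of the text is pinned to.

    Simplified model for the four corner gravities only: the offset moves
    the text inward from the chosen corner. Coordinates are measured from
    the top left of the image, with y increasing downward.
    """
    corners = {
        "NorthWest": (dx, dy),
        "NorthEast": (width - dx, dy),
        "SouthWest": (dx, height - dy),
        "SouthEast": (width - dx, height - dy),
    }
    if gravity not in corners:
        raise ValueError(f"unsupported gravity: {gravity}")
    return corners[gravity]


if __name__ == "__main__":
    # The stamp used in this article: SouthEast gravity, offset +100+30.
    print(anchor_point("SouthEast", 100, 30, 2550, 3500))
```

For the 2550x3500 stamp, the lower right corner of the text lands roughly at pixel (2450, 3470), that is, 100 pixels in from the right edge and 30 pixels up from the bottom.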

TIP: I have found that adding a space at the beginning and the end of the annotation text helps in readability, so I include these spaces all the time.

The last option is the output file name: stamp.pdf
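When the stamp text changes from run to run (page numbers or serial numbers, for example), it can help to build the command from parameters.  The sketch below assembles the same ImageMagick arguments as a Python list; the function names and defaults are assumptions made for illustration, and the option values simply mirror the command shown in this article.

```python
def format_offset(dx, dy):
    """Format an offset such as +100+30 (the sign is always written)."""
    return f"{dx:+d}{dy:+d}"


def stamp_image_command(text, output="stamp.pdf", size="2550x3500",
                        color="red", font="Arial", pointsize=10,
                        gravity="SouthEast", offset=(100, 30),
                        convert="convert.exe"):
    """Build the argument list for the ImageMagick stamp command."""
    return [
        convert,
        "-colorspace", "RGB",
        "-size", size, "xc:transparent",
        "-fill", color,
        "-font", font,
        "-pointsize", str(pointsize),
        "-gravity", gravity,
        # Pad the text with one space on each side, following the tip above.
        "-annotate", format_offset(*offset), f" {text.strip()} ",
        output,
    ]


if __name__ == "__main__":
    print(stamp_image_command("This is an annotation"))
```

On a machine with ImageMagick installed, the resulting list could be passed to subprocess.run(cmd, check=True).  Passing a list rather than a single string means the annotation text needs no extra quoting.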


Next, you need to overlay the watermark on the PDF document.  To do this, we use PDFTK.  I’ll demonstrate two methods: placing the stamp on only the first page (a multi-step process), and placing the stamp on all pages (a single-step process).

===Method 1: Placing the stamp on only the first page===
With this method, there are three steps: extract the first page of the source PDF, place the stamp on that first page, and combine the stamped first page with the subsequent pages from the source PDF.

A. Extract the first page from the source PDF:

{This is supposed to be one line, but split into multiple lines to make it easier to read}

PDFTK.exe
          "SamplePDF.pdf"
          cat 1
          output "FirstPage.pdf"



===Explanation of some of the options===
Full details about the options can be found here (http://www.pdflabs.com/docs/pdftk-man-page/).

{“SamplePDF.pdf”} is the input file which will have the stamp applied.  (Remember to use the full path if necessary.)

{cat 1} tells PDFTK to only extract the first page.

{output “FirstPage.pdf”} tells PDFTK to save the output to the file FirstPage.pdf.  This will be used in the next step.

Notice the use of quotes to delineate the file names being used.  You especially need this if the path to the files may have spaces, so I recommend using them all the time just in case.
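If the command is launched from a program instead of a batch file, the quoting concern can be sidestepped by passing each argument as a separate list element.  A minimal Python sketch of the extraction step follows; the function name is hypothetical and the default executable name is an assumption.

```python
def extract_pages_command(source, pages, output, pdftk="pdftk.exe"):
    """Build a pdftk command that copies the given page range to output."""
    # Each file name is its own list element, so a path containing spaces
    # needs no extra quoting when the list is passed to subprocess.run().
    return [pdftk, source, "cat", str(pages), "output", output]


if __name__ == "__main__":
    print(extract_pages_command("SamplePDF.pdf", 1, "FirstPage.pdf"))
    print(extract_pages_command(r"C:\My Documents\SamplePDF.pdf", 1,
                                "FirstPage.pdf"))
```

The second call uses a made-up path with a space in it to show that the file name stays a single argument.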

B. Apply the stamp to the first page from the source PDF:

{This is supposed to be one line, but split into multiple lines to make it easier to read}

PDFTK.exe
         "FirstPage.pdf"
         stamp "stamp.pdf"
         output "FirstPageWithStamp.pdf"



{"FirstPage.pdf"} is the output from the previous step.

{stamp "stamp.pdf"} tells PDFTK to apply the stamp.pdf file from the ImageMagick results as an overlay on top of FirstPage.pdf.

{output "FirstPageWithStamp.pdf"} defines the output file to be generated.


C. Now we need to combine the resulting “FirstPageWithStamp.pdf” with the remaining pages of the source PDF.  If the source PDF has only one page, skip this step.

{This is supposed to be one line, but split into multiple lines to make it easier to read}

PDFTK.exe
          A="FirstPageWithStamp.pdf"
          B="SamplePDF.pdf"
          cat A B2-end
          output "OutputFile.pdf"



{A="FirstPageWithStamp.pdf"} uses a concept employed by PDFTK to reference multiple files.  In this case, it assigns the letter A to the first file.  This is the first page, which now has the annotation text on it.

{B="SamplePDF.pdf"} is the reference to the source file.  The next option shows how we exclude its first page, since that page was already included.

{cat A B2-end} is interpreted as a command for PDFTK: take the first file, referenced by A (the entire file, which in this example is only one page), and conCATenate it with the second file, referenced by B, taking only pages 2 through the end of that document.  This way, you don’t have to know how many pages are actually in the document.

{output "OutputFile.pdf"} specifies the output file name (again, include the full path, within quotes if necessary).
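The handle mechanism generalizes to any number of input files.  The following Python sketch builds such a command from a mapping of handles to file names; it is an illustration that supports single-letter handles only, and the function name concat_command is hypothetical.

```python
def concat_command(inputs, page_ranges, output, pdftk="pdftk.exe"):
    """Build a pdftk 'cat' command that uses handles such as A and B.

    inputs maps a handle to a file name; page_ranges lists what to take
    from each handle, in output order (for example ["A", "B2-end"]).
    """
    for page_range in page_ranges:
        # This sketch supports single-letter handles only.
        if page_range[:1] not in inputs:
            raise ValueError(f"unknown handle in page range: {page_range}")
    assignments = [f"{handle}={path}" for handle, path in inputs.items()]
    return [pdftk, *assignments, "cat", *page_ranges, "output", output]


if __name__ == "__main__":
    print(concat_command(
        {"A": "FirstPageWithStamp.pdf", "B": "SamplePDF.pdf"},
        ["A", "B2-end"],
        "OutputFile.pdf",
    ))
```

Checking that every page range starts with a known handle catches a mistyped letter before PDFTK is ever started.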

That’s it!  The resulting OutputFile.pdf will contain all the original pages from the PDF, with only the first page having the annotation text.
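The three steps, including the single-page case, can be sketched as one function that returns the commands in the order they should run.  This is a Python illustration under a few assumptions: the intermediate file names match the ones used in this article, the page count is supplied by the caller (one possible source is the NumberOfPages line printed by PDFTK's dump_data operation, parsed here by a hypothetical helper), and for a single-page file the sketch stamps the source directly rather than extracting the page first.

```python
def parse_page_count(dump_data_text):
    """Read the page count from the text printed by 'pdftk <file> dump_data'."""
    for line in dump_data_text.splitlines():
        if line.startswith("NumberOfPages:"):
            return int(line.split(":", 1)[1])
    raise ValueError("NumberOfPages not found")


def first_page_stamp_commands(source, stamp, output, page_count,
                              pdftk="pdftk.exe"):
    """Return the pdftk commands for Method 1, in the order to run them."""
    if page_count == 1:
        # Single-page document: stamping the whole file gives the same
        # result, so the extract and recombine steps are skipped.
        return [[pdftk, source, "stamp", stamp, "output", output]]
    first, stamped = "FirstPage.pdf", "FirstPageWithStamp.pdf"
    return [
        [pdftk, source, "cat", "1", "output", first],
        [pdftk, first, "stamp", stamp, "output", stamped],
        [pdftk, f"A={stamped}", f"B={source}",
         "cat", "A", "B2-end", "output", output],
    ]


if __name__ == "__main__":
    pages = parse_page_count("NumberOfPages: 2\n")
    for command in first_page_stamp_commands(
            "SamplePDF.pdf", "stamp.pdf", "OutputFile.pdf", pages):
        print(command)
```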


===Method 2: Annotating all pages===

Now, if you want to annotate all pages, you will still perform the ImageMagick step to create the stamp.pdf file, and then to apply it to all pages, you simply execute one PDFTK command.

{This is supposed to be one line, but split into multiple lines to make it easier to read}

 PDFTK.exe
         "SamplePDF.pdf"
         stamp "stamp.pdf"
         output "OutputFile.pdf"



In this case, PDFTK applies the stamp to all pages.
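For completeness, here is the same single command as a Python argument list.  The function name is hypothetical; the check that the output differs from the input reflects PDFTK's documented restriction that the output file name may not be the name of an input file.

```python
def stamp_all_pages_command(source, stamp, output, pdftk="pdftk.exe"):
    """Build the single pdftk command that overlays stamp on every page."""
    if source == output:
        # pdftk does not allow writing over one of its input files.
        raise ValueError("output must differ from source")
    return [pdftk, source, "stamp", stamp, "output", output]


if __name__ == "__main__":
    print(stamp_all_pages_command("SamplePDF.pdf", "stamp.pdf",
                                  "OutputFile.pdf"))
```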

Examples

Attached to this article is a sample two-page PDF document, SamplePDF.pdf.
The first demonstration uses Method 1 to apply a stamp to only the first page of this sample PDF.  Save the attached PDF to a folder called C:\test.  Assuming you have successfully installed the utilities described in the Tools section and set your system variables as described, copy the following text into a new batch file in the C:\test folder, name it test.bat (be sure it doesn’t have a “.txt” extension if you use Notepad), and execute it.  You will end up with a new “OutputFile.pdf” that has the annotation text only on the first page.  The original SamplePDF.pdf will remain unchanged.

rem Switch to the working folder that holds SamplePDF.pdf
C:
cd \test
rem The program paths are quoted because the install folders contain spaces;
rem the empty "" is the window title that start expects before a quoted path.
start "" /wait "%MAGICK_HOME%\convert.exe" -colorspace RGB -size 2550x3500 xc:transparent -fill red -font Arial -pointsize 20 -gravity SouthEast -annotate +100+30 " This is an annotation " stamp.pdf
start "" /wait "%PDFTK_HOME%\PDFTK.exe" "SamplePDF.pdf" cat 1 output "FirstPage.pdf"
start "" /wait "%PDFTK_HOME%\PDFTK.exe" "FirstPage.pdf" stamp "stamp.pdf" output "FirstPageWithStamp.pdf"
start "" /wait "%PDFTK_HOME%\PDFTK.exe" A="FirstPageWithStamp.pdf" B="SamplePDF.pdf" cat A B2-end output "OutputFile.pdf"



This second example follows Method 2, using the same SamplePDF.pdf file in c:\test, to apply the annotation to all pages and generate a new PDF file.  Create another batch file with the following commands.  The original SamplePDF.pdf will remain unchanged.

rem Switch to the working folder that holds SamplePDF.pdf
C:
cd \test
rem The program paths are quoted because the install folders contain spaces.
start "" /wait "%MAGICK_HOME%\convert.exe" -colorspace RGB -size 2550x3500 xc:transparent -fill red -font Arial -pointsize 20 -gravity SouthEast -annotate +100+30 " This is an annotation " stamp.pdf
start "" /wait "%PDFTK_HOME%\PDFTK.exe" "SamplePDF.pdf" stamp "stamp.pdf" output "AllPages.pdf"
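The same sequences can also be run from a script instead of a batch file.  The sketch below is a Python illustration with a hypothetical function name; it runs each command in order and stops at the first failure, and the runner parameter exists only so the example can be tried without the tools installed.

```python
import subprocess


def run_all(commands, cwd=None, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so a failed step is not silently followed
        # by the next one.
        runner(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    def dry_run(argv, **kwargs):
        # Stand-in runner that only prints what would be executed.
        print("would run:", argv)

    run_all(
        [["PDFTK.exe", "SamplePDF.pdf", "stamp", "stamp.pdf",
          "output", "AllPages.pdf"]],
        cwd=r"C:\test",
        runner=dry_run,
    )
```

Stopping on the first error is a deliberate choice here: each step depends on the file produced by the previous one, so continuing after a failure would only produce confusing follow-on errors.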




Conclusion
This article explains how to use open source tools in a process to apply annotation text to a PDF file.  This can also be automated in a number of ways, either for end user client workstations, or server-side processing.  The possibilities are endless.
Author: kbirecki