Solved

Combining existing PDFs using PDF::Reuse, preserving filled out forms?

Posted on 2008-10-14
Last Modified: 2011-10-19
Hi Experts,

Have a little project I am having trouble with -
The idea is, we have PDF forms that are filled out automatically by our system on certain events, which are then delivered to a regulating body to process (sent by email/fax/post etc. depending on their requirements).

Anyway, submissions may involve multiple PDF documents (and sometimes JPEGs), and now some recipients are requiring the documents to be combined into a single PDF document so they can open one file from their email & print it.

I thought it would be pretty simple, but I've run into one problem I can't seem to overcome.

Since the PDFs created previously contained form fields which were filled out using prField (perl module PDF::Reuse), when I pull those documents into the new PDF using prSinglePage (so I can stamp some text on each page to identify that each page belongs to the greater good), the field values are not preserved (they revert to their default value, or to blank).

I can foresee another problem where the documents that are pulled together share field names, and that will obviously cause issues!

Does anyone know of a way to import an existing PDF, while preserving all the information in the form fields ... or, a way to flatten the fields so they become plain text on the PDF or something??

My last hope was shattered when I tried to use prForm to pull in each page but apparently it doesn't do what I thought it was going to do (I find the documentation for PDF::Reuse pretty unclear unfortunately).

It's more important to be able to combine the documents than it is to stamp stuff on every page, so if the entire PDF has to be imported in one hit (as opposed to prSinglePage), so be it!

Everything works except for the field values coming across into the new PDF, so it's the last hurdle. All help is appreciated greatly.

Tx,
Glauron

# Now combine the other documents where possible
for ( @documents )
{
	# What type of document is it?
	my $type = lc $_->getSet('type');

	if ( $type eq 'pdf' )
	{
		# Is a PDF so go through each page (every valid PDF has at least 1 page)
		# and add it to the combined PDF, with the stamp if necessary
		my $sourcePdf = $_->retrieveFileRef();
		my $pagesLeft = 1;

		while ( $pagesLeft )
		{
			_prStamp( $details ) if ( $details->{stampText} );
			$pagesLeft = prSinglePage( $sourcePdf );
		}
	}
	elsif ( $type eq 'jpg' || $type eq 'jpeg' )
	{
		# Ignore since we've already handled them
	}
	else
	{
		# Anything we can't handle goes into the remainders array
		push @remainder, $_;
	}
}

Question by:Glauron
8 Comments
 
LVL 1

Author Comment

by:Glauron
ID: 22718140
With prForm, I had hoped it would be able to load in individual pages from a PDF, and using the "print" effect, just plonk it onto the current page (from doco: "'print', which is default, loads the page in an internal table, adds it to the document and prints it to the current page"). My hope was if it was encapsulated in an internal table, it would basically have its own namespace or something so no field names would conflict ....

When I try the code below (in place of above), I get the error:

Fatal error when combining documents:  xxx/files/7580.pdf_1 can't be used as a form. See the documentation under prForm how to concatenate streams.

This is the first PDF that actually includes form elements. PDFs are pulled in & combined before this one successfully; no worries.

If I set 'tolerant' to a true value, it doesn't croak, but the resulting "combined" pdf is missing all the PDFs containing fields.

(not worried about pulling in all the pages until I get the important stuff working)

Tx.


# Now combine the other documents where possible
for ( @documents )
{
	# What type of document is it?
	my $type = lc $_->getSet('type');

	if ( $type eq 'pdf' )
	{
		prForm(
			{
				file => $_->retrieveFileRef(),
				page => 1,
				#adjust => 1,
				effect => 'print',
				tolerant => 0,
			}
		);
		prPage();
	}
	elsif ( $type eq 'jpg' || $type eq 'jpeg' )
	{
		# Ignore since we've already handled them
	}
	else
	{
		# Anything we can't handle goes into the remainders array
		push @remainder, $_;
	}
}

 
LVL 39

Expert Comment

by:Adam314
ID: 22723212
Can you attach 2 PDFs that need to be combined and that have forms on them?
 
LVL 1

Author Comment

by:Glauron
ID: 22727377
The two forms would then have clashing field names, e.g. both forms would have a field called "name", which should be filled out with different values, but if I pull in both forms, the fields are in the same "namespace" (?). Even if I managed to get the field values in, if the first PDF imported sets the "name" field to "Joe Bloe", as soon as I import the second form and set the "name" field to "Mary Jane", the "name" field on the first imported PDF would change to "Mary Jane" (in theory).

And correcting myself above, I can get the "form" PDFs loaded into the larger PDF, but no matter what I try, I can't get the field values loaded into it.

I've now taken the route to use ghostscript to convert the PDF documents to JPGs, and import them instead!! Pretty drastic, but we need to get it working :S

However, the resulting JPGs are also only showing the default values!!!!
For the life of me, I can't find ANY resources explaining how to fix that, or even other people with the same problems.

I've been through the Ghostscript documentation & it gives no options or advice.

How does a PDF fill out those fields? Does it use javascript to populate them after the PDF is opened? If so, I can understand why that might be a hassle. Is there maybe a way to flatten fields so they become normal text fields that could be rendered properly?

Any ideas!? :(

Thanks

gs -dDOPDFMARKS -dNOPAUSE -dNOINTERPOLATE -sDEVICE=jpeggray -r144x144 -sOutputFile=./123456.jpg ./123456.pdf -c quit

 
LVL 39

Expert Comment

by:Adam314
ID: 22727839
I don't know off the top of my head, as I haven't done this before.  This is why I asked you to post 2 of the PDFs - so I could experiment.
If you post 2 of the PDFs, I'll try some things - maybe I'll figure something out, maybe not.  But without them, I have nothing with which to even try.
 
LVL 25

Expert Comment

by:clockwatcher
ID: 22728193
Haven't tried it, but you may want to take a look at the pdf toolkit (pdftk):

  http://www.pdfhacks.com/pdftk/

It can flatten forms which is what you're after.
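
A rough sketch of what that could look like from the shell, assuming pdftk is installed and on the PATH. The function name `flatten_and_merge` and the file names in the usage line are made up for illustration. It relies on pdftk's `flatten` output option and its `cat` operation: each input is flattened to a temporary copy, then the copies are concatenated in order. Whether pdftk picks up values that were written by PDF::Reuse is not something this sketch can promise.

```shell
#!/bin/sh
# Sketch only: flatten each PDF's form fields, then concatenate the results.
# Assumes pdftk is on PATH; set PDFTK to point at a different binary if needed.
PDFTK="${PDFTK:-pdftk}"

# flatten_and_merge OUTPUT INPUT...
flatten_and_merge() {
    out=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    n=0
    for src in "$@"; do
        n=$((n + 1))
        # Zero-padded names keep the glob below in the original argument order
        flat=$(printf '%s/%04d.pdf' "$tmpdir" "$n")
        # "flatten" merges the form fields into the page content
        "$PDFTK" "$src" output "$flat" flatten || { rm -rf "$tmpdir"; return 1; }
    done
    # "cat" without page ranges appends whole documents in the order given
    "$PDFTK" "$tmpdir"/*.pdf cat output "$out"
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

Usage would be something like `flatten_and_merge combined.pdf first.pdf second.pdf`. Flattening before combining also sidesteps clashing field names, since no fields are left in the flattened copies.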
 
LVL 1

Author Comment

by:Glauron
ID: 22728463
Aah sorry Adam I misunderstood ur question! I've attached some sample documents.
There are 2 templates, which are filled out programmatically to result in the 2 matching documents.
What I'm trying to do effectively is combine the 2 final documents together, preceded by the coversheet.

What ends up happening is that the field values (e.g. "Joe Bloe" on Template1) revert to the default value ("t1name"). You can muck around with those to your heart's content. In the end we should have a single 4 page PDF containing the lot.

(Hmm tested combining these & some fields worked, others were blanked completely - must have something to do with how the fields are set using PDF::Reuse)

Clock - looks great! I'll play around with that tomorrow.

Coversheet.pdf
Document1.pdf
Document2.pdf
Template1.pdf
Template2.pdf
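
For the worry about clashing field names, here is a rough way to see which names the documents share, assuming pdftk (suggested above) is installed. It uses pdftk's `dump_data_fields` operation, which prints a `FieldName:` line per field. The function name `clashing_fields` is made up for this sketch, and it has not been run against the attached files.

```shell
#!/bin/sh
# Sketch only: print form field names that occur in more than one input PDF.
# Assumes pdftk is on PATH; set PDFTK to point at a different binary if needed.
PDFTK="${PDFTK:-pdftk}"

# clashing_fields INPUT...
clashing_fields() {
    for src in "$@"; do
        # Keep each name once per document, so repeats inside one file don't count
        "$PDFTK" "$src" dump_data_fields |
            sed -n 's/^FieldName: //p' |
            sort -u
    done | sort | uniq -d
}
```

For example, `clashing_fields Document1.pdf Document2.pdf` would print one line for every field name found in both files, and nothing if the names are all distinct.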
 
LVL 1

Accepted Solution

by:Glauron
ID: 22980990
We ended up building an external util in Java with the iText library which flattens the PDF files while preserving all the form values etc., and then combined the flattened files afterwards. That works quite well. The pdf toolkit sadly didn't work; I think it was the way PDF::Reuse filled out form fields with javascript. Would have preferred to use pdf toolkit, as it's a tried and tested method.
 
LVL 39

Expert Comment

by:Adam314
ID: 22981006
Glad you got it working.