Solved

Combining existing PDFs using PDF::Reuse, preserving filled out forms?

Posted on 2008-10-14
Last Modified: 2011-10-19
Hi Experts,

Have a little project I am having trouble with -
The idea is, we have PDF forms that are filled out automatically by our system on certain events, which are then delivered to a regulating body to process (sent by email/fax/post etc. depending on their requirements).

Anyway, submissions may involve multiple PDF documents (and sometimes JPEGs), and some recipients now require the documents to be combined into a single PDF so they can open one file from their email and print it.

I thought it would be pretty simple, but I've run into one problem I can't seem to overcome.

The PDFs created previously contain form fields that were filled out using prField (Perl module PDF::Reuse). When I pull those documents into the new PDF using prSinglePage (so I can stamp some text on each page to identify that each page belongs to the greater good), the field values are not preserved: they revert to their default value, or to blank.

I can foresee another problem: if the documents being pulled together share field names, that will obviously cause issues!

Does anyone know of a way to import an existing PDF, while preserving all the information in the form fields ... or, a way to flatten the fields so they become plain text on the PDF or something??

My last hope was shattered when I tried to use prForm to pull in each page but apparently it doesn't do what I thought it was going to do (I find the documentation for PDF::Reuse pretty unclear unfortunately).

It's more important to be able to combine the documents than to stamp stuff on every page, so if the entire PDF has to be imported in one hit (as opposed to page by page with prSinglePage), so be it!

Everything works except for the field values coming across into the new PDF, so it's the last hurdle. All help is appreciated greatly.

Tx,
Glauron

# Now combine the other documents where possible
for  ( @documents )
{
	# What type of document is it?
	my $type = lc $_->getSet('type');
	if ( $type eq 'pdf' )
	{
		# Is a PDF so go through each page (every valid PDF has at least 1 page)
		# and add it to the combined PDF, with the stamp if necessary
		my $sourcePdf = $_->retrieveFileRef();
		my $pagesLeft = 1;
		while ( $pagesLeft )
		{
			_prStamp( $details ) if ( $details->{stampText} );
			$pagesLeft = prSinglePage( $sourcePdf );
		}
	}
	elsif ( $type eq 'jpg' || $type eq 'jpeg' )
	{
		# Ignore since we've already handled them
	}
	else
	{
		# Anything we can't handle goes into the remainders array
		push @remainder, $_;
	}
}

Question by:Glauron
8 Comments
 
LVL 1

Author Comment

by:Glauron
ID: 22718140
With prForm, I had hoped it would be able to load in individual pages from a PDF, and using the "print" effect, just plonk it onto the current page (from doco: "'print', which is default, loads the page in an internal table, adds it to the document and prints it to the current page"). My hope was if it was encapsulated in an internal table, it would basically have its own namespace or something so no field names would conflict ....

When I try the code below (in place of the code above), I get this error:
Fatal error when combining documents:  xxx/files/7580.pdf_1 can't be used as a form. See the documentation under prForm how to concatenate streams.

This is the first PDF that actually includes form elements. The PDFs before it are pulled in and combined successfully; no worries.

If I set 'tolerant' to a true value, it doesn't croak, but the resulting "combined" pdf is missing all the PDFs containing fields.

(not worried about pulling in all the pages until I get the important stuff working)

Tx.


# Now combine the other documents where possible
for  ( @documents )
{
	# What type of document is it?
	my $type = lc $_->getSet('type');
	if ( $type eq 'pdf' )
	{
		prForm(
			{
				file => $_->retrieveFileRef(),
				page => 1,
				#adjust => 1,
				effect => 'print',
				tolerant => 0,
			}
		);
		prPage();
	}
	elsif ( $type eq 'jpg' || $type eq 'jpeg' )
	{
		# Ignore since we've already handled them
	}
	else
	{
		# Anything we can't handle goes into the remainders array
		push @remainder, $_;
	}
}

 
LVL 39

Expert Comment

by:Adam314
ID: 22723212
Can you attach 2 PDFs that need to be combined and that have forms on them?
 
LVL 1

Author Comment

by:Glauron
ID: 22727377
The two forms would then have clashing field names, e.g. both forms would have a field called "name", which should be filled out with different values, but if I pull in both forms, the fields are in the same "namespace" (?). Even if I managed to get the field values in, if the first PDF imported sets the "name" field to "Joe Bloe", then as soon as I import the second form and set the "name" field to "Mary Jane", the "name" field on the first imported PDF would change to "Mary Jane" (in theory).
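A minimal sketch of that clash in plain Perl, assuming field values behave like a single table keyed by field name. This is only an illustration, not PDF::Reuse code: qualify_fields and the "doc1"/"doc2" prefixes are hypothetical, and prefixing would only help if the fields inside each template were renamed the same way.

```perl
use strict;
use warnings;

# One shared table keyed by bare field name: the second
# document's value silently replaces the first one.
my %shared;
$shared{name} = 'Joe Bloe';     # value meant for the first form
$shared{name} = 'Mary Jane';    # value meant for the second form

# Hypothetical helper: prefix every field name with a
# per-document tag so that identical names no longer collide.
sub qualify_fields {
    my ($prefix, $values) = @_;
    return map { ("$prefix.$_" => $values->{$_}) } keys %$values;
}

my %separate = (
    qualify_fields('doc1', { name => 'Joe Bloe' }),
    qualify_fields('doc2', { name => 'Mary Jane' }),
);

print "shared name: $shared{name}\n";
print "doc1 name:   $separate{'doc1.name'}\n";
print "doc2 name:   $separate{'doc2.name'}\n";
```

Flattening each document before combining sidesteps the problem entirely, since flattened pages no longer carry fields.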

And correcting myself above, I can get the "form" PDFs loaded into the larger PDF, but no matter what I try, I can't get the field values loaded into it.

I've now gone the route of using Ghostscript to convert the PDF documents to JPGs and importing those instead!! Pretty drastic, but we need to get it working :S

However, the resulting JPGs are also only showing the default values!!!!
For the life of me, I can't find ANY resources explaining how to fix that, or even other people with the same problems.

I've been through the Ghostscript documentation & it gives no options or advice.

How does a PDF fill out those fields? Does it use JavaScript to populate them after the PDF is opened? If so, I can understand why that might be a hassle. Is there maybe a way to flatten the fields so they become normal text that could be rendered properly?

Any ideas!? :(

Thanks

gs -dDOPDFMARKS -dNOPAUSE -dNOINTERPOLATE -sDEVICE=jpeggray -r144x144 -sOutputFile=./123456.jpg ./123456.pdf -c quit

 
LVL 39

Expert Comment

by:Adam314
ID: 22727839
I don't know off the top of my head, as I haven't done this before.  This is why I asked you to post 2 of the PDFs - so I could experiment.
If you post 2 of the PDFs, I'll try some things - maybe I'll figure something out, maybe not.  But without them, I have nothing with which to even try.
 
LVL 25

Expert Comment

by:clockwatcher
ID: 22728193
Haven't tried it, but you may want to take a look at the PDF Toolkit (pdftk):

  http://www.pdfhacks.com/pdftk/

It can flatten forms, which is what you're after.
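A rough sketch of how that could be driven from Perl, assuming pdftk is installed and on the PATH. It uses pdftk's "flatten" output option and its "cat" operation; the helper names and the ".flat.pdf" naming are made up for illustration, and the file names are borrowed from the samples in this thread. Untested against PDF::Reuse output, so whether the filled-in values survive is not guaranteed. The script only prints the commands it would run.

```perl
use strict;
use warnings;

# Argument list for: pdftk IN output OUT flatten
# (merges the form fields into the page content).
sub pdftk_flatten_cmd {
    my ($in, $out) = @_;
    return ('pdftk', $in, 'output', $out, 'flatten');
}

# Argument list for: pdftk A B ... cat output OUT
# (concatenates the inputs in the order given).
sub pdftk_cat_cmd {
    my ($out, @inputs) = @_;
    return ('pdftk', @inputs, 'cat', 'output', $out);
}

my @sources = ('Document1.pdf', 'Document2.pdf');
my @flat;
for my $src (@sources) {
    # Hypothetical naming scheme for the flattened copies.
    (my $dst = $src) =~ s/\.pdf\z/.flat.pdf/i;
    print join(' ', pdftk_flatten_cmd($src, $dst)), "\n";
    push @flat, $dst;
}

# Coversheet first, then the flattened documents.
print join(' ', pdftk_cat_cmd('combined.pdf', 'Coversheet.pdf', @flat)), "\n";
```

To actually run a command, pass the list to system(), e.g. system(pdftk_flatten_cmd($src, $dst)) == 0 or die. The list form avoids shell quoting problems with odd file names.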
 
LVL 1

Author Comment

by:Glauron
ID: 22728463
Aah sorry Adam, I misunderstood your question! I've attached some sample documents.
There are 2 templates, which are filled out programmatically to result in the 2 matching documents.
What I'm trying to do effectively is combine the 2 final documents together, preceded by the coversheet.

What ends up happening is that the field values (e.g. "Joe Bloe" on Template1) revert back to the default value ("t1name"). You can muck around with those to your heart's content. In the end we should have a single 4-page PDF containing the lot.

(Hmm tested combining these & some fields worked, others were blanked completely - must have something to do with how the fields are set using PDF::Reuse)

Clock - looks great! I'll play around with that tomorrow.

Coversheet.pdf
Document1.pdf
Document2.pdf
Template1.pdf
Template2.pdf
 
LVL 1

Accepted Solution

by:Glauron
ID: 22980990
We ended up building an external utility in Java with the iText library that flattens the PDF files while preserving all the form values, which works quite well; we then combine the flattened files afterwards. The PDF toolkit sadly didn't work. I think it was because of the way PDF::Reuse filled out the form fields with JavaScript. We would have preferred to use the PDF toolkit, as it's a tried and tested method.
 
LVL 39

Expert Comment

by:Adam314
ID: 22981006
Glad you got it working.