how to save a pdf file

Posted on 2009-05-14
Last Modified: 2012-05-07
I have the code shown below. Seems simple enough: I open a PDF document, then save it elsewhere. However, when I do this I cannot open the new document. I get "A drawing error occurred" in Adobe Reader. The original file has 3 pages and the new drawing-error file has 3 blank pages.

Surely this is something people have programmed billions of times over the years. What's my problem?
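String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";

// Load the source PDF with PDFBox (org.pdfbox.pdmodel.PDDocument)
PDDocument doc = PDDocument.load( inputFile );
try {
    out.print( doc.getNumberOfPages() );  // page count of the loaded document
    doc.save( outputFile );               // write it back out to the new location
} finally {
    doc.close();                          // release the document even if save() throws
}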

Question by:jmarkfoley

Expert Comment

by:ozlevanon
I'm not familiar with the PDF API you're using, but it seems to be mangling the file somehow. If all you want is to copy the file, I recommend a simple byte-by-byte copy (see the attached code sample). Of course, if you need to do more than that, this won't help.
// Requires: import java.io.*;
copyFile(new File("/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf"), new File("/usr/local/apache/htdocs/new.pdf"));

public static void copyFile(File inputFile, File outputFile) throws IOException
{
    InputStream fis = new FileInputStream(inputFile);
    try
    {
        OutputStream fos = new FileOutputStream(outputFile);
        try
        {
            pipe(fis, fos);
        }
        finally
        {
            fos.close();  // always close the output, even if the copy fails
        }
    }
    finally
    {
        fis.close();      // always close the input, even if opening the output fails
    }
}

// Copies everything from is to os in 8 KB chunks until end of stream.
public static void pipe(InputStream is, OutputStream os) throws IOException
{
    byte[] buffer = new byte[8192];
    int len;
    while ((len = is.read(buffer)) != -1)
    {
        os.write(buffer, 0, len);
    }
}
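If a plain copy is all that's needed and Java 7 or later is available (an assumption; this thread predates it), `java.nio.file.Files.copy` does the same byte-for-byte copy in one call. A minimal sketch; the class name `CopyPdf` is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; same effect as the copyFile/pipe pair above.
public class CopyPdf {
    // Copies source to target byte for byte, overwriting target if it already exists.
    public static void copyFile(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Usage: java CopyPdf <source> <target>
    public static void main(String[] args) throws IOException {
        copyFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Without `REPLACE_EXISTING`, `Files.copy` throws `FileAlreadyExistsException` when the target exists, which may be preferable if accidental overwrites are a concern.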

Expert Comment

by:CEHJ
Are you *certain* no exceptions occur when you're saving or closing?

Author Comment

by:jmarkfoley
ozlevanon, I'm using the PDFBox package. Yes, I know how to simply copy a file. I intend to muck with the PDF contents and write the results (see http://www.experts-exchange.com/Web_Development/Document_Imaging/Q_24303092.html), and I got this same error. The code shown is the result of stripping things out little by little to isolate the fundamental error.

CEHJ, I get no Tomcat exception, I've not seen anything in mod_jk.log, and there's nothing fishy in $CATALINA_HOME/logs. I had a try/catch around the save(), but it caught nothing, so I took it out of my posting for simplicity. It does create new.pdf, with a size of 53122 bytes, a bit smaller than the original's 54353. The first few characters of the file are "%PDF-1.5", just like the original. Does code like I've shown work for you? If you want to examine the actual output, it's at http://www.fluxrunner.com/new.pdf. Adobe can save it and everything. A copy of the original is there too, as old.pdf.
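The sanity checks described here (file size and the leading "%PDF-1.5") can be scripted. A minimal sketch, assuming Java 9+ for `InputStream.readNBytes`; the class and method names are hypothetical. A valid header only shows the file looks like a PDF; it says nothing about whether the page content will render.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the "%PDF-x.y" header and size of each file given.
public class PdfHeaderCheck {
    // Returns the version after "%PDF-" (e.g. "1.5"), or null if the header is missing.
    public static String pdfVersion(Path file) throws IOException {
        byte[] head = new byte[8]; // "%PDF-1.5" is exactly 8 bytes
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, head.length); // stops early at end of file (Java 9+)
        }
        String s = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        return s.startsWith("%PDF-") ? s.substring(5) : null;
    }

    // Usage: java PdfHeaderCheck old.pdf new.pdf
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path p = Paths.get(name);
            System.out.println(p + ": version=" + pdfVersion(p) + ", size=" + Files.size(p) + " bytes");
        }
    }
}
```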

Expert Comment

by:CEHJ
>>Does code like I've shown work for you?

I'll try it

Expert Comment

by:CEHJ
It works for me. Does it correctly print the number of pages?

Expert Comment

by:CEHJ
Both old.pdf and new.pdf are valid 3-page files from what I can see...

Author Comment

by:jmarkfoley
It does correctly print the number of pages.

Are you able to open the output file with Acrobat Reader? I've tried opening http://www.fluxrunner.com/new.pdf on a couple of different computers and at best I get 3 blank pages.

Expert Comment

by:CEHJ
>>Are you able to open the output file with Acrobat Reader?

I'm not using Acrobat at the moment, but I can read it fine.

Author Comment

by:jmarkfoley
> I'm not using Acrobat at the moment, but i can read it fine

1. What are you using to read it? Do you have access to Acrobat? If so, could you try it? This has to be readable by Acrobat, since that's what Joe Average has.

2. Are you reading the file from my link or the file you created with your program? If the latter, could you post that PDF in the file section of your response? I'll download it and check it out.

3. What version of PDFBox are you using? Mine appears to be 0.7.3; at least, that's the name on the jar file: PDFBox-0.7.3.jar

Expert Comment

by:CEHJ
>>1. What are you using to read it?
xpdf
Do you have access to Acrobat?
Not right at the moment

2. Are you reading the file from my link or the file you created with your program?
Both. Both work (is there much point in attaching?)

3. What version of PDFbox are you using?
The same oldish one. You might try another API, perhaps.


Author Comment

by:jmarkfoley
I'm stumped. I'm open to suggestions. What do you mean by "another API"?

If you don't mind, yes, go ahead and attach the output of your program. I'd like to see if Acrobat can read it, AND I'd like to compare it with mine.

Expert Comment

by:CEHJ
Another PDF API, such as iText.


Here's one I copied using 'your' code:
x11.pdf

Expert Comment

by:objects
The code is fine; it looks like PDFBox is creating a PDF that is incompatible with your version of Acrobat. Perhaps try updating PDFBox to the latest version if you haven't already.

Author Comment

by:jmarkfoley
This is getting frustrating. I went to http://www.pdfbox.org. Apparently 0.7.3, from October 2006, is the latest version of PDFBox. I re-downloaded it anyway and ran md5sum on my jar and the new jar. They match. So I downloaded the latest version of Acrobat (9.1) to my XP notebook and tried opening my new.pdf with that. With 9.1 I got the error message: "Insufficient data for an image"

I don't get it. Why can you guys open my PDF fine, but I can't? What else could be wrong? Some other library or jar? This is a fairly recent Linux build, so I should have the latest of everything.

I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

Could there be some PDF settings for encryption or drawing that are defaulted differently on my setup than yours?

I'm out of ideas and I have to get this program working for a trade show at the end of the month. Help!
root@webhost1:/usr/local/apache/htdocs# fd -p new.pdf
     0: 25 50 44 46 2D 31 2E 35 0A 25 F6 E4 FC DF 0A 31    %PDF-1.5.%.....1
    10: 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 4C 61 6E 67 20     0 obj.<<./Lang
    20: 28 78 2D 64 65 66 61 75 6C 74 29 0A 2F 50 61 67    (x-default)./Pag
    30: 65 73 20 32 20 30 20 52 0A 2F 54 79 70 65 20 2F    es 2 0 R./Type /
    40: 43 61 74 61 6C 6F 67 0A 3E 3E 0A 65 6E 64 6F 62    Catalog.>>.endob
    50: 6A 0A 33 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 43 72    j.3 0 obj.<<./Cr
    60: 65 61 74 69 6F 6E 44 61 74 65 20 28 44 3A 32 30    eationDate (D:20
    70: 30 37 30 35 31 31 31 38 32 36 31 39 2B 30 30 27    070511182619+00'
    80: 30 30 27 29 0A 2F 43 72 65 61 74 6F 72 20 28 50    00')./Creator (P
    90: 61 70 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 2F    aperPort 11.0)./
    A0: 4D 6F 64 44 61 74 65 20 28 44 3A 32 30 30 37 30    ModDate (D:20070
    B0: 35 31 31 31 38 32 36 31 39 2B 30 30 27 30 30 27    511182619+00'00'
    C0: 29 0A 2F 50 72 6F 64 75 63 65 72 20 28 50 61 70    )./Producer (Pap
    D0: 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 3E 3E 0A    erPort 11.0).>>.
    E0: 65 6E 64 6F 62 6A 0A 32 20 30 20 6F 62 6A 0A 3C    endobj.2 0 obj.<
    F0: 3C 0A 2F 43 6F 75 6E 74 20 33 0A 2F 4B 69 64 73    <./Count 3./Kids

root@webhost1:/usr/local/apache/htdocs# fd -p old.pdf
     0: 25 50 44 46 2D 31 2E 35 0D 0A 25 F1 F9 F7 F6 33    %PDF-1.5..%....3
    10: 2E 33 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 0D    .3..4 0 obj..<<.
    20: 0A 2F 42 69 74 73 50 65 72 43 6F 6D 70 6F 6E 65    ./BitsPerCompone
    30: 6E 74 20 31 20 0D 0A 2F 43 6F 6C 6F 72 53 70 61    nt 1 ../ColorSpa
    40: 63 65 20 2F 44 65 76 69 63 65 47 72 61 79 20 0D    ce /DeviceGray .
    50: 0A 2F 46 69 6C 74 65 72 20 2F 4A 42 49 47 32 44    ./Filter /JBIG2D
    60: 65 63 6F 64 65 20 0D 0A 2F 48 65 69 67 68 74 20    ecode ../Height
    70: 32 31 39 39 20 0D 0A 2F 4C 65 6E 67 74 68 20 35    2199 ../Length 5
    80: 20 30 20 52 20 0D 0A 2F 4E 61 6D 65 20 2F 69 6D     0 R ../Name /im
    90: 61 67 65 30 20 0D 0A 2F 53 75 62 74 79 70 65 20    age0 ../Subtype
    A0: 2F 49 6D 61 67 65 20 0D 0A 2F 54 79 70 65 20 2F    /Image ../Type /
    B0: 58 4F 62 6A 65 63 74 20 0D 0A 2F 57 69 64 74 68    XObject ../Width
    C0: 20 31 37 30 30 20 0D 0A 3E 3E 0D 0A 73 74 72 65     1700 ..>>..stre
    D0: 61 6D 0D 0A 00 00 00 00 30 00 01 00 00 00 13 00    am......0.......
    E0: 00 06 A4 00 00 08 97 00 00 00 C8 00 00 00 C8 01    ................
    F0: 00 00 00 00 00 01 00 01 01 00 00 37 AD 08 00 02    ...........7....
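The same kind of comparison can be made without the fd utility. A minimal sketch of a Java hex dumper; `HexDump` is a hypothetical name, and its layout only approximates the fd -p output shown above (hex offset, 16 bytes per row, then printable ASCII with '.' for everything else).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper producing a dump in a layout similar to fd -p.
public class HexDump {
    // Formats up to `limit` bytes as rows of 16: offset, hex bytes, printable ASCII ('.' otherwise).
    public static String dump(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(limit, data.length);
        for (int row = 0; row < end; row += 16) {
            sb.append(String.format("%6X: ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < end) {
                    int b = data[i] & 0xFF;
                    sb.append(String.format("%02X ", b));
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row so the ASCII column lines up
                }
            }
            sb.append("   ").append(ascii).append('\n');
        }
        return sb.toString();
    }

    // Usage: java HexDump new.pdf  (prints the first 256 bytes)
    public static void main(String[] args) throws IOException {
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0])), 256));
    }
}
```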

Expert Comment

by:CEHJ
>>I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

That could be down to version differences. Having said that, good reader software should be able to cope with different versions.

Expert Comment

by:CEHJ
If you're transferring from Linux to Windows over, say, FTP, make sure you're doing it in binary mode, or it will corrupt the file. md5sum it at both ends: the sums should be identical.
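Where md5sum isn't available on the Windows side, the same digest can be computed in Java. A minimal sketch using the standard `java.security.MessageDigest` API; the class name `Md5Sum` is hypothetical. Matching digests on both machines rule out transfer corruption, but they say nothing about whether the PDF itself is well formed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: prints MD5 digests in the same "hash  filename" form as md5sum.
public class Md5Sum {
    // Streams the file through MD5 and returns the digest as lowercase hex.
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    // Usage: java Md5Sum new.pdf
    public static void main(String[] args) throws Exception {
        for (String name : args) {
            System.out.println(md5Hex(Paths.get(name)) + "  " + name);
        }
    }
}
```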

Author Comment

by:jmarkfoley
One would think Acrobat could cope with version differences.

I'm trying to access the file in two ways. I create the file in the Apache htdocs directory so I can get to it via the web at http://www.fluxrunner.com/new.pdf; IE will open that file in Acrobat. I also scp'd it both to my local Linux host (which my Windows workstation Samba-mounts) and directly to my Windows workstation. md5sum confirms that the new.pdf created on fluxrunner is the same as the copies I've downloaded locally.

I tried again using a different source PDF, only one page. Same thing: "Insufficient data for an image".

Are you *sure* you've actually opened MY pdf and it worked OK?

Author Comment

by:jmarkfoley
Here are the imports I'm using. Am I missing a critical one for proper saving?
<%@ page import="java.lang.String,java.io.*,java.util.*" %>
<%@ page import="java.util.Date,java.text.SimpleDateFormat,
  java.lang.StringBuffer,java.text.FieldPosition" %>
<%@ page import="java.lang.Object,
  org.pdfbox.exceptions.COSVisitorException,
  org.pdfbox.io.RandomAccessFile,
  org.pdfbox.pdmodel.PDDocument,
  org.pdfbox.pdmodel.PDPage,
  org.pdfbox.pdmodel.edit.PDPageContentStream,
  org.pdfbox.pdmodel.graphics.xobject.PDCcitt,
  org.pdfbox.pdmodel.graphics.xobject.PDJpeg,
  org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage" %>

Expert Comment

by:CEHJ
Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?

Author Comment

by:jmarkfoley
I've also stripped the last import down to just java.lang.Object and org.pdfbox.pdmodel.PDDocument: same error :(

Expert Comment

by:CEHJ
>>Here are the imports I'm using

That looks fine. You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?

Author Comment

by:jmarkfoley
> Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?

Yes, does it open OK for you?

Expert Comment

by:CEHJ
Opens fine for me

Author Comment

by:jmarkfoley
> Opens fine for me

Amazing! Are you opening it with Acrobat or something else? It needs to open with Acrobat, since most users will be accessing it from their PCs. I also tried this URL from a completely different Windows laptop: same problem.

> You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?

No, no exceptions.

Expert Comment

by:CEHJ
>>Amazing! Are you opening it with Acrobat or something else?

xpdf still

You said you had more luck with the browser. Does it open with that?

Expert Comment

by:objects
> What else could be wrong? Some other library or jar?

As I mentioned earlier, your code is fine. The PDF you are creating is fine.
The incompatibility seems to be between your version of Acrobat and PDFBox, which is strange, though.

Have you tried opening it on a different box?

Author Comment

by:jmarkfoley
> You said you had more luck with the browser. Does it open with that?

No, because it launches Acrobat, so it's the same thing. I don't have another Windows-based reader. If you can get hold of a workstation running Acrobat Reader, I'd be curious what your results are. I suspect you won't be able to read it there either. I've tried Acrobat on a couple of different computers.

Meanwhile I'm going to investigate another API. If PDFBox generates PDFs that only work with xpdf and the like, then it's of little use to me. Adobe invented PDF, and if an API's output doesn't work with Adobe's reader, there's something wrong. Nobody has jumped into this topic saying they've had no problem reading PDFBox PDFs in Acrobat. I'm beginning to think PDFBox is a dead product anyway. The latest (and only) release appears to be 0.7.3, from October 2006, so no one's working on it. The http://incubator.apache.org/pdfbox/download.html site says "No releases of Apache PDFBox are yet available" and refers you back to the 0.7.3 release (besides, a leading "0" in the version number says "beta" to me).

You mentioned iText. Does it come as a jar, or can I build one from it? Do you have another recommendation? I think I'll try something like that today. I don't know what else I can try with PDFBox.

Author Comment

by:jmarkfoley
> have you tried opening it on a different box?

Yes, as mentioned in my previous message, and both version 6 and version 9 of Acrobat.

Accepted Solution

by:CEHJ
It's conceivable that the PDF format has moved on since anyone last did any serious work on PDFBox. Try iText, which is in fact the best-known API.

Expert Comment

by:objects
> I'm beginning to think PDFBox is a dead product anyway.

it is


What are your requirements?

Author Comment

by:jmarkfoley
Requirements: I am creating a website for attorneys to file documents online. Each document must be timestamped (so I have to add text and/or images) and the stamped document saved as a PDF file. These PDFs must be accessible by attorneys and the court, virtually all of whom will be accessing them from office computers running Windows and Acrobat.
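Building the stamp text itself is plain Java; the JSP imports posted earlier already include `java.util.Date` and `java.text.SimpleDateFormat`. A minimal sketch of formatting the timestamp; the "FILED" wording, the date pattern and the class name are hypothetical, and drawing the text onto the PDF page with a PDF library is not shown.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: builds the text of a filing timestamp.
// The wording and pattern are assumptions; adjust to the court's actual requirements.
public class FilingStamp {
    public static String stampText(Date filed, TimeZone zone) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z", Locale.US);
        fmt.setTimeZone(zone);
        return "FILED " + fmt.format(filed);
    }

    // Usage: prints a stamp for the current time in the server's default time zone.
    public static void main(String[] args) {
        System.out.println(stampText(new Date(), TimeZone.getDefault()));
    }
}
```

Passing the time zone explicitly keeps the stamp independent of whatever zone the Tomcat server happens to run in.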

Author Comment

by:jmarkfoley
Oh yeah, and it's being demo'd at a show in two weeks!

Expert Comment

by:objects
Hang on while I boot a Windows box and try it here.

Expert Comment

by:CEHJ
If you have too much difficulty, just treat it as an image: timestamp it, save it as a JPEG, and be done with it.

Expert Comment

by:objects
Same problem here. If you want to chase it, I'd look at any images in the PDF (e.g. try a PDF without images).
If you want to look at iText, the following are good resources:
http://itextdocs.lowagie.com/tutorial/general/webapp/index.php
http://javaboutique.internet.com/tutorials/iText/
http://www.geek-tutorials.com/java/itext/itext_index.php

Author Comment

by:jmarkfoley
Thanks for the feedback. I'm getting ready to install iText. I'll be back when I have some info.

The PDFs I've been testing with are simple court filings, no images. If iText works, there's no sense chasing this. If not, then I've got problems.

Expert Comment

by:objects
They are actually all images.
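The old.pdf dump posted earlier bears this out: its first object is an /Image XObject with /BitsPerComponent 1 and /Filter /JBIG2Decode, and JBIG2 is a compression format for black-and-white scanned images. A rough way to check other filings is to count filter names in the raw bytes. A minimal sketch with a hypothetical class name; it only sees dictionaries stored uncompressed, so names inside compressed object streams (possible in PDF 1.5 files) will be missed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: counts filter names such as /JBIG2Decode in a PDF's raw bytes.
public class FilterScan {
    // Counts non-overlapping occurrences of needle in haystack.
    static int count(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        int found = 0;
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            found++;
            i += needle.length - 1; // skip past this match
        }
        return found;
    }

    // Counts "/<filterName>" name tokens, e.g. countFilter(pdf, "JBIG2Decode").
    public static int countFilter(byte[] pdf, String filterName) {
        return count(pdf, ("/" + filterName).getBytes(StandardCharsets.US_ASCII));
    }

    // Usage: java FilterScan old.pdf
    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        for (String f : new String[] {"JBIG2Decode", "CCITTFaxDecode", "DCTDecode", "FlateDecode"}) {
            System.out.println(f + ": " + countFilter(pdf, f));
        }
    }
}
```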

Author Comment

by:jmarkfoley
Yeah!!!! Finally!!!! It works with iText. Now I can move on to trying to stamp the document (which is another posting!)

Thanks for your time and patience.
<%@ page import="java.io.FileOutputStream,
  com.lowagie.text.pdf.PdfReader,
  com.lowagie.text.pdf.PdfStamper" %>
<%
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/2-20.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";

// Read the source PDF, then let PdfStamper write a copy to the output file.
// Any stamping (text, images) would be added before stamp.close().
PdfReader doc = new PdfReader(inputFile);
PdfStamper stamp = new PdfStamper(doc, new FileOutputStream(outputFile));
stamp.close();   // finishes writing the output PDF
doc.close();
%>

Expert Comment

by:CEHJ
OK, glad you're making progress.

Expert Comment

by:CEHJ
:-)