How to save a PDF file

I have the code shown below. Seems simple enough: I'm opening a PDF document, then saving it elsewhere. However, when I do this I cannot open the new document. I get "A drawing error occurred" in Adobe Reader. The original file has 3 pages, and the new file that triggers the drawing error has 3 blank pages.

Surely this is something people have programmed billions of times over the years. What's my problem?
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";
 
PDDocument doc = PDDocument.load( inputFile );
out.print(doc.getNumberOfPages() );
doc.save( outputFile );
doc.close();


jmarkfoleyAsked:
 
CEHJCommented:
It's conceivable that the file format has moved on since anyone last did any serious work on PDFBox. Try iText, which in fact is the most well-known api
 
ozlevanonCommented:
I'm not familiar with the PDF API you're using, but it seems it somehow messes the file. If all you want is to copy the file I recommend you simply do a byte-by-byte copy (see attached code sample). Of course, if you need something more, it won't do.
copyFile(new File("/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf"), new File("/usr/local/apache/htdocs/new.pdf"));

public static void copyFile(File inputFile, File outputFile) throws IOException
{
    InputStream fis = null;
    OutputStream fos = null;

    try
    {
        fis = new FileInputStream(inputFile);
        fos = new FileOutputStream(outputFile);
        // Stream the bytes across unchanged
        pipe(fis, fos);
    }
    finally
    {
        if (fis != null)
        {
            fis.close();
        }
        if (fos != null)
        {
            fos.close();
        }
    }
}

public static void pipe(InputStream is, OutputStream os) throws IOException
{
    byte[] buffer = new byte[8192];
    int len;
    do
    {
        // read() returns -1 at end of stream, which ends the loop
        len = is.read(buffer, 0, 8192);
        if (len > 0)
        {
            os.write(buffer, 0, len);
        }
    }
    while (len >= 0);
}
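
For readers on Java 7 or later, the same byte-for-byte copy can probably be done with the standard `java.nio.file.Files.copy` call instead of a hand-written loop. This is only a sketch: the class name `CopyPdf` is made up for illustration, and like the code above it copies bytes without interpreting the PDF at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class CopyPdf {
    // Copies the file byte for byte, replacing the target if it already exists.
    static void copyFile(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java CopyPdf in.pdf out.pdf
        copyFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```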

 
CEHJCommented:
Are you *certain* no exceptions occur when you're saving or closing?

 
jmarkfoleyAuthor Commented:
ozlevanon, I'm using the PDFbox package. Yes, I know how to simply copy a file. I am intending to muck with the pdf contents and write the results (see http://www.experts-exchange.com/Web_Development/Document_Imaging/Q_24303092.html), and I got this same error. My code shown is the result of stripping things out little by little to identify the fundamental error.

CEHJ, I get no tomcat exception and I've not seen anything in the mod_jk.log and nothing fishy in $CATALINA_HOME/logs. I had a try/catch around the save(), but it caught nothing, so I took it out for simplicity in my posting. It does create the new.pdf file with size 53122 which is a bit smaller than the original: 54353. The first few characters of the file are "%PDF-1.5", just like the original. Does code like I've shown work for you? If you want to examine the actual output it can be found in http://www.fluxrunner.com/new.pdf. Adobe can save it and everything. A copy of the original is there too as old.pdf.
 
CEHJCommented:
>>Does code like I've shown work for you?

I'll try it
 
CEHJCommented:
It works for me. Does it correctly print the number of pages?
 
CEHJCommented:
Both old.pdf and new.pdf are valid files of 3 pages from what i can see ...
 
jmarkfoleyAuthor Commented:
It does correctly print the number of pages.

Are you able to open the output file with Acrobat Reader? I've tried opening http://www.fluxrunner.com/new.pdf on a couple of different computers and at best I get 3 blank pages.
 
CEHJCommented:
>>Are you able to open the output file with Acrobat Reader?

I'm not using Acrobat at the moment, but i can read it fine
 
jmarkfoleyAuthor Commented:
> I'm not using Acrobat at the moment, but i can read it fine

1. What are you using to read it? Do you have access to Acrobat? If so, could you try it? This is going to have to be readable by Acrobat since that's what joe-average has.

2. Are you reading the file from my link or the file you created with your program? If the latter, could you post that pdf in the file section of your response and I'll download it and check it out.

3. What version of PDFbox are you using? Mine appears to be 0.7.3; leastwise, that's the name on the jarfile: PDFBox-0.7.3.jar
 
CEHJCommented:
>>1. What are you using to read it?
xpdf
>>Do you have access to Acrobat?
Not right at the moment

>>2. Are you reading the file from my link or the file you created with your program?
Both. Both work (is there much point in attaching?)

>>3. What version of PDFbox are you using?
The same, oldish one. You might try another API perhaps


 
jmarkfoleyAuthor Commented:
I'm stumped. I'm open to suggestions. What do you mean by "another API"?

If you don't mind, yes, go ahead and attach the output of your program. I'd like to see if Acrobat can read it, and I'd like to compare it with mine.
 
CEHJCommented:
Another pdf api such as iText


Here's one i copied using 'your' code:
x11.pdf
 
objectsCommented:
code is fine, looks like pdfbox is creating a pdf that is incompatible with your version of acrobat. Perhaps try updating pdfbox to the latest version if you haven't already.

 
jmarkfoleyAuthor Commented:
This is getting frustrating. I went to http://www.pdfbox.org. Apparently 0.7.3 is the latest version of PDFBox, from October 2006. I re-downloaded it anyway and ran md5sum on my jar and the new jar. They match. So I downloaded the latest version of Acrobat (9.1) to my XP notebook and tried opening my new.pdf with that. With the 9.1 version I got the error message: "Insufficient data for an image"

I don't get it. Why can you guys open my pdf fine, but I can't?  What else could be wrong? Some other library or jar? This is a fairly recent linux build, so I should have the latest of everything.

I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

Could there be some PDF settings for encryption or drawing that are defaulted differently on my setup than yours?

I'm out of ideas and I have to get this program working for a trade show at the end of the month. Help!
root@webhost1:/usr/local/apache/htdocs# fd -p new.pdf
     0: 25 50 44 46 2D 31 2E 35 0A 25 F6 E4 FC DF 0A 31    %PDF-1.5.%.....1
    10: 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 4C 61 6E 67 20     0 obj.<<./Lang
    20: 28 78 2D 64 65 66 61 75 6C 74 29 0A 2F 50 61 67    (x-default)./Pag
    30: 65 73 20 32 20 30 20 52 0A 2F 54 79 70 65 20 2F    es 2 0 R./Type /
    40: 43 61 74 61 6C 6F 67 0A 3E 3E 0A 65 6E 64 6F 62    Catalog.>>.endob
    50: 6A 0A 33 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 43 72    j.3 0 obj.<<./Cr
    60: 65 61 74 69 6F 6E 44 61 74 65 20 28 44 3A 32 30    eationDate (D:20
    70: 30 37 30 35 31 31 31 38 32 36 31 39 2B 30 30 27    070511182619+00'
    80: 30 30 27 29 0A 2F 43 72 65 61 74 6F 72 20 28 50    00')./Creator (P
    90: 61 70 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 2F    aperPort 11.0)./
    A0: 4D 6F 64 44 61 74 65 20 28 44 3A 32 30 30 37 30    ModDate (D:20070
    B0: 35 31 31 31 38 32 36 31 39 2B 30 30 27 30 30 27    511182619+00'00'
    C0: 29 0A 2F 50 72 6F 64 75 63 65 72 20 28 50 61 70    )./Producer (Pap
    D0: 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 3E 3E 0A    erPort 11.0).>>.
    E0: 65 6E 64 6F 62 6A 0A 32 20 30 20 6F 62 6A 0A 3C    endobj.2 0 obj.<
    F0: 3C 0A 2F 43 6F 75 6E 74 20 33 0A 2F 4B 69 64 73    <./Count 3./Kids
? q
root@webhost1:/usr/local/apache/htdocs# fd -p old.pdf
     0: 25 50 44 46 2D 31 2E 35 0D 0A 25 F1 F9 F7 F6 33    %PDF-1.5..%....3
    10: 2E 33 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 0D    .3..4 0 obj..<<.
    20: 0A 2F 42 69 74 73 50 65 72 43 6F 6D 70 6F 6E 65    ./BitsPerCompone
    30: 6E 74 20 31 20 0D 0A 2F 43 6F 6C 6F 72 53 70 61    nt 1 ../ColorSpa
    40: 63 65 20 2F 44 65 76 69 63 65 47 72 61 79 20 0D    ce /DeviceGray .
    50: 0A 2F 46 69 6C 74 65 72 20 2F 4A 42 49 47 32 44    ./Filter /JBIG2D
    60: 65 63 6F 64 65 20 0D 0A 2F 48 65 69 67 68 74 20    ecode ../Height
    70: 32 31 39 39 20 0D 0A 2F 4C 65 6E 67 74 68 20 35    2199 ../Length 5
    80: 20 30 20 52 20 0D 0A 2F 4E 61 6D 65 20 2F 69 6D     0 R ../Name /im
    90: 61 67 65 30 20 0D 0A 2F 53 75 62 74 79 70 65 20    age0 ../Subtype
    A0: 2F 49 6D 61 67 65 20 0D 0A 2F 54 79 70 65 20 2F    /Image ../Type /
    B0: 58 4F 62 6A 65 63 74 20 0D 0A 2F 57 69 64 74 68    XObject ../Width
    C0: 20 31 37 30 30 20 0D 0A 3E 3E 0D 0A 73 74 72 65     1700 ..>>..stre
    D0: 61 6D 0D 0A 00 00 00 00 30 00 01 00 00 00 13 00    am......0.......
    E0: 00 06 A4 00 00 08 97 00 00 00 C8 00 00 00 C8 01    ................
    F0: 00 00 00 00 00 01 00 01 01 00 00 37 AD 08 00 02    ...........7....
?
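
A dump like the ones above can be produced without the `fd` tool by a few lines of Java, which may help when comparing files on a machine that lacks it. This is a rough sketch: the class name `HexHead` and the exact column layout are illustrative choices, not a reproduction of `fd`'s output format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexHead {
    // Formats up to 'limit' leading bytes as 16-byte rows: offset, hex bytes, printable ASCII.
    static String dump(byte[] data, int limit) {
        StringBuilder out = new StringBuilder();
        int end = Math.min(limit, data.length);
        for (int row = 0; row < end; row += 16) {
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = row; i < row + 16 && i < end; i++) {
                int b = data[i] & 0xFF;
                hex.append(String.format("%02X ", b));
                // Non-printable bytes are shown as '.'
                text.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
            }
            out.append(String.format("%6X: %-48s   %s%n", row, hex, text));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file for simplicity; fine for small PDFs like the ones in this thread.
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0])), 256));
    }
}
```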

 
CEHJCommented:
>>I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

That could be down to version differences. Having said that, good reader software should be able to cope with different versions
 
CEHJCommented:
If you're downloading from linux to windows over say ftp, make sure that you're doing that in binary format or it will corrupt the file. md5sum it at both ends - the sums should be identical
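
Where `md5sum` is not available (a Windows workstation, say), the same check can be sketched in Java with the standard `java.security.MessageDigest` class. The class name `Md5Check` is hypothetical; run it against the file at each end and compare the two hex strings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Check {
    // Returns the MD5 digest of everything readable from 'in' as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            md.update(buffer, 0, len);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java Md5Check new.pdf
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(md5Hex(in));
        }
    }
}
```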
 
jmarkfoleyAuthor Commented:
One would think Acrobat could cope with version differences.

I'm trying to access the file in two ways. I create the file in the apache htdocs directory so I can get to it via the web using http://www.fluxrunner.com/new.pdf. IE will open that file in Acrobat. Also, I scp'd it to my local linux host with my Windows workstation samba mounting it AND to my windows workstation directly. md5sum confirms that the new.pdf created on fluxrunner is the same as the one I've downloaded locally.

I tried again using a different source pdf; only one page. Same thing.  "Insufficient data for an image"

Are you *sure* you've actually opened MY pdf and it worked OK?
 
jmarkfoleyAuthor Commented:
Here are the imports I'm using. Am I missing a critical one for proper saving?
<%@ page import="java.lang.String,java.io.*,java.util.*" %>
<%@ page import="java.util.Date,java.text.SimpleDateFormat,
  java.lang.StringBuffer,java.text.FieldPosition" %>
<%@ page import="java.lang.Object,
  org.pdfbox.exceptions.COSVisitorException,
  org.pdfbox.io.RandomAccessFile,
  org.pdfbox.pdmodel.PDDocument,
  org.pdfbox.pdmodel.PDPage,
  org.pdfbox.pdmodel.edit.PDPageContentStream,
  org.pdfbox.pdmodel.graphics.xobject.PDCcitt,
  org.pdfbox.pdmodel.graphics.xobject.PDJpeg,
  org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage" %>

 
CEHJCommented:
Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?
 
jmarkfoleyAuthor Commented:
I've also stripped out of the last import all but java.lang.Object and org.pdfbox.pdmodel.PDDocument, same error :(
 
CEHJCommented:
>>Here are the imports I'm using

That looks fine. You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?
 
jmarkfoleyAuthor Commented:
> Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?

Yes, does it open OK for you?
 
CEHJCommented:
Opens fine for me
 
jmarkfoleyAuthor Commented:
> Opens fine for me

Amazing! Are you opening it with Acrobat or something else? It needs to open with acrobat since most users will be accessing from their PC's. I also tried this URL from a completely different windows laptop, same problem.

> You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?

No, no exceptions.
 
CEHJCommented:
>>Amazing! Are you opening it with Acrobat or something else?

xpdf still

You said you had more luck with the browser. Does it open with that?
 
objectsCommented:
> What else could be wrong? Some other library or jar?

As I mentioned earlier your code is fine. The pdf you are creating is fine.
The incompatibility seems to be between your version of Acrobat and PDFBox, which is strange.

have you tried opening it on a different box?
 
jmarkfoleyAuthor Commented:
> You said you had more luck with the browser. Does it open with that?

No, because it launches Acrobat, so same thing. I don't have another windows based reader. If you can get a hold of a workstation running acrobat reader, I'd be curious what your results are. I suspect you won't be able to read it there either. I've tried Acrobat on a couple of different computers.

Meanwhile I'm going to investigate another api. If PDFBox generates pdf's that only work with xpdf or such-like then it's of little use to me. Adobe is the inventor of pdf and if an api doesn't work with Adobe's reader there's something wrong. Nobody's jumped into this topic saying they've had no problem reading PDFBox pdf's in Acrobat. I'm beginning to think PDFBox is a dead product anyway. The latest (and only) release appears to be 0.7.3 from October 2006. So no one's working on it. The http://incubator.apache.org/pdfbox/download.html site says "No releases of Apache PDFBox are yet available" and refers you back to the 0.7.3 release (besides, the "0" first digit says "beta" to me).

You mentioned iText. Does that have, or can I build a jar using it? Do you have another recommendation? I think I'll try something like that today. I don't know what else I can try with PDFBox.
 
jmarkfoleyAuthor Commented:
> have you tried opening it on a different box?

Yes, as mentioned in my previous message, and both version 6 and version 9 of Acrobat.
 
objectsCommented:
> I'm beginning to think PDFBox is a dead product anyway.

it is


what are your requirements?

 
jmarkfoleyAuthor Commented:
Requirements: I am creating a website for attorneys to file documents online. The document must be timestamped (so I have to add text and/or images) and the stamped document saved as a pdf file. These pdf's must be accessible by attorneys and the court, virtually all of whom will be accessing from their office computers running windows and acrobat.
 
jmarkfoleyAuthor Commented:
oh yeah, and it is being demo'd at a show in two weeks!
 
objectsCommented:
hang on while I boot a windows box and try it here

 
CEHJCommented:
If you have too much difficulty, just treat it as an image, timestamp it and save it as a jpg and have done with it
 
objectsCommented:
same problem here, if you want to chase I'd be looking at any images in the pdf (eg. try on a pdf without images)
if you want to look at itext then the following are good resources
http://itextdocs.lowagie.com/tutorial/general/webapp/index.php
http://javaboutique.internet.com/tutorials/iText/
http://www.geek-tutorials.com/java/itext/itext_index.php
 
jmarkfoleyAuthor Commented:
Thanks for the feedback.
 I'm getting ready to install itext. I'll be back when I have some info.

Those pdf's I've been testing with are simple court filings, no images. If iText works, no sense chasing. If not, then I've got problems.
 
objectsCommented:
they are actually all images
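
One rough way to see this for yourself is to scan the raw file for `/Filter` entries, since scanned pages are usually stored as compressed image streams. The dump of old.pdf earlier in the thread shows an image XObject with `/Filter /JBIG2Decode`; whether that filter is what PDFBox mishandles is not established here. The sketch below is a crude text scan, not a PDF parser: the class name `FilterScan` is made up, and it will miss anything stored inside compressed object streams.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterScan {
    // Matches "/Filter /Name" (group 1) or "/Filter [/A /B]" (group 2 holds the array body).
    private static final Pattern FILTER =
            Pattern.compile("/Filter\\s*(?:/(\\w+)|\\[([^\\]]*)\\])");
    private static final Pattern NAME = Pattern.compile("/(\\w+)");

    // Returns the distinct filter names found in the raw bytes, sorted alphabetically.
    static Set<String> filterNames(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so binary stream data cannot break decoding.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Set<String> names = new TreeSet<>();
        Matcher m = FILTER.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                names.add(m.group(1));
            } else {
                Matcher n = NAME.matcher(m.group(2));
                while (n.find()) {
                    names.add(n.group(1));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FilterScan old.pdf
        System.out.println(filterNames(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```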

 
jmarkfoleyAuthor Commented:
Yeah!!!! Finally!!!! It works with iText. Now I can move on to trying to stamp the document (which is another posting!)

Thanks for your time and patience.
<%@ page import="java.io.FileOutputStream,
  com.lowagie.text.pdf.PdfReader,
  com.lowagie.text.pdf.PdfStamper" %>
<%
 
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/2-20.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";
 
PdfReader doc = new PdfReader(inputFile);
PdfStamper stamp = new PdfStamper(doc,new FileOutputStream(outputFile));
stamp.close();
doc.close();
%>

 
CEHJCommented:
ok - glad you're making progress
 
CEHJCommented:
:-)