Solved

how to save a pdf file

Posted on 2009-05-14
Last Modified: 2012-05-07
I have the code shown below. Seems simple enough. I'm opening a PDF document, then saving it elsewhere. However, when I do this I cannot open the new document. I get "A drawing error occurred" in Adobe Reader. The original file has 3 pages, and the new file that triggers the drawing error has 3 blank pages.

Surely this is something people have programmed billions of times over the years. What's my problem?
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";
 
PDDocument doc = PDDocument.load( inputFile );
out.print(doc.getNumberOfPages() );
doc.save( outputFile );
doc.close();

Question by:jmarkfoley
41 Comments
 
LVL 8

Expert Comment

by:ozlevanon
ID: 24385348
I'm not familiar with the PDF API you're using, but it seems it somehow messes up the file. If all you want is to copy the file, I recommend you simply do a byte-by-byte copy (see attached code sample). Of course, if you need something more, it won't do.
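import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Usage: copy the original PDF to the web server's document root.
copyFile(new File("/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf"), new File("/usr/local/apache/htdocs/new.pdf"));

// Copies inputFile to outputFile byte for byte.
public static void copyFile(File inputFile, File outputFile) throws IOException
{
    InputStream fis = null;
    OutputStream fos = null;

    try
    {
        fis = new FileInputStream(inputFile);
        fos = new FileOutputStream(outputFile);
        pipe(fis, fos);
    }
    finally
    {
        // Close the output stream even if closing the input stream fails.
        try
        {
            if (fis != null)
            {
                fis.close();
            }
        }
        finally
        {
            if (fos != null)
            {
                fos.close();
            }
        }
    }
}

// Pumps everything from is to os in 8 KB chunks until end of stream.
public static void pipe(InputStream is, OutputStream os) throws IOException
{
    byte[] buffer = new byte[8192];
    int len;
    do
    {
        len = is.read(buffer, 0, buffer.length);
        if (len > 0)
        {
            os.write(buffer, 0, len);
        }
    }
    while (len >= 0);
}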
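
For readers on Java 7 or later (newer than the setup discussed in this thread), the same byte-for-byte copy can be sketched with java.nio.file.Files.copy. The class name CopyPdf and the command-line arguments are illustrative assumptions, not part of the original post. Like the hand-written loop above, this only duplicates the file and does not touch the PDF contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the name is an assumption for illustration.
final class CopyPdf {
    // Copies the file byte for byte, replacing the target if it already exists.
    static void copy(String from, String to) throws IOException {
        Path source = Paths.get(from);
        Path target = Paths.get(to);
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Expected usage: java CopyPdf input.pdf output.pdf
        copy(args[0], args[1]);
    }
}
```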


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24385505
Are you *certain* no exceptions occur when you're saving or closing?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24386152
ozlevanon, I'm using the PDFBox package. Yes, I know how to simply copy a file. I am intending to muck with the pdf contents and write the results (see http://www.experts-exchange.com/Web_Development/Document_Imaging/Q_24303092.html), and I got this same error. My code shown is the result of stripping things out little by little to identify the fundamental error.

CEHJ, I get no tomcat exception and I've not seen anything in the mod_jk.log and nothing fishy in $CATALINA_HOME/logs. I had a try/catch around the save(), but it caught nothing, so I took it out for simplicity in my posting. It does create the new.pdf file with size 53122 which is a bit smaller than the original: 54353. The first few characters of the file are "%PDF-1.5", just like the original. Does code like I've shown work for you? If you want to examine the actual output it can be found in http://www.fluxrunner.com/new.pdf. Adobe can save it and everything. A copy of the original is there too as old.pdf.
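
The header and size comparison described above can be sketched in a few lines of Java. This is only a rough check under stated assumptions: the class name PdfHeaderCheck is hypothetical, and the file names are taken from the command line. A matching "%PDF-1.5" header only shows that the file starts like a PDF; it says nothing about whether the objects and streams that follow are intact.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the name is an assumption for illustration.
final class PdfHeaderCheck {
    // Returns up to the first 8 bytes of the file as text, e.g. "%PDF-1.5".
    static String header(Path file) throws IOException {
        byte[] buf = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Keep reading until 8 bytes are collected or the stream ends.
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        return new String(buf, 0, total, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Expected usage: java PdfHeaderCheck old.pdf new.pdf
        for (String name : args) {
            Path p = Paths.get(name);
            System.out.println(name + ": header=" + header(p)
                    + ", size=" + Files.size(p));
        }
    }
}
```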
0

 
LVL 86

Expert Comment

by:CEHJ
ID: 24386175
Are you *certain* no exceptions occur when you're saving or closing?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24386205
>>Does code like I've shown work for you?

I'll try it
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24386815
It works for me. Does it correctly print the number of pages?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24386887
Both old.pdf and new.pdf are valid files of 3 pages from what i can see ...
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24388090
It does correctly print the number of pages.

Are you able to open the output file with Acrobat Reader? I've tried opening http://www.fluxrunner.com/new.pdf on a couple of different computers and at best I get 3 blank pages.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24389100
>>Are you able to open the output file with Acrobat Reader?

I'm not using Acrobat at the moment, but i can read it fine
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24389652
> I'm not using Acrobat at the moment, but i can read it fine

1. What are you using to read it? Do you have access to Acrobat? If so, could you try it? This is going to have to be readable by Acrobat since that's what joe-average has.

2. Are you reading the file from my link or the file you created with your program? If the latter, could you post that pdf in the file section of your response and I'll download it and check it out.

3. What version of PDFbox are you using? Mine appears to be 0.7.3; leastwise, that's the name on the jarfile: PDFBox-0.7.3.jar
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24389719
>>1. What are you using to read it?
xpdf
Do you have access to Acrobat?
Not right at the moment

2. Are you reading the file from my link or the file you created with your program?
Both. Both work (is there much point in attaching?)

3. What version of PDFbox are you using?
The same, oldish one. You might try another API perhaps


0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24389761
I'm stumped. I'm open to suggestions. What do you mean by "another API"?

If you don't mind, yes, go ahead and attach the output of your program. I'd like to see if Acrobat can read it, and I'd like to compare it with mine.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24389798
Another pdf api such as iText


Here's one i copied using 'your' code:
x11.pdf
0
 
LVL 92

Expert Comment

by:objects
ID: 24390821
code is fine, looks like pdfbox is creating a pdf that is incompatible with your version of acrobat. Perhaps try updating pdfbox to the latest version if you haven't already.

0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24392893
This is getting frustrating. I went to http://www.pdfbox.org. Apparently 0.7.3 is the latest version of PDFBox, from October 2006. I re-downloaded it anyway and ran md5sum on my jar and the new jar. They match. So I downloaded the latest version of Acrobat (9.1) to my XP notebook and tried opening my new.pdf with that. With the 9.1 version I got the error message: "Insufficient data for an image"

I don't get it. Why can you guys open my pdf fine, but I can't?  What else could be wrong? Some other library or jar? This is a fairly recent linux build, so I should have the latest of everything.

I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

Could there be some PDF settings for encryption or drawing that are defaulted differently on my setup than yours?

I'm out of ideas and I have to get this program working for a trade show at the end of the month. Help!
root@webhost1:/usr/local/apache/htdocs# fd -p new.pdf
     0: 25 50 44 46 2D 31 2E 35 0A 25 F6 E4 FC DF 0A 31    %PDF-1.5.%.....1
    10: 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 4C 61 6E 67 20     0 obj.<<./Lang
    20: 28 78 2D 64 65 66 61 75 6C 74 29 0A 2F 50 61 67    (x-default)./Pag
    30: 65 73 20 32 20 30 20 52 0A 2F 54 79 70 65 20 2F    es 2 0 R./Type /
    40: 43 61 74 61 6C 6F 67 0A 3E 3E 0A 65 6E 64 6F 62    Catalog.>>.endob
    50: 6A 0A 33 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 43 72    j.3 0 obj.<<./Cr
    60: 65 61 74 69 6F 6E 44 61 74 65 20 28 44 3A 32 30    eationDate (D:20
    70: 30 37 30 35 31 31 31 38 32 36 31 39 2B 30 30 27    070511182619+00'
    80: 30 30 27 29 0A 2F 43 72 65 61 74 6F 72 20 28 50    00')./Creator (P
    90: 61 70 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 2F    aperPort 11.0)./
    A0: 4D 6F 64 44 61 74 65 20 28 44 3A 32 30 30 37 30    ModDate (D:20070
    B0: 35 31 31 31 38 32 36 31 39 2B 30 30 27 30 30 27    511182619+00'00'
    C0: 29 0A 2F 50 72 6F 64 75 63 65 72 20 28 50 61 70    )./Producer (Pap
    D0: 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 3E 3E 0A    erPort 11.0).>>.
    E0: 65 6E 64 6F 62 6A 0A 32 20 30 20 6F 62 6A 0A 3C    endobj.2 0 obj.<
    F0: 3C 0A 2F 43 6F 75 6E 74 20 33 0A 2F 4B 69 64 73    <./Count 3./Kids
? q
root@webhost1:/usr/local/apache/htdocs# fd -p old.pdf
     0: 25 50 44 46 2D 31 2E 35 0D 0A 25 F1 F9 F7 F6 33    %PDF-1.5..%....3
    10: 2E 33 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 0D    .3..4 0 obj..<<.
    20: 0A 2F 42 69 74 73 50 65 72 43 6F 6D 70 6F 6E 65    ./BitsPerCompone
    30: 6E 74 20 31 20 0D 0A 2F 43 6F 6C 6F 72 53 70 61    nt 1 ../ColorSpa
    40: 63 65 20 2F 44 65 76 69 63 65 47 72 61 79 20 0D    ce /DeviceGray .
    50: 0A 2F 46 69 6C 74 65 72 20 2F 4A 42 49 47 32 44    ./Filter /JBIG2D
    60: 65 63 6F 64 65 20 0D 0A 2F 48 65 69 67 68 74 20    ecode ../Height
    70: 32 31 39 39 20 0D 0A 2F 4C 65 6E 67 74 68 20 35    2199 ../Length 5
    80: 20 30 20 52 20 0D 0A 2F 4E 61 6D 65 20 2F 69 6D     0 R ../Name /im
    90: 61 67 65 30 20 0D 0A 2F 53 75 62 74 79 70 65 20    age0 ../Subtype
    A0: 2F 49 6D 61 67 65 20 0D 0A 2F 54 79 70 65 20 2F    /Image ../Type /
    B0: 58 4F 62 6A 65 63 74 20 0D 0A 2F 57 69 64 74 68    XObject ../Width
    C0: 20 31 37 30 30 20 0D 0A 3E 3E 0D 0A 73 74 72 65     1700 ..>>..stre
    D0: 61 6D 0D 0A 00 00 00 00 30 00 01 00 00 00 13 00    am......0.......
    E0: 00 06 A4 00 00 08 97 00 00 00 C8 00 00 00 C8 01    ................
    F0: 00 00 00 00 00 01 00 01 01 00 00 37 AD 08 00 02    ...........7....
?


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24392901
>>I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

That could be down to version differences. Having said that, good reader software should be able to cope with different versions
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24392912
If you're downloading from linux to windows over say ftp, make sure that you're doing that in binary format or it will corrupt the file. md5sum it at both ends - the sums should be identical
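
Where the md5sum command is not available on one end, the same comparison can be sketched in Java with java.security.MessageDigest. The class name Md5Check is a hypothetical choice for illustration. Run it against the file on both machines; if the transfer was binary-safe, the two hex strings should match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name is an assumption for illustration.
final class Md5Check {
    // Streams the input through MD5 and returns the digest as lowercase hex.
    static String md5Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            md.update(buffer, 0, len);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Expected usage: java Md5Check new.pdf
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(md5Hex(in) + "  " + args[0]);
        }
    }
}
```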
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24392981
One would think Acrobat could cope with version differences.

I'm trying to access the file in two ways. I create the file in the apache htdocs directory so I can get to it via the web using http://www.fluxrunner.com/new.pdf. IE will open that file in Acrobat. Also, I scp'd it to my local linux host with my Windows workstation samba mounting it AND to my windows workstation directly. md5sum confirms that the new.pdf created on fluxrunner is the same as the one I've downloaded locally.

I tried again using a different source pdf; only one page. Same thing.  "Insufficient data for an image"

Are you *sure* you've actually opened MY pdf and it worked OK?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393019
Here are the imports I'm using. Am I missing a critical one for proper saving?
<%@ page import="java.lang.String,java.io.*,java.util.*" %>
<%@ page import="java.util.Date,java.text.SimpleDateFormat,
  java.lang.StringBuffer,java.text.FieldPosition" %>
<%@ page import="java.lang.Object,
  org.pdfbox.exceptions.COSVisitorException,
  org.pdfbox.io.RandomAccessFile,
  org.pdfbox.pdmodel.PDDocument,
  org.pdfbox.pdmodel.PDPage,
  org.pdfbox.pdmodel.edit.PDPageContentStream,
  org.pdfbox.pdmodel.graphics.xobject.PDCcitt,
  org.pdfbox.pdmodel.graphics.xobject.PDJpeg,
  org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage" %>


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24393021
Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393030
I've also stripped out of the last import all but java.lang.Object and org.pdfbox.pdmodel.PDDocument, same error :(
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24393032
>>Here are the imports I'm using

That looks fine. You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393035
> Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?

Yes, does it open OK for you?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24393055
Opens fine for me
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393091
> Opens fine for me

Amazing! Are you opening it with Acrobat or something else? It needs to open with acrobat since most users will be accessing from their PC's. I also tried this URL from a completely different windows laptop, same problem.

> You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?

No, no exceptions.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24393187
>>Amazing! Are you opening it with Acrobat or something else?

xpdf still

You said you had more luck with the browser. Does it open with that?
0
 
LVL 92

Expert Comment

by:objects
ID: 24393257
> What else could be wrong? Some other library or jar?

As I mentioned earlier your code is fine. The pdf you are creating is fine.
The incompatibility seems to be between your version of Acrobat and PDFBox, which is strange, though.

have you tried opening it on a different box?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393348
> You said you had more luck with the browser. Does it open with that?

No, because it launches Acrobat, so same thing. I don't have another windows based reader. If you can get a hold of a workstation running acrobat reader, I'd be curious what your results are. I suspect you won't be able to read it there either. I've tried Acrobat on a couple of different computers.

Meanwhile I'm going to investigate another api. If PDFBox generates pdf's that only work with xpdf or such-like then it's of little use to me. Adobe is the inventor of pdf and if an api doesn't work with Adobe's reader there's something wrong. Nobody's jumped into this topic saying they've had no problem reading PDFBox pdf's in Acrobat. I'm beginning to think PDFBox is a dead product anyway. The latest (and only) release appears to be 0.7.3 from October 2006. So no one's working on it. The http://incubator.apache.org/pdfbox/download.html site says "No releases of Apache PDFBox are yet available" and refers you back to the 0.7.3 release (besides, the "0" first digit says "beta" to me).

You mentioned iText. Does that come with a jar, or can I build one from it? Do you have another recommendation? I think I'll try something like that today. I don't know what else I can try with PDFBox.
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393358
> have you tried opening it on a different box?

Yes, as mentioned in my previous message, and both version 6 and version 9 of Acrobat.
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 24393374
It's conceivable that the file format has moved on since anyone last did any serious work on PDFBox. Try iText, which in fact is the most well-known api
0
 
LVL 92

Expert Comment

by:objects
ID: 24393451
> I'm beginning to think PDFBox is a dead product anyway.

it is


what are your requirements?

0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393468
Requirements: I am creating a website for attorneys to file documents online. The document must be timestamped (so I have to add text and/or images) and the stamped document saved as a pdf file. These pdf's must be accessible by attorneys and the court, virtually all of whom will be accessing from their office computers running windows and acrobat.
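
As a small piece of that requirement, the text of the stamp itself can be sketched with SimpleDateFormat, which the JSP imports posted earlier already include. Everything specific here is an assumption for illustration: the class name FilingStamp, the "FILED" prefix, and the date pattern are not from the thread, and placing the text on the PDF page is a separate step that depends on the PDF library in use.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class; name, prefix, and pattern are assumptions.
final class FilingStamp {
    // Builds the stamp text for a given instant in an explicit time zone,
    // so the result does not depend on the server's default zone.
    static String stampText(Date when, TimeZone zone) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(zone);
        return "FILED " + fmt.format(when) + " " + zone.getID();
    }

    public static void main(String[] args) {
        // Prints the stamp for the current moment in UTC.
        System.out.println(stampText(new Date(), TimeZone.getTimeZone("UTC")));
    }
}
```

Passing the time zone explicitly is a deliberate choice in this sketch: a court filing stamp should read the same regardless of where the server happens to run.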
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24393479
oh yeah, and it is being demo'd at a show in two weeks!
0
 
LVL 92

Expert Comment

by:objects
ID: 24393507
hang on while I boot a windows box and try it here

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24393597
If you have too much difficulty, just treat it as an image, timestamp it and save it as a jpg and have done with it
0
 
LVL 92

Expert Comment

by:objects
ID: 24393752
same problem here, if you want to chase I'd be looking at any images in the pdf (eg. try on a pdf without images)
if you want to look at itext then the following are good resources
http://itextdocs.lowagie.com/tutorial/general/webapp/index.php
http://javaboutique.internet.com/tutorials/iText/
http://www.geek-tutorials.com/java/itext/itext_index.php
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24397469
Thanks for the feedback.
I'm getting ready to install iText. I'll be back when I have some info.

Those pdf's I've been testing with are simple court filings, no images. If iText works, no sense chasing. If not, then I've got problems.
0
 
LVL 92

Expert Comment

by:objects
ID: 24400235
they are actually all images
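
The dump of old.pdf posted earlier shows its first object is an image XObject with /Filter /JBIG2Decode, which fits the observation that these scanned filings are all images. A crude way to check other files is to count such markers in the raw bytes, sketched below. The class name ImageScan is hypothetical, and the approach rests on assumptions: it only finds dictionaries stored as plain text (not those inside compressed object streams), and it expects the exact spacing "/Subtype /Image" seen in the dump, which other PDF producers may write differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the name is an assumption for illustration.
final class ImageScan {
    // Counts non-overlapping occurrences of token in the raw file bytes.
    // ISO-8859-1 maps each byte to exactly one char, so binary data survives.
    static int count(byte[] pdf, String token) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            count++;
            from += token.length();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Expected usage: java ImageScan old.pdf
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("image XObjects: " + count(pdf, "/Subtype /Image"));
        System.out.println("JBIG2 filters:  " + count(pdf, "/JBIG2Decode"));
    }
}
```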

0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 24401991
Yeah!!!! Finally!!!! It works with iText. Now I can move on to trying to stamp the document (which is another posting!)

Thanks for your time and patience.
<%@ page import="java.lang.Object,
  java.io.FileOutputStream,
  com.lowagie.text.pdf.PdfReader,
  com.lowagie.text.pdf.PdfStamper" %>
<%
 
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/2-20.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";
 
PdfReader doc = new PdfReader(inputFile);
PdfStamper stamp = new PdfStamper(doc,new FileOutputStream(outputFile));
stamp.close();
doc.close();
%>


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24401995
ok - glad you're making progress
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 24402004
:-)
0
