Mark

how to save a pdf file

I have the code shown below. It seems simple enough: I'm opening a PDF document, then saving it elsewhere. However, when I do this I cannot open the new document. I get "A drawing error occurred" in Adobe Reader. The original file has 3 pages, and the new file with the drawing error has 3 blank pages.

Surely this is something people have programmed billions of times over the years. What's my problem?
String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";
 
PDDocument doc = PDDocument.load( inputFile );
out.print(doc.getNumberOfPages() );
doc.save( outputFile );
doc.close();

ozlevanon

I'm not familiar with the PDF API you're using, but it seems to be corrupting the file somehow. If all you want is to copy the file, I recommend you simply do a byte-by-byte copy (see attached code sample). Of course, if you need something more, it won't do.
copyFile(new File("/www/tomcat/webapps/courtscan/OH/demo/documents/2009/1-60.pdf"), new File("/usr/local/apache/htdocs/new.pdf"));

// Requires: import java.io.*;

// Copies inputFile to outputFile byte for byte, closing both streams afterwards.
public static void copyFile(File inputFile, File outputFile) throws IOException
{
    InputStream fis = null;
    OutputStream fos = null;

    try
    {
        fis = new FileInputStream(inputFile);
        fos = new FileOutputStream(outputFile);
        pipe(fis, fos);
    }
    finally
    {
        if (fis != null)
        {
            fis.close();
        }
        if (fos != null)
        {
            fos.close();
        }
    }
}

// Reads from is in 8 KB chunks and writes each chunk to os until end of stream.
public static void pipe(InputStream is, OutputStream os) throws IOException
{
    byte[] buffer = new byte[8192];
    int len;
    do
    {
        len = is.read(buffer, 0, 8192);
        if (len > 0)
        {
            os.write(buffer, 0, len);
        }
    }
    while (len >= 0);
}
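
For readers on Java 7 or later (an assumption; the thread dates from 2009), the same byte-for-byte copy can be sketched with `java.nio.file.Files.copy`. The class name `CopyPdf` is hypothetical and used only for illustration. Like the code above, this copies the file without touching its contents, so it does not help if the PDF has to be modified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: a plain file copy using the NIO API (Java 7+).
final class CopyPdf {

    // Copies source to target byte for byte, overwriting target if it exists,
    // and returns the size of the copy in bytes.
    static long copy(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyPdf <input.pdf> <output.pdf>");
            return;
        }
        long bytes = copy(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied " + bytes + " bytes");
    }
}
```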


CEHJ
Are you *certain* no exceptions occur when you're saving or closing?
Mark

ASKER

ozlevanon, I'm using the PDFBox package. Yes, I know how to simply copy a file. I intend to muck with the PDF contents and write the results (see https://www.experts-exchange.com/questions/24303092/How-to-add-graphic-to-pdf-file.html), and I got this same error. The code shown is the result of stripping things out little by little to identify the fundamental error.

CEHJ, I get no tomcat exception and I've not seen anything in the mod_jk.log and nothing fishy in $CATALINA_HOME/logs. I had a try/catch around the save(), but it caught nothing, so I took it out for simplicity in my posting. It does create the new.pdf file with size 53122 which is a bit smaller than the original: 54353. The first few characters of the file are "%PDF-1.5", just like the original. Does code like I've shown work for you? If you want to examine the actual output it can be found in http://www.fluxrunner.com/new.pdf. Adobe can save it and everything. A copy of the original is there too as old.pdf.
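
The header and size check described above can be sketched in a few lines of Java. This is only an illustration: the class name `PdfHeader` is hypothetical, and it assumes Java 7 or later. As this thread shows, a valid `%PDF-1.5` header says nothing about whether the rest of the file is intact.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the version in a PDF header and the file size.
final class PdfHeader {

    // Returns the version text after "%PDF-" (for example "1.5"),
    // or null if the first 8 bytes are not a "%PDF-x.y" header.
    static String version(Path file) throws IOException {
        byte[] head = new byte[8];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break;
                }
                off += n;
            }
        }
        String s = new String(head, 0, off, StandardCharsets.ISO_8859_1);
        return (s.length() == 8 && s.startsWith("%PDF-")) ? s.substring(5) : null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfHeader <file.pdf>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println("version: " + version(file) + ", size: " + Files.size(file) + " bytes");
    }
}
```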
>>Does code like I've shown work for you?

I'll try it
It works for me. Does it correctly print the number of pages?
Both old.pdf and new.pdf are valid files of 3 pages from what i can see ...
Mark

ASKER

It does correctly print the number of pages.

Are you able to open the output file with Acrobat Reader? I've tried opening http://www.fluxrunner.com/new.pdf on a couple of different computers and at best I get 3 blank pages.
>>Are you able to open the output file with Acrobat Reader?

I'm not using Acrobat at the moment, but i can read it fine
Mark

ASKER

> I'm not using Acrobat at the moment, but i can read it fine

1. What are you using to read it? Do you have access to Acrobat? If so, could you try it? This is going to have to be readable by Acrobat since that's what joe-average has.

2. Are you reading the file from my link or the file you created with your program? If the latter, could you post that pdf in the file section of your response and I'll download it and check it out.

3. What version of PDFBox are you using? Mine appears to be 0.7.3; leastwise, that's the name on the jarfile: PDFBox-0.7.3.jar
>>1. What are you using to read it?
xpdf

>>Do you have access to Acrobat?
Not right at the moment

>>2. Are you reading the file from my link or the file you created with your program?
Both. Both work (is there much point in attaching?)

>>3. What version of PDFBox are you using?
The same, oldish one. You might try another API perhaps


Mark

ASKER

I'm stumped. I'm open to suggestions. What do you mean by "another API"?

If you don't mind, yes, go ahead and attach the output of your program. I'd like to see if Acrobat can read it, and I'd like to compare it with mine.
Another pdf api such as iText


Here's one i copied using 'your' code:
x11.pdf
code is fine, looks like pdfbox is creating a pdf that is incompatible with your version of acrobat. Perhaps try updating pdfbox to the latest version if you haven't already.

Mark

ASKER

This is getting frustrating. I went to http://www.pdfbox.org. Apparently 0.7.3, from October 2006, is the latest version of PDFBox. I re-downloaded it anyway and ran md5sum on my jar and the new jar. They match. So, I downloaded the latest version of Acrobat (9.1) to my XP notebook and tried opening my new.pdf with that. With the 9.1 version I got the error message: "Insufficient data for an image"

I don't get it. Why can you guys open my pdf fine, but I can't?  What else could be wrong? Some other library or jar? This is a fairly recent linux build, so I should have the latest of everything.

I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

Could there be some PDF settings for encryption or drawing that are defaulted differently on my setup than yours?

I'm out of ideas and I have to get this program working for a trade show at the end of the month. Help!
root@webhost1:/usr/local/apache/htdocs# fd -p new.pdf
     0: 25 50 44 46 2D 31 2E 35 0A 25 F6 E4 FC DF 0A 31    %PDF-1.5.%.....1
    10: 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 4C 61 6E 67 20     0 obj.<<./Lang
    20: 28 78 2D 64 65 66 61 75 6C 74 29 0A 2F 50 61 67    (x-default)./Pag
    30: 65 73 20 32 20 30 20 52 0A 2F 54 79 70 65 20 2F    es 2 0 R./Type /
    40: 43 61 74 61 6C 6F 67 0A 3E 3E 0A 65 6E 64 6F 62    Catalog.>>.endob
    50: 6A 0A 33 20 30 20 6F 62 6A 0A 3C 3C 0A 2F 43 72    j.3 0 obj.<<./Cr
    60: 65 61 74 69 6F 6E 44 61 74 65 20 28 44 3A 32 30    eationDate (D:20
    70: 30 37 30 35 31 31 31 38 32 36 31 39 2B 30 30 27    070511182619+00'
    80: 30 30 27 29 0A 2F 43 72 65 61 74 6F 72 20 28 50    00')./Creator (P
    90: 61 70 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 2F    aperPort 11.0)./
    A0: 4D 6F 64 44 61 74 65 20 28 44 3A 32 30 30 37 30    ModDate (D:20070
    B0: 35 31 31 31 38 32 36 31 39 2B 30 30 27 30 30 27    511182619+00'00'
    C0: 29 0A 2F 50 72 6F 64 75 63 65 72 20 28 50 61 70    )./Producer (Pap
    D0: 65 72 50 6F 72 74 20 31 31 2E 30 29 0A 3E 3E 0A    erPort 11.0).>>.
    E0: 65 6E 64 6F 62 6A 0A 32 20 30 20 6F 62 6A 0A 3C    endobj.2 0 obj.<
    F0: 3C 0A 2F 43 6F 75 6E 74 20 33 0A 2F 4B 69 64 73    <./Count 3./Kids
? q
root@webhost1:/usr/local/apache/htdocs# fd -p old.pdf
     0: 25 50 44 46 2D 31 2E 35 0D 0A 25 F1 F9 F7 F6 33    %PDF-1.5..%....3
    10: 2E 33 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 0D    .3..4 0 obj..<<.
    20: 0A 2F 42 69 74 73 50 65 72 43 6F 6D 70 6F 6E 65    ./BitsPerCompone
    30: 6E 74 20 31 20 0D 0A 2F 43 6F 6C 6F 72 53 70 61    nt 1 ../ColorSpa
    40: 63 65 20 2F 44 65 76 69 63 65 47 72 61 79 20 0D    ce /DeviceGray .
    50: 0A 2F 46 69 6C 74 65 72 20 2F 4A 42 49 47 32 44    ./Filter /JBIG2D
    60: 65 63 6F 64 65 20 0D 0A 2F 48 65 69 67 68 74 20    ecode ../Height
    70: 32 31 39 39 20 0D 0A 2F 4C 65 6E 67 74 68 20 35    2199 ../Length 5
    80: 20 30 20 52 20 0D 0A 2F 4E 61 6D 65 20 2F 69 6D     0 R ../Name /im
    90: 61 67 65 30 20 0D 0A 2F 53 75 62 74 79 70 65 20    age0 ../Subtype
    A0: 2F 49 6D 61 67 65 20 0D 0A 2F 54 79 70 65 20 2F    /Image ../Type /
    B0: 58 4F 62 6A 65 63 74 20 0D 0A 2F 57 69 64 74 68    XObject ../Width
    C0: 20 31 37 30 30 20 0D 0A 3E 3E 0D 0A 73 74 72 65     1700 ..>>..stre
    D0: 61 6D 0D 0A 00 00 00 00 30 00 01 00 00 00 13 00    am......0.......
    E0: 00 06 A4 00 00 08 97 00 00 00 C8 00 00 00 C8 01    ................
    F0: 00 00 00 00 00 01 00 01 01 00 00 37 AD 08 00 02    ...........7....
?
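
For anyone without a dump tool at hand, a similar offset/hex/ASCII view of the first block can be sketched in Java. The class name `HexDump` is hypothetical, the layout only approximates the dumps above, and Java 7 or later is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: prints the first bytes of a file as offset, hex, and ASCII columns.
final class HexDump {

    // Formats up to limit bytes of data, 16 bytes per row.
    // Bytes outside printable ASCII are shown as '.' in the text column.
    static String dump(byte[] data, int limit) {
        StringBuilder out = new StringBuilder();
        int end = Math.min(limit, data.length);
        for (int row = 0; row < end; row += 16) {
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < end) {
                    int b = data[i] & 0xFF;
                    hex.append(String.format("%02X ", b));
                    text.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    hex.append("   "); // pad a short final row so the text column lines up
                }
            }
            out.append(String.format("%6X: %s   %s%n", row, hex, text));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: HexDump <file>");
            return;
        }
        // Reads the whole file for simplicity; fine for files of this size.
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0])), 256));
    }
}
```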


>>I've attached a dump of the first block of the new and original pdf files. As you can see they are different.

That could be down to version differences. Having said that, good reader software should be able to cope with different versions
If you're downloading from linux to windows over say ftp, make sure that you're doing that in binary format or it will corrupt the file. md5sum it at both ends - the sums should be identical
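
If md5sum is not available at one end, the same checksum can be computed in Java with the standard `java.security.MessageDigest` class. This is a minimal sketch: the class name `Md5Sum` is hypothetical and Java 7 or later is assumed. The hex string it prints should match the first field of md5sum's output for the same file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the MD5 checksum of a file as lowercase hex.
final class Md5Sum {

    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            // Feed the file to the digest in chunks rather than loading it all at once.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Md5Sum <file>");
            return;
        }
        System.out.println(md5Hex(Paths.get(args[0])) + "  " + args[0]);
    }
}
```

Run it against the file on the server and against the downloaded copy; differing sums mean the transfer altered the file.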
Mark

ASKER

One would think Acrobat could cope with version differences.

I'm trying to access the file in two ways. I create the file in the apache htdocs directory so I can get to it via the web using http://www.fluxrunner.com/new.pdf. IE will open that file in Acrobat. Also, I scp'd it to my local linux host with my Windows workstation samba mounting it AND to my windows workstation directly. md5sum confirms that the new.pdf created on fluxrunner is the same as the one I've downloaded locally.

I tried again using a different source pdf; only one page. Same thing.  "Insufficient data for an image"

Are you *sure* you've actually opened MY pdf and it worked OK?
Mark

ASKER

Here are the imports I'm using. Am I missing a critical one for proper saving?
<%@ page import="java.lang.String,java.io.*,java.util.*" %>
<%@ page import="java.util.Date,java.text.SimpleDateFormat,
  java.lang.StringBuffer,java.text.FieldPosition" %>
<%@ page import="java.lang.Object,
  org.pdfbox.exceptions.COSVisitorException,
  org.pdfbox.io.RandomAccessFile,
  org.pdfbox.pdmodel.PDDocument,
  org.pdfbox.pdmodel.PDPage,
  org.pdfbox.pdmodel.edit.PDPageContentStream,
  org.pdfbox.pdmodel.graphics.xobject.PDCcitt,
  org.pdfbox.pdmodel.graphics.xobject.PDJpeg,
  org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage" %>


Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?
Mark

ASKER

I've also stripped out of the last import all but java.lang.Object and org.pdfbox.pdmodel.PDDocument, same error :(
>>Here are the imports I'm using

That looks fine. You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?
Mark

ASKER

> Well let's check again. It's this one is it not: http://www.fluxrunner.com/new.pdf ?

Yes, does it open OK for you?
Opens fine for me
Mark

ASKER

> Opens fine for me

Amazing! Are you opening it with Acrobat or something else? It needs to open with acrobat since most users will be accessing from their PC's. I also tried this URL from a completely different windows laptop, same problem.

> You don't need the lang ones. If you were missing any you'd get exceptions - and you tell me you haven't got any ..?

No, no exceptions.
>>Amazing! Are you opening it with Acrobat or something else?

xpdf still

You said you had more luck with the browser. Does it open with that?
> What else could be wrong? Some other library or jar?

As I mentioned earlier your code is fine. The pdf you are creating is fine.
The incompatibility seems to be with your version of acrobat and pdfbox, is strange though.

have you tried opening it on a different box?
Mark

ASKER

> You said you had more luck with the browser. Does it open with that?

No, because it launches Acrobat, so same thing. I don't have another windows based reader. If you can get a hold of a workstation running acrobat reader, I'd be curious what your results are. I suspect you won't be able to read it there either. I've tried Acrobat on a couple of different computers.

Meanwhile I'm going to investigate another api. If PDFBox generates pdf's that only work with xpdf or such-like then it's of little use to me. Adobe is the inventor of pdf and if an api doesn't work with Adobe's reader there's something wrong. Nobody's jumped into this topic saying they've had no problem reading PDFBox pdf's in Acrobat. I'm beginning to think PDFBox is a dead product anyway. The latest (and only) release appears to be 0.7.3 from October 2006. So no one's working on it. The http://incubator.apache.org/pdfbox/download.html site says "No releases of Apache PDFBox are yet available" and refers you back to the 0.7.3 release (besides, the "0" first digit says "beta" to me).

You mentioned iText. Does that have, or can I build a jar using it? Do you have another recommendation? I think I'll try something like that today. I don't know what else I can try with PDFBox.
Mark

ASKER

> have you tried opening it on a different box?

Yes, as mentioned in my previous message, and both version 6 and version 9 of Acrobat.
ASKER CERTIFIED SOLUTION
CEHJ
> I'm beginning to think PDFBox is a dead product anyway.

it is


what are your requirements?

Mark

ASKER

Requirements: I am creating a website for attorneys to file documents online. The document must be timestamped (so I have to add text and/or images) and the stamped document saved as a pdf file. These pdf's must be accessible by attorneys and the court, virtually all of whom will be accessing from their office computers running windows and acrobat.
Mark

ASKER

oh yeah, and it is being demo'd at a show in two weeks!
hang on while I boot a windows box and try it here

If you have too much difficulty, just treat it as an image, timestamp it and save it as a jpg and have done with it
same problem here, if you want to chase I'd be looking at any images in the pdf (eg. try on a pdf without images)
if you want to look at itext then the following are good resources
http://itextdocs.lowagie.com/tutorial/general/webapp/index.php
http://javaboutique.internet.com/tutorials/iText/
http://www.geek-tutorials.com/java/itext/itext_index.php
Mark

ASKER

Thanks for the feedback.
I'm getting ready to install iText. I'll be back when I have some info.

Those pdf's I've been testing with are simple court filings, no images. If iText works, no sense chasing. If not, then I've got problems.
they are actually all images
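
One rough way to check this without a PDF library is to scan the raw bytes for the markers visible in the old.pdf dump earlier in the thread, such as `/Subtype /Image` and `/Filter /JBIG2Decode`. The sketch below is only a heuristic: the class name `PdfImageScan` is hypothetical, Java 7 or later is assumed, and it only finds dictionaries that are stored uncompressed with exactly that spacing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: counts occurrences of a literal token in a PDF's raw bytes.
final class PdfImageScan {

    // Decodes the bytes as ISO-8859-1 (one char per byte) and counts
    // non-overlapping occurrences of token.
    static int count(byte[] pdf, String token) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int hits = 0;
        for (int at = text.indexOf(token); at >= 0; at = text.indexOf(token, at + token.length())) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfImageScan <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("image XObjects: " + count(pdf, "/Subtype /Image"));
        System.out.println("JBIG2 filters:  " + count(pdf, "/Filter /JBIG2Decode"));
    }
}
```

A scanned court filing typically shows one image per page here, even though it looks like plain text on screen.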

Mark

ASKER

Yeah!!!! Finally!!!! It works with iText. Now I can move on to trying to stamp the document (which is another posting!)

Thanks for your time and patience.
<%@ page import="java.io.FileOutputStream,
  com.lowagie.text.pdf.PdfReader,
  com.lowagie.text.pdf.PdfStamper" %>
<%

String inputFile = "/www/tomcat/webapps/courtscan/OH/demo/documents/2009/2-20.pdf";
String outputFile = "/usr/local/apache/htdocs/new.pdf";

// Read the source PDF, then let PdfStamper write it back out unchanged.
PdfReader doc = new PdfReader(inputFile);
PdfStamper stamp = new PdfStamper(doc, new FileOutputStream(outputFile));
stamp.close();  // closing the stamper writes the output file
doc.close();
%>


ok - glad you're making progress
:-)