  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 962

Need a script to read pdf document info.

I currently have a folder on my intranet server (IIS 4) containing many PDF files. Can anyone provide me with a "property reader" for PDF, similar to the DSOLEFILE.DLL that Microsoft provides for use with MS Office documents?

http://msdn.microsoft.com/library/periodic/period00/fso.htm

Used in conjunction with the FileSystemObject, it lets you trawl through folders, read each document's properties and generate HTML and hyperlinks on the fly.

Basically, I want dumb users to drop their PDFs into a folder on the server, and have an ASP page or similar read each file's properties and display the Author, Subject and Title (with a hyperlink). I've looked at adobe.com but can't find anything suitable.
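The folder-trawling half of this can be sketched independently of the property reader. The question is about ASP and the FileSystemObject; the Python below is only a stand-in for the same idea, and the function name `build_listing` and the `/docs/` base URL are assumptions made for illustration. It uses the file name as the link text until real Title/Author/Subject values are available.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_listing(folder, base_url="/docs/"):
    """Return an HTML <ul> with one hyperlink per PDF found in folder.

    base_url is a hypothetical URL prefix under which the folder is served.
    """
    items = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        href = base_url + quote(pdf.name)
        # Placeholder label: replace with the document's Title once a
        # property reader is available.
        label = html.escape(pdf.stem)
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling `build_listing("C:/inetpub/pdfs")` (path hypothetical) would return the markup as a string, ready to be written into the page.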
Asked by: devlinb
 
webwoman Commented:
You're not going to find anything on Adobe's site, because this has nothing to do with PDF -- and everything to do with the SERVER.

You need to set up a form for the user to upload their files. You won't have a whole lot of control over what or how they upload, though you certainly could write something that runs on the server and deletes anything that doesn't meet your specs (for file type/size).

That dll is specifically designed to work with IIS -- it's not going to work with Apache, or on a UNIX box. It works with MS stuff because MS wrote it.
 
raizon Commented:
I believe the point of the question was finding some way to read the properties of the PDF files dynamically to display the Author, Subject and Title of the file.

What I would do is:

1.  In my upload form I would have text fields for the Author, Subject and Title of the PDF.

2.  Create a DB with a table to hold that information, and relate that table to another one that holds the path to the file that was uploaded.

3.  When reading through the directory with the FileSystemObject, query the DB to get the information and build your page based on that.

Raizon
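A minimal sketch of the two-table layout described above, using Python's built-in sqlite3 module rather than whatever database the intranet actually runs. The table names (`files`, `doc_info`) and the helper functions are hypothetical, chosen only to show the relationship between the uploaded path and the metadata typed into the form.

```python
import sqlite3

# Hypothetical schema: one table for uploaded paths, one for the
# Author/Subject/Title typed into the upload form.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id   INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS doc_info (
    file_id INTEGER PRIMARY KEY REFERENCES files(id),
    author  TEXT,
    subject TEXT,
    title   TEXT
);
"""


def init_db(conn):
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def record_upload(conn, path, author, subject, title):
    """Store the file path and the form fields that came with it."""
    cur = conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
    conn.execute(
        "INSERT INTO doc_info (file_id, author, subject, title) "
        "VALUES (?, ?, ?, ?)",
        (cur.lastrowid, author, subject, title),
    )
    conn.commit()


def lookup(conn, path):
    """Return (author, subject, title) for a path, or None if unknown."""
    return conn.execute(
        "SELECT d.author, d.subject, d.title "
        "FROM files f JOIN doc_info d ON d.file_id = f.id "
        "WHERE f.path = ?",
        (path,),
    ).fetchone()
```

While walking the directory, each file name found would be passed to `lookup`; a `None` result means the file was dropped into the folder without going through the form.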
 
coreyti Commented:
There is a Perl module that can take care of this if you're able to use Perl for your project.

Check out the PDF::Parse library at:
http://search.cpan.org/doc/ANTRO/PDF-111/PDF/Parse.pm

-corey
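For readers not using Perl, the same idea can be sketched in Python. This is not PDF::Parse and does not mirror its API; it is a simplified stand-in that assumes an unencrypted PDF whose trailer points at an uncompressed Info object and whose fields are stored as literal `( ... )` strings. The function name `read_doc_info` is made up for the example.

```python
import re

# Literal string value such as /Title (Some text); backslash-escaped
# characters, including \( and \), are allowed inside the value.
_FIELD = rb"/%s\s*\(((?:\\.|[^\\()])*)\)"


def read_doc_info(data):
    """Return Title/Author/Subject found in the Info object of raw PDF bytes.

    Simplified: hex strings, UTF-16 strings, object streams and
    incremental updates are not handled.
    """
    # The trailer refers to the Info object as "/Info <num> <gen> R".
    ref = re.search(rb"/Info\s+(\d+)\s+(\d+)\s+R", data)
    if not ref:
        return {}
    # Find the body of that object: "<num> <gen> obj ... endobj".
    obj = re.search(
        rb"(?<!\d)%s\s+%s\s+obj(.*?)endobj" % ref.groups(), data, re.DOTALL
    )
    if not obj:
        return {}
    info = {}
    for key in ("Title", "Author", "Subject"):
        m = re.search(_FIELD % key.encode("ascii"), obj.group(1), re.DOTALL)
        if m:
            # Drop the backslash in front of escaped characters. This is
            # enough for \( and \) but does not interpret \n, \t or octal.
            raw = re.sub(rb"\\(.)", rb"\1", m.group(1), flags=re.DOTALL)
            info[key] = raw.decode("latin-1")
    return info
```

Usage would be along the lines of `read_doc_info(open(path, "rb").read())`, with missing keys simply absent from the returned dictionary.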
 
devlinb (Author) Commented:
Thanks coreyti,
This script works fine, except for PDFs that have any security built in. When security is added to a PDF, the document info is encrypted in some way and displays as garbage. Is there any way to get around this?
 
webwoman Commented:
Unlikely. It's got security because it's not supposed to be accessible.
 
devlinb (Author) Commented:
I disagree - the document information is still accessible in the reader even after a PDF has been secured. Why would Adobe want to make this inaccessible when all you want to do is prevent a PDF document from being modified?
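One practical stopgap is to detect secured files and fall back to the file name instead of displaying garbage. In a PDF protected by the standard security handler, the trailer carries an `/Encrypt` entry and string values, including those in the Info dictionary, are stored encrypted. The sketch below only checks for that entry; it does not decrypt anything, and the function name `looks_encrypted` is an assumption for illustration. Scanning the raw bytes is a heuristic and could misfire on a file that merely contains that text elsewhere.

```python
import re


def looks_encrypted(data):
    """Heuristic: True if the raw PDF bytes contain an /Encrypt key.

    The negative lookahead keeps longer names such as /EncryptMetadata
    from being counted as the /Encrypt key itself.
    """
    return re.search(rb"/Encrypt(?![A-Za-z])", data) is not None
```

A listing script could call this first and, when it returns `True`, show the file name as the link text rather than the unreadable Title.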