Extracting PDF title and displaying it in HTML via ASP

I wrote some ASP code that reads all the PDF files in a directory and lists each one by its filename.

In HTML (via ASP) I can display the files by filename, but I want to display each one by its PDF Title instead. So basically, I need a way to extract the Title from each PDF file and display it in HTML.
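For context, the filename listing described above might look roughly like the sketch below. The folder name "Current" is taken from code later in the thread; the variable names are illustrative only, and this is not the asker's actual code.

```vbscript
' Sketch only: list every .pdf file in a folder as a link, labelled by filename.
' PDF_FOLDER is an assumed name for the folder that holds the PDFs.
Const PDF_FOLDER = "Current"

Dim objFS, objFolder, objItem
Set objFS = Server.CreateObject("Scripting.FileSystemObject")
Set objFolder = objFS.GetFolder(Server.MapPath(PDF_FOLDER))

Response.Write "<table>"
For Each objItem In objFolder.Files
    ' Keep only files whose extension is "pdf" (case-insensitive)
    If LCase(objFS.GetExtensionName(objItem.Name)) = "pdf" Then
        Response.Write "<tr><td><a href=""" & _
            Server.HTMLEncode(PDF_FOLDER & "/" & objItem.Name) & """>" & _
            Server.HTMLEncode(objItem.Name) & "</a></td></tr>"
    End If
Next
Response.Write "</table>"
```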
tobiasonAsked:

ainapureCommented:
You will need some kind of component or tool to extract the title from the PDF file. Try searching for one on Google.

-amit
mikoshaCommented:
I've got an idea that may work:
If you open any PDF file as an ASCII file, you'll see something like this at the top:

%PDF-1.4
%âãÏÓ
1 0 obj
<< 
/Producer (Acrobat Distiller Command 3.01 for Solaris 2.3 and later \(SPARC\))
/Creator (FrameMaker 5.5.6.)
/ModDate (D:20031202132023-05'00')
/CreationDate (D:19960530152336Z)
/Title (title)
>> 

The actual title is in parentheses (in this example the title is "title"). So if the PDF file has a title, it will usually sit in a block like this one, and you can find it by searching for the "/Title" keyword and reading the string that follows. You can read the rest of the PDF info about the file the same way (everything that is inside << >>).

Hope it will work.
cheers:)
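A minimal sketch of that idea, under some loud assumptions: the Info block is stored uncompressed, the title is a plain parenthesised string in a single-byte encoding, and the first "/Title" in the file is the document title (bookmark entries in a PDF also use a /Title key, so that is not guaranteed). ExtractInfoTitle is a hypothetical helper name, not part of any library.

```vbscript
' Hypothetical helper: returns the text inside "/Title (...)", or "" if not found.
' Simplified: a backslash just makes the next character literal; octal escapes
' and sequences such as \n are not decoded.
Function ExtractInfoTitle(strPdf)
    Dim i, ch, depth, result
    ExtractInfoTitle = ""

    i = InStr(1, strPdf, "/Title", vbBinaryCompare)
    If i = 0 Then Exit Function
    i = i + Len("/Title")

    ' Skip whitespace between the key and the opening parenthesis
    Do While i <= Len(strPdf)
        ch = Mid(strPdf, i, 1)
        If ch <> " " And ch <> vbCr And ch <> vbLf And ch <> vbTab Then Exit Do
        i = i + 1
    Loop

    ' Anything other than "(" (hex string, indirect reference) is not handled here
    If Mid(strPdf, i, 1) <> "(" Then Exit Function

    depth = 1
    result = ""
    i = i + 1
    Do While i <= Len(strPdf)
        ch = Mid(strPdf, i, 1)
        If ch = "\" Then
            i = i + 1                          ' escaped character: keep it as-is
            result = result & Mid(strPdf, i, 1)
        ElseIf ch = "(" Then
            depth = depth + 1                  ' balanced parentheses may be nested
            result = result & ch
        ElseIf ch = ")" Then
            depth = depth - 1
            If depth = 0 Then Exit Do          ' matching close found
            result = result & ch
        Else
            result = result & ch
        End If
        i = i + 1
    Loop

    If depth = 0 Then ExtractInfoTitle = result
End Function
```

With the sample header above, ExtractInfoTitle("<< /Creator (FrameMaker 5.5.6.) /Title (title) >>") returns "title".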
 
ainapureCommented:
Initial comments.

1) You have to open the PDF file and try to read the metadata
2) Store and display the extracted metadata

I don't know exactly how you would go about it at this time. I am sure there is some component that does this.

-amit

tobiasonAuthor Commented:
mikosha,
That is a big clue. Then the question becomes finding a way to open each PDF file as a .txt file, search for the "/Title" keyword, and take the string inside its brackets.

That's a lot more code to write. Does anyone have a simpler way to pull the title in (hopefully with a single library call ;)? Otherwise, if you've written code like mikosha describes, I could use some help with that.

amit,
What kind of component would you suggest I use or search for?

mikoshaCommented:
OK, it was just an idea :)
If you're considering third-party components, I think it will be much easier to implement. But it costs :)
Actually, opening any file as text is not such a big deal: use the OpenTextFile method of FileSystemObject (you get a TextStream object back), then store all the text in a string variable with the TextStream object's ReadAll method. After that, you can find a position by keyword with the VBScript InStr() function.
That's all, folks (about 5-10 lines of code). And you'll use only built-in objects of IIS.
But the decision is yours; I just wanted to show that the clue is not so big :)

cheers:)
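The steps described above could be sketched as follows. The path "Current/sample.pdf" is a placeholder, and the variable names are illustrative. Note that this loads the whole file into memory and reads binary PDF data through the system's ANSI code page, which is good enough for a keyword search but not a faithful copy of the bytes.

```vbscript
Const ForReading = 1
Dim objFS, objTS, strContents, lngPos

Set objFS = Server.CreateObject("Scripting.FileSystemObject")
' Placeholder path: replace with the PDF you want to inspect
Set objTS = objFS.OpenTextFile(Server.MapPath("Current/sample.pdf"), ForReading)

strContents = ""
' ReadAll raises an error on an empty file, so check first
If Not objTS.AtEndOfStream Then strContents = objTS.ReadAll
objTS.Close

' InStr returns the 1-based position of the keyword, or 0 when it is absent
lngPos = InStr(strContents, "/Title")
If lngPos > 0 Then
    Response.Write "Found /Title at character " & lngPos
Else
    Response.Write "No /Title keyword found"
End If
```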

 
tobiasonAuthor Commented:
This tactic is working great so far.
However, I am not sure how to pull the data from within the <pdf:Title> tags.
To clarify, so far I can open the PDF file and locate the opening tag. But how do I get the title that sits between the tags (e.g. <pdf:Title>PDF Title Goes Here</pdf:Title>)?

Here is what I have:

PDFpath = "Current/" & PDFfilename

const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath(PDFpath))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "<pdf:Title>") > 0 then
    Response.Write "Found Title Bracket!"
end if
mikoshaCommented:
I think after this you have to search for the "</pdf:Title>" keyword.
Let's say A is the first position (from instr(strSearchThis, "<pdf:Title>"))
and B is the second one (from instr(strSearchThis, "</pdf:Title>")).
So your title will be between A+11 (11 being len("<pdf:Title>")) and B.
I think the final thing will be something like this:

current_title = Mid(strSearchThis, A+11, B)
mikoshaCommented:
And if it works, you can proudly call this "KindOfLittleXMLparser" :)
(By the way, you could use an XML parser to retrieve this title too.)
mikoshaCommented:
Sorry, I made a mistake in the final line. The third argument of Mid is a length, not an end position, so it should be:

current_title = Mid(strSearchThis, A+11, B-(A+11))
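
As a worked check of the Mid arithmetic, here is a small sketch. TextBetween is a hypothetical helper name, and the sketch assumes the tags appear literally in the text that was read. Mid takes (string, start, length), so the length is the position of the closing tag minus the position of the first character after the opening tag.

```vbscript
' Hypothetical helper: returns the text between strOpen and strClose,
' or "" when either marker is missing.
Function TextBetween(strSource, strOpen, strClose)
    Dim lngStart, lngEnd
    TextBetween = ""

    lngStart = InStr(1, strSource, strOpen, vbBinaryCompare)
    If lngStart = 0 Then Exit Function
    lngStart = lngStart + Len(strOpen)      ' first character after the opening marker

    ' Search for the closing marker only after the opening one
    lngEnd = InStr(lngStart, strSource, strClose, vbBinaryCompare)
    If lngEnd = 0 Then Exit Function

    ' Length = distance between the two positions
    TextBetween = Mid(strSource, lngStart, lngEnd - lngStart)
End Function
```

For "<pdf:Title>Hello</pdf:Title>" the opening tag starts at position 1, so the text starts at 1 + 11 = 12, the closing tag is found at 17, and Mid(..., 12, 17 - 12) returns the five characters "Hello".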

tobiasonAuthor Commented:
It works... However, the title comes out with a lot of additional crap from the PDF (ASCII read). It seems the instr() stuff needs some tweaking.

You can see for yourself, this is what I have:

PDFpath = "Current/" & PDFfilename
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath(PDFpath))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "<pdf:Title>") > 0 then
  TitleStart = instr(strSearchThis, "<pdf:Title>")
  TitleEnd = instr(strSearchThis, "</pdf:Title>")
  Title = Mid(strSearchThis, TitleStart+11, TitleEnd)
end if
                  
Response.Write ("<tr><td valign='top'><a href=Current/" & PDFfilename & ">" & Title & "</a></td></tr>")      

ALMOST THERE!!! This is kinda cool, by the way, just like XML parsing!
Paul
PS: I'm leaving work in ten minutes, will be back tomorrow, and will credit ya 500 points when this is complete. Thanks!!!
tobiasonAuthor Commented:
NEVERMIND! Your correction fixed it!
You should get your 500 points!
Thanks a bunch for this. Judging by my searches on Google, this seems to be a common problem that has gone unsolved.

Cheers!!!
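
One possible refinement of the output line, as a sketch rather than the code that was actually accepted: quote the href, HTML-encode whatever was read out of the PDF, and fall back to the filename when no title was found. BuildPdfRow is a hypothetical helper name; PDFfilename and Title are the variables from the code earlier in the thread.

```vbscript
' Hypothetical helper: builds one table row that links to a PDF.
' Uses the file name as the link text when the title is empty or missing.
Function BuildPdfRow(strFolder, strFileName, strTitle)
    Dim strLabel
    strLabel = Trim(strTitle & "")          ' & "" turns Empty or Null into ""
    If Len(strLabel) = 0 Then strLabel = strFileName

    BuildPdfRow = "<tr><td valign=""top""><a href=""" & _
        Server.HTMLEncode(strFolder & "/" & strFileName) & """>" & _
        Server.HTMLEncode(strLabel) & "</a></td></tr>"
End Function

Response.Write BuildPdfRow("Current", PDFfilename, Title)
```

Encoding matters here because text pulled from a file can contain characters such as < or & that would otherwise be interpreted as markup by the browser.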
mikoshaCommented:
Thanx :)
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.