Solved

Extracting PDF title and displaying it in HTML via ASP

Posted on 2003-12-02
Last Modified: 2012-05-04
I wrote some ASP code that reads all the PDF files in a directory and lists each one by its filename.

In HTML (via ASP) I can list the files by filename, but I want to list each one by its PDF Title instead. So basically, I need a way to extract the Title from each PDF file and display it in HTML.
Question by:tobiason
12 Comments
 
LVL 4

Expert Comment

by:ainapure
ID: 9860539
You will need some kind of component and/or tool to extract the title from the PDF file. Try searching for one on Google.

-amit
 
LVL 4

Expert Comment

by:mikosha
ID: 9860540
I've got an idea, and maybe it will work:
If you open any PDF file as an ASCII file, you'll see something like this at the top:

%PDF-1.4
%âãÏÓ
1 0 obj
<<
/Producer (Acrobat Distiller Command 3.01 for Solaris 2.3 and later \(SPARC\))
/Creator (FrameMaker 5.5.6.)
/ModDate (D:20031202132023-05'00')
/CreationDate (D:19960530152336Z)
/Title (title)
>>

The actual title is in the parentheses (in this example the title is "title"). So if the PDF file has a title, it should be in the same place, and you can find it (either by searching for the "/Title" keyword or by going to the exact line), then read the title and even more PDF info about the file (everything inside << >>).

Hope it will work.
cheers:)
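
A minimal sketch of the "/Title" search described above, written in Python rather than the thread's VBScript so it can be run outside IIS. The function name `find_info_title` is made up for illustration. It assumes the Info dictionary is stored uncompressed, as in the sample above, and that the title is a plain parenthesised string; it will not find titles inside compressed object streams, encrypted files, hex strings, or UTF-16 encoded titles.

```python
import re

# "/Title (" ... ")" where the body may contain backslash escapes such as \( and \).
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_info_title(pdf_bytes):
    """Return the /Title string from raw PDF bytes, or None if none is found."""
    match = TITLE_RE.search(pdf_bytes)
    if match is None:
        return None
    raw = match.group(1)
    # Undo the backslash escaping of parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    return raw.decode("latin-1")
```

Read the file in binary mode (for example `open(path, "rb").read()`) and pass the bytes in; a `None` result means the caller should fall back to the filename.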
 
 
LVL 4

Expert Comment

by:ainapure
ID: 9860565
Initial comments:

1) You have to open the PDF file and try to read the metadata
2) Store and display the extracted metadata

I don't know exactly how you would go about it at this time. I am sure there is some component that does this.

-amit
 

Author Comment

by:tobiason
ID: 9860890
mikosha,
That is a big clue. The question then is how to open each PDF file as a .txt file, search for the "/Title" keyword, and take the string inside its parentheses.

That's a lot more code to write. Does anyone have a simpler way to pull the title in (hopefully with a one-line library call ;) )? Otherwise, if you've written code like mikosha describes, I could use some help with it.

amit,
What kind of component would you suggest I need or search for?

 
LVL 4

Expert Comment

by:mikosha
ID: 9861024
OK, it was just an idea :)
If you're considering third-party components, I think it will be much easier to implement. But it costs :)
Actually, opening any file as text is no big deal: use the OpenTextFile method of FileSystemObject (you get a TextStream object back), then store all the text in a string variable with the TextStream object's ReadAll method. After that, you can find the position of a keyword with the VBScript InStr() function.
That's all, folks (about 5-10 lines of code). And you'll use only built-in objects of IIS.
But the decision is yours; I just wanted to show that the job is not so big :)

cheers:)
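
The same read-everything-then-search idea can be sketched in Python (not the VBScript the post describes; `find_keyword` is a made-up name). Reading the file as bytes sidesteps any text decoding of the binary parts of a PDF. Note that `bytes.find` is 0-based and returns -1 when the keyword is absent, whereas VBScript's InStr() is 1-based and returns 0.

```python
from pathlib import Path


def find_keyword(path, keyword):
    """Return the 0-based offset of keyword (bytes) in the file, or -1 if absent."""
    data = Path(path).read_bytes()  # whole file in memory, like ReadAll
    return data.find(keyword)
```

For a directory listing, this could be called once per file, for example over `Path("Current").glob("*.pdf")`, where the folder name is taken from the code later in this thread.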

 
 

Author Comment

by:tobiason
ID: 9861168
This tactic is working great so far.
However, I am not sure how to pull the data out from between the <pdf:Title> tags.
To clarify: so far I can open the PDF file and locate the opening tag. But how do I get the Title that sits between the tags (e.g. <pdf:Title>PDF Title Goes Here</pdf:Title>)?

Here is what I have:

PDFpath = "Current/" & PDFfilename

const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath(PDFpath))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "<pdf:Title>") > 0 then
    Response.Write "Found Title Bracket!"
end if
 
LVL 4

Expert Comment

by:mikosha
ID: 9861268
I think after this you have to search for the "</pdf:Title>" keyword.
Let's say A is the first position (from instr(strSearchThis, "<pdf:Title>"))
and B is the second one (from instr(strSearchThis, "</pdf:Title>")).
So your title will be between A+11 (11 is len("<pdf:Title>")) and B.
I think the final thing will be something like this:

current_title = Mid(strSearchThis, A+11, B)
 
LVL 4

Expert Comment

by:mikosha
ID: 9861288
And if it works, you can proudly call this "KindOfLittleXMLparser" :)
(By the way, you could use an XML parser to retrieve this title too.)
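
A rough sketch of the XML-parser route, in Python with the standard library's ElementTree (`xmp_title` is a made-up name). It assumes the metadata sits uncompressed in the file inside an `<rdf:RDF>` ... `</rdf:RDF>` element that declares its own namespaces, and it accepts the first element whose local name is "Title" in any namespace. Both are assumptions about how the file was produced, not guarantees.

```python
import xml.etree.ElementTree as ET


def xmp_title(pdf_bytes):
    """Return the text of the first *:Title element in the RDF block, or None."""
    end_tag = b"</rdf:RDF>"
    start = pdf_bytes.find(b"<rdf:RDF")
    end = pdf_bytes.find(end_tag, start)
    if start == -1 or end == -1:
        return None
    try:
        root = ET.fromstring(pdf_bytes[start:end + len(end_tag)])
    except ET.ParseError:
        return None  # not well-formed XML on its own
    for elem in root.iter():
        # ElementTree tags look like "{namespace-uri}Title".
        if elem.tag.rsplit("}", 1)[-1] == "Title" and elem.text:
            return elem.text.strip()
    return None
```

Compared with plain string searching, the parser also decodes XML entities such as `&amp;` in the title.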
 
LVL 4

Accepted Solution

by:
mikosha earned 500 total points
ID: 9861308
Sorry, I made a mistake in the final line. The third argument of Mid is a length, not an end position, so it should be this way:

current_title = Mid(strSearchThis, A+11, B-(A+11))
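
The arithmetic is easier to see in a small sketch. This one is in Python rather than VBScript, and `text_between` is a made-up helper name: the title starts right after the opening tag and ends where the closing tag begins, so its length is the end position minus the start position.

```python
def text_between(haystack, open_tag, close_tag):
    """Return the text between the first open_tag and the next close_tag, or None."""
    start = haystack.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)                # first character after the opening tag
    end = haystack.find(close_tag, start)  # search only after the opening tag
    if end == -1:
        return None
    return haystack[start:end]            # length is end - start
```

For the example from the question, `text_between("<pdf:Title>PDF Title Goes Here</pdf:Title>", "<pdf:Title>", "</pdf:Title>")` gives `PDF Title Goes Here`.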
 

Author Comment

by:tobiason
ID: 9861461
It works... However, the title comes out with all the additional crap from the PDF (the ASCII read) after it. It seems like the instr() stuff needs some tweaking.

You can see for yourself, this is what I have:

PDFpath = "Current/" & PDFfilename
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath(PDFpath))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

strSearchThis = objTS.Read(objFile.Size)

if instr(strSearchThis, "<pdf:Title>") > 0 then
  TitleStart = instr(strSearchThis, "<pdf:Title>")
  TitleEnd = instr(strSearchThis, "</pdf:Title>")
  Title = Mid(strSearchThis, TitleStart+11, TitleEnd)
end if
                  
Response.Write ("<tr><td valign='top'><a href=Current/" & PDFfilename & ">" & Title & "</a></td></tr>")      

ALMOST THERE!!! This is kinda cool, by the way, just like XML parsing!
Paul
PS: I'm leaving work in ten minutes, will be back tomorrow, and will credit ya 500 points when this is complete. Thanks!!!
 

Author Comment

by:tobiason
ID: 9861526
NEVERMIND! Your correction fixed it!
You should get your 500 points!
Thanks a bunch for this. Judging by my searches on google.com, this seems to be a common issue that has gone unsolved.

Cheers!!!
 
LVL 4

Expert Comment

by:mikosha
ID: 9861578
Thanx :)