Solved

How to create a glossary from many files

Posted on 2014-02-21
Last Modified: 2014-02-27
I have a collection of many different PDF files (material from a Six Sigma class). The PDF files have selectable text, meaning that they're not flat image PDFs.

I'm looking for some way to automatically create a content-searchable structure for these files.

For example, let's say I need to find every place a specific word or phrase appears in the material. It would be nice to have a way to search for the word and get back a list of the PDFs that contain it, including the page number within each PDF file. As an extra, having the results automatically linked to the PDF file and page would be awesome.

Does anyone know if this is possible? And if so, how?

Thanks in advance.
Question by:isaacr25
3 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 39878344
What programming language are you using?
 
LVL 29

Accepted Solution

by:
Olaf Doschke earned 500 total points
ID: 39878376
There are tools that concatenate PDFs (after which you could search the single combined PDF), but that's probably not practical for a large PDF collection.

If you're on Windows, Adobe offers an iFilter for PDF, which means you can index the PDFs with Windows indexing. That works better than its reputation suggests.

http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

Bye, Olaf.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 39883332
Hi isaacr25,
It's possible, but I'm not aware of an off-the-shelf app that does it, so I think it would take some custom development. I did something with a lot of the functionality you're looking for (but not all of it) as a by-product of this EE thread:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

I'm not sure if I can extend that solution into one for you. That would take some study, but before I even spend the time to study it, please contact me via the email in my EE profile if you're interested. It's beyond the scope of answering in an EE question, imo. I say this because the program mentioned above has been expanded to include subfolders and create a CSV file and is now 670 lines of code, and the Windows installer for it is an additional 115 lines of code, which is well beyond volunteer labor. :)  Regards, Joe
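
For anyone who wants a starting point for the kind of custom development described above, here is a minimal Python sketch. It is not the program from the linked thread. It assumes the third-party pypdf package is installed for text extraction, and the names iter_pdf_pages, find_phrase and page_link are made up for this illustration. Whether a PDF viewer honors the #page=N part of the link is also an assumption.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# One extracted page: (pdf path, 1-based page number, page text)
PageText = Tuple[str, int, str]


def iter_pdf_pages(folder: str) -> Iterator[PageText]:
    """Yield the text of every page of every PDF under folder, subfolders included."""
    # Assumption: the third-party pypdf package is installed (pip install pypdf).
    # Imported here so the search helpers below work without it.
    from pypdf import PdfReader

    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        for page_number, page in enumerate(reader.pages, start=1):
            # extract_text() can come back empty for image-only pages.
            yield str(pdf_path), page_number, page.extract_text() or ""


def find_phrase(pages: Iterable[PageText], phrase: str) -> List[Tuple[str, int]]:
    """Return (pdf path, page number) for each page containing phrase, ignoring case."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return []
    # Allow any run of whitespace between words so a phrase that is
    # broken across lines in the extracted text still matches.
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    return [(path, number) for path, number, text in pages if pattern.search(text)]


def page_link(path: str, page_number: int) -> str:
    """Build a file URI pointing at a page; viewer support for #page=N is assumed."""
    return f"{Path(path).resolve().as_uri()}#page={page_number}"
```

To use it, something like find_phrase(iter_pdf_pages("SixSigma"), "control chart") would return the matching files and pages (the folder name is only an example), and page_link(path, number) turns each hit into a clickable link. Keeping the search separate from the extraction means the results could just as easily be written to a CSV or HTML file.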
