Solved

How to create a glossary from many files

Posted on 2014-02-21
Last Modified: 2014-02-27
I have a collection of many different PDF files (material from a Six Sigma class). The PDF files have selectable text, meaning that they're not flat image PDFs.

I'm looking for some way to automatically create a content-searchable structure for these files.

For example, let's say I need to find every place a specific word or phrase appears in the material. It would be nice to have a way to search for the word and get back a list of the PDFs that contain it, including the page number within each PDF. As an extra, having each result link directly to the PDF file and page would be awesome.

Does anyone know if this is possible? And if so, how?

Thanks in advance.
Question by:isaacr25
3 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 39878344
What programming language are you using?
 
LVL 29

Accepted Solution

by:Olaf Doschke (earned 500 total points)
ID: 39878376
There are tools that concatenate PDFs (after which you could search the single combined PDF), but that's probably not practical for a large PDF collection.

If you're on Windows, Adobe offers an iFilter for PDF, which means you can index the PDFs with Windows indexing. It works better than its reputation suggests.

http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

Bye, Olaf.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39883332
Hi isaacr25,
It's possible, but I'm not aware of an off-the-shelf app that does it, so I think it would take some custom development. I did something with a lot of the functionality you're looking for (but not all of it) as a by-product of this EE thread:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

I'm not sure if I can extend that solution into one for you. That would take some study, but before I even spend the time to study it, please contact me via the email in my EE profile if you're interested. It's beyond the scope of answering in an EE question, imo. I say this because the program mentioned above has been expanded to include subfolders and create a CSV file and is now 670 lines of code, and the Windows installer for it is an additional 115 lines of code, which is well beyond volunteer labor. :)  Regards, Joe
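
For readers wondering what such custom development might involve, here is a minimal Python sketch of the core idea from the question: search every page of every PDF for a phrase and report the file, the page number, and a link to that page. It is not the program mentioned above, and it makes several assumptions: the third-party pypdf package is installed, the PDFs contain extractable text (as the question states), and the viewer honors the `#page=N` open parameter. The names `Hit`, `iter_pdf_pages`, `search_pages`, and `search_folder` are made up for illustration.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    pdf: str    # location of the PDF (path or file URI)
    page: int   # 1-based page number where the phrase was found
    link: str   # location plus "#page=N", which many PDF viewers honor


def iter_pdf_pages(pdf_path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) for each page of one PDF.

    Assumption: the third-party pypdf package is available
    (pip install pypdf). It is imported here so the pure search
    logic below can be used and tested without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    for number, page in enumerate(reader.pages, start=1):
        # extract_text() can come back empty for pages without text
        yield number, page.extract_text() or ""


def search_pages(location: str,
                 pages: Iterable[Tuple[int, str]],
                 phrase: str) -> List[Hit]:
    """Return a Hit for every page whose text contains the phrase.

    Matching is case-insensitive, and runs of whitespace (including
    line breaks inside the extracted text) are collapsed to one space.
    """
    needle = " ".join(phrase.lower().split())
    hits = []
    for number, text in pages:
        haystack = " ".join(text.lower().split())
        if needle in haystack:
            hits.append(Hit(location, number, f"{location}#page={number}"))
    return hits


def search_folder(folder: str, phrase: str) -> List[Hit]:
    """Search all PDFs under a folder, including subfolders."""
    hits: List[Hit] = []
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        uri = pdf_path.resolve().as_uri()  # file:///... form for linking
        hits.extend(search_pages(uri, iter_pdf_pages(pdf_path), phrase))
    return hits
```

A call such as `search_folder("six_sigma_pdfs", "control chart")` (the folder name is hypothetical) would return one `Hit` per matching page, and printing `hit.link` for each gives a list that can be pasted into an HTML page or a CSV file. This rereads every PDF on each search, so for a large collection an index such as the Windows indexing approach suggested earlier in the thread is likely the better fit.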