Solved

How to create a glossary from many files

Posted on 2014-02-21
Last Modified: 2014-02-27
I have a collection of many different PDF files (material from a Six Sigma class). The PDF files have selectable text, meaning that they're not flat image PDFs.

I'm looking for some way to automatically create a content-searchable structure for these files.

For example, let's say I need to find every place a specific word or phrase appears in the material. It would be nice to have a way to search for that word and get a list of the PDFs that contain it, including the page number within each PDF file. As an extra, having the results automatically linked to the PDF file and page would be awesome.
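A minimal sketch of this kind of search, assuming Python 3 and, for text extraction only, the third-party pypdf package. The names `extract_pages`, `build_index`, `search` and `page_link` are hypothetical and chosen for illustration. The sketch maps each word to the (file, page) pairs that contain it, checks that a phrase's words appear consecutively on each candidate page, and builds a `file://` link ending in `#page=N`. That open parameter is honored by Adobe Reader and many other PDF viewers, though not by every viewer.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")

def extract_pages(pdf_path):
    """Return the text of each page (page 1 first).

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    It is imported here so the rest of the sketch runs without it.
    """
    from pypdf import PdfReader
    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]

def build_index(docs):
    """Map each lowercased word to a set of (file, page_number) pairs.

    docs: dict of file path -> list of page texts.
    """
    index = defaultdict(set)
    for name, pages in docs.items():
        for page_no, text in enumerate(pages, start=1):
            for word in WORD.findall(text.lower()):
                index[word].add((name, page_no))
    return index

def search(index, docs, phrase):
    """Return sorted (file, page) hits for a word or phrase."""
    words = WORD.findall(phrase.lower())
    if not words:
        return []
    # Candidate pages must contain every word of the phrase...
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # ...then confirm the words appear consecutively (across line breaks too).
    pattern = r"\b" + r"\W+".join(map(re.escape, words)) + r"\b"
    return sorted(
        (name, page) for name, page in candidates
        if re.search(pattern, docs[name][page - 1], re.IGNORECASE)
    )

def page_link(pdf_path, page):
    """Absolute file:// URI that asks the viewer to open at the given page."""
    return Path(pdf_path).resolve().as_uri() + f"#page={page}"
```

Usage, assuming the PDFs sit in a hypothetical `course` folder: `docs = {str(p): extract_pages(p) for p in Path("course").glob("*.pdf")}`, then `index = build_index(docs)`, and print `page_link(name, page)` for each hit in `search(index, docs, "process capability")`. Keying `docs` by path lets `page_link` resolve the file. For a large collection you would save the index to disk rather than re-extracting every time.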

Does anyone know if this is possible? And if so, how?

Thanks in advance.
Question by:isaacr25
 
LVL 58

Expert Comment

by:Gary
ID: 39878344
What programming language are you using?
 
LVL 29

Accepted Solution

by: Olaf Doschke (earned 2000 total points)
ID: 39878376
There are tools that concatenate PDFs into one file (which you could then search as a single PDF), but that's probably not practical for a large PDF collection.

If you're on Windows, Adobe offers an iFilter for PDF, which means you can index the PDFs with Windows indexing. It works better than its reputation suggests.

http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

Bye, Olaf.
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39883332
Hi isaacr25,
It's possible, but I'm not aware of an off-the-shelf app that does it, so I think it would take some custom development. I did something with a lot of the functionality you're looking for (but not all of it) as a by-product of this EE thread:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

I'm not sure I can extend that solution into one for you; that would take some study. Before I spend the time on it, please contact me via the email in my EE profile if you're interested. In my opinion it's beyond the scope of an answer in an EE question: the program mentioned above has since been expanded to include subfolders and create a CSV file, and it is now 670 lines of code, with another 115 lines for its Windows installer. That's well beyond volunteer labor. :)  Regards, Joe
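For a rough idea of the subfolder scan and CSV output mentioned above, here is a small independent sketch. It is not Joe's program. It assumes Python 3, uses the third-party pypdf package for text extraction by default, and the names `pypdf_pages` and `write_hits_csv` are hypothetical. Matching is a simple case-insensitive substring test after whitespace is collapsed, so "test" would also match "testing".

```python
import csv
from pathlib import Path

def pypdf_pages(pdf_path):
    """Page texts via pypdf (assumption: pip install pypdf)."""
    from pypdf import PdfReader
    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]

def normalize(text):
    """Lowercase and collapse runs of whitespace (including line breaks)."""
    return " ".join(text.split()).lower()

def write_hits_csv(root, term, out, get_pages=pypdf_pages):
    """Scan every *.pdf under root, subfolders included, for term.

    Writes one CSV row per matching page (file, page, link) to the
    file-like object `out` and returns the number of matching pages.
    Note: rglob("*.pdf") is case-sensitive on Linux, so *.PDF would be missed.
    """
    writer = csv.writer(out)
    writer.writerow(["file", "page", "link"])
    needle = normalize(term)
    count = 0
    for pdf in sorted(Path(root).rglob("*.pdf")):
        for page_no, text in enumerate(get_pages(pdf), start=1):
            if needle in normalize(text):
                # #page=N asks viewers that support it to open at that page.
                link = pdf.resolve().as_uri() + f"#page={page_no}"
                writer.writerow([str(pdf), page_no, link])
                count += 1
    return count
```

To write a real file, open it with `open("hits.csv", "w", newline="")` as the `csv` module documentation recommends, and pass that as `out`. Spreadsheet programs show the `link` column as text unless you wrap it in a hyperlink formula.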
