Solved

How to create a glossary from many files

Posted on 2014-02-21
Last Modified: 2014-02-27
I have a collection of many different PDF files (material from a Six Sigma class). The PDF files have selectable text, meaning that they're not flat image PDFs.

I'm looking for some way to automatically create a content-searchable structure for these files.

For example, let's say I need to find every place a specific word or phrase appears in the material. It would be nice to have a way to search for that word and get back a list of the PDFs that contain it, including the page number within each PDF file. As an extra, having the results automatically linked to the PDF file and page would be awesome.

Does anyone know if this is possible? And if so, how?

Thanks in advance.
Question by:isaacr25
3 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 39878344
What programming language are you using?
 
LVL 29

Accepted Solution

by:Olaf Doschke (earned 500 total points)
ID: 39878376
There are tools that concatenate PDFs, after which you could search the single combined PDF, but that's probably not practical for a large PDF collection.

If you're on Windows, Adobe offers an iFilter for PDF, meaning you can index the PDFs with Windows indexing. It works better than its reputation suggests.

http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

Bye, Olaf.
 
LVL 54

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39883332
Hi isaacr25,
It's possible, but I'm not aware of an off-the-shelf app that does it, so I think it would take some custom development. I did something with a lot of the functionality you're looking for (but not all of it) as a by-product of this EE thread:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

I'm not sure if I can extend that solution into one for you...that would take some study...but before I even spend the time to study it, please contact me via the email in my EE profile if you're interested. It's beyond the scope of answering in an EE question, imo. I say this because the program mentioned above has been expanded to include subfolders and create a CSV file and is now 670 lines of code...and the Windows installer for it is an additional 115 lines of code...well beyond volunteer labor. :)  Regards, Joe
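
For anyone wondering what that kind of custom development might look like, here is a minimal sketch in Python. It is not the program from the thread linked above, just an illustration under a few assumptions: the function names (extract_pages, find_phrase, as_links) are made up for this example, text extraction assumes the third-party pypdf package is installed, and the links rely on the "#page=N" URL fragment, which Adobe Reader and most browser PDF viewers honor. The search itself is a plain case-insensitive substring match on each page's text.

```python
from urllib.parse import quote


def extract_pages(pdf_path):
    """Return a list with the text of each page of one PDF.

    Assumption: the third-party 'pypdf' package is installed.
    It is imported here so the rest of the sketch runs without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for pages without a text layer.
    return [page.extract_text() or "" for page in reader.pages]


def find_phrase(pages_by_file, phrase):
    """Find a phrase in already-extracted page text.

    pages_by_file maps a file name to a list of page texts.
    Returns (file name, 1-based page number) pairs, sorted by file name.
    """
    # Collapse whitespace so a phrase split across a line break still matches.
    needle = " ".join(phrase.lower().split())
    if not needle:
        return []
    hits = []
    for filename in sorted(pages_by_file):
        for number, text in enumerate(pages_by_file[filename], start=1):
            haystack = " ".join(text.lower().split())
            if needle in haystack:
                hits.append((filename, number))
    return hits


def as_links(hits):
    """Turn (file name, page) pairs into relative links that open at the page."""
    return [f"{quote(name)}#page={page}" for name, page in hits]
```

To use it, you would build the dictionary once, for example {name: extract_pages(name) for name in your list of PDF files}, and then call find_phrase on it for each word or phrase you need. Writing the links into an HTML or CSV file would give the clickable list described in the question. For a large collection, caching the extracted text would avoid re-reading every PDF on each search.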


Question has a verified solution.
