Solved

How to create a glossary from many files

Posted on 2014-02-21
Last Modified: 2014-02-27
I have a collection of many different PDF files (material from a Six Sigma class). The PDF files have selectable text, meaning that they're not flat image PDFs.

I'm looking for some way to automatically create a content-searchable structure for these files.

For example, let's say I need to find everywhere that a specific word or phrase occurs within the material. It would be nice to have a method to search for this word, then output a list of PDFs that contain it (including the page number within each PDF file). As an extra, having the results automatically linked to the PDF file and page would be awesome.

Does anyone know if this is possible? And if so, how?

Thanks in advance.
Question by:isaacr25
3 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 39878344
What programming language are you using?
LVL 29

Accepted Solution

by:Olaf Doschke (earned 500 total points)
ID: 39878376
There are tools that concatenate PDFs (after which you could search the single combined PDF), but that's probably not practical for a large PDF collection.

If you're on Windows, Adobe offers an iFilter for PDF, meaning you can index the PDFs with Windows indexing. It works better than its reputation suggests.

http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

Bye, Olaf.
LVL 53

Expert Comment

by:Joe Winograd, EE MVE
ID: 39883332
Hi isaacr25,
It's possible, but I'm not aware of an off-the-shelf app that does it, so I think it would take some custom development. I did something with a lot of the functionality you're looking for (but not all of it) as a by-product of this EE thread:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

I'm not sure if I can extend that solution into one for you...that would take some study...but before I even spend the time to study it, please contact me via the email in my EE profile if you're interested. It's beyond the scope of answering in an EE question, imo. I say this because the program mentioned above has been expanded to include subfolders and create a CSV file and is now 670 lines of code...and the Windows installer for it is an additional 115 lines of code...well beyond volunteer labor. :)  Regards, Joe
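
To give a sense of what such custom development might involve, here is a minimal Python sketch. It is not the program mentioned above, just an illustration under stated assumptions: it relies on the third-party pypdf package for text extraction, all function names are made up for this example, and whether a `#page=N` link actually jumps to the page depends on the PDF viewer or browser in use.

```python
from pathlib import Path


def extract_pages(pdf_path):
    """Return the text of each page of one PDF (assumes the pypdf package is installed)."""
    from pypdf import PdfReader  # imported lazily so the search logic works without it

    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for pages without a text layer
    return [page.extract_text() or "" for page in reader.pages]


def find_term(pages_by_file, term):
    """Return sorted (filename, page_number) pairs whose page text contains term.

    pages_by_file maps a filename to a list of page texts; page numbers start at 1.
    Matching is case-insensitive and ignores differences in whitespace.
    """
    needle = " ".join(term.lower().split())
    if not needle:
        return []
    hits = []
    for name, pages in pages_by_file.items():
        for number, text in enumerate(pages, start=1):
            haystack = " ".join(text.lower().split())
            if needle in haystack:
                hits.append((name, number))
    return sorted(hits)


def search_folder(folder, term):
    """Search every PDF under folder (including subfolders) for term."""
    pages_by_file = {str(p): extract_pages(p) for p in Path(folder).rglob("*.pdf")}
    return find_term(pages_by_file, term)


def as_links(hits):
    """Format hits as 'file#page=N' strings (viewer support for the fragment is assumed)."""
    return [f"{name}#page={number}" for name, number in hits]
```

Calling `as_links(search_folder("SixSigma", "control chart"))` (the folder name is hypothetical) would give one entry per matching page. Re-reading every PDF on each search is slow for a large collection, so a real tool would probably cache the extracted text or rely on an index such as the Windows indexing approach suggested earlier in the thread.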


Question has a verified solution.
