Solved

I need to identify and count unique words in a group of documents

Posted on 2010-08-25
399 Views
Last Modified: 2012-05-10
I have a group of documents and want to find the most frequently used words in them.
I am looking for a tool (similar to a WordPress tag cloud) that will run through a series of documents (Office and PDF) and produce a list of the words used and their frequency across all of them.
I am happy to process the files one at a time, as the group is not overly large, but I do need to be able to handle both Word and PDF files.
Question by:Barry Gill
2 Comments
 
LVL 4

Accepted Solution

by:tzwimfam (earned 500 total points)
ID: 33520187
It is somewhat manual, but this works: http://tagcrowd.com/
 
LVL 9

Author Closing Comment

by:Barry Gill
ID: 33530298
Thanks for this. Hopefully I can automate it a bit :)
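
For anyone wanting to automate the counting step rather than paste text into a web tool, a minimal Python sketch might look like the following. It is not from the thread: it assumes the text has already been extracted from the Word and PDF files into plain strings (that extraction needs a separate tool or library and is not shown), and the names `count_words`, `WORD_RE`, and `STOP_WORDS` are hypothetical, with a deliberately tiny stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; extend it for real use.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

# A "word" here is a run of ASCII letters, optionally with one apostrophe part
# (e.g. "don't"). This is an assumption; adjust it for other languages.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(texts, stop_words=STOP_WORDS):
    """Return a Counter of word frequencies summed across all given texts."""
    totals = Counter()
    for text in texts:
        words = WORD_RE.findall(text.lower())
        totals.update(w for w in words if w not in stop_words)
    return totals


if __name__ == "__main__":
    # Stand-ins for text already extracted from each document.
    docs = [
        "The report covers the budget and the schedule.",
        "Budget changes affect the schedule; the budget is final.",
    ]
    for word, n in count_words(docs).most_common(3):
        print(word, n)
```

Because the totals are kept in a single `Counter`, the documents can be fed in one at a time or all together, and `most_common()` gives the ranked list a tag cloud would be built from.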