This might require a solution that is "good enough" or a few steps in the right direction, rather than a silver bullet that will solve everything.
I am looking for a simplistic way to do some text analysis in Excel (or it could be in MS Word - or another application if that would work also).
The task is like this -- in a document with open-text survey responses (strings of text for each row)
1. After counting all the words
2. Omitting trivial words (the, a, and, it), rank the top results
3. Take the top results (e.g. top 20) and determine which non-trivial words appear in the string with it. For example: Word "service" is ranked #5 in word-count. When service is mentioned, the top words mentioned with it are: "great", "efficient", "friendly"
Here's one approach I have used.
1. convert data to single column (one word per cell)
2. Use Pivot table to count words and sort descending
3. use formulas to create phrases of two-words and three-words (concatenate words back together from original column of single words).
4. Then, find top words and see what phrases they appear in.
5. then manually pick through them and count which other words look like they're common within 3 or 4 words of the target (?????)
That is a messy solution.
To summarize -- what I'm looking for is a way to (perhaps) tag the target words (top 20 most used, for example). Then find a way to measure the distance between those target words and other words in the same strings (in a paragraph). I thoujght about converting the words to variables and then doing some math to find out which variables are closest????
There is commercial software that does this, but I'm looking for a simplistic solution in Excel -- and a chance to try some more innovative string manipulation, possibly.
Thanks for giving this a try.