Counting Unique Occurances of Words in a Table

Hey Guys,

I have a table that contains vast quantities of data relating to helpdesk jobs. I'm trying to write a query that will go through a column called Symptom in a table called Faults and count the total number of times each word occurs in the table.

E.g. Say I have five column such as below

Symptoms Column                       |
back up reporting failed.
back up reporting failed
back up reporting failed.
drive space c showing 1.45GB
back up reporting failed.

I would like it to step through the table counting each unique occurance of every word it comes across and then produce a view showing this such as below:

back 4
up 4
reporting 4
failed 4
drive 1
space 1
c 1
showing 1

and so on. Anyone got any ideas??
Who is Participating?
JestersGrindConnect With a Mentor Commented:
I just created tables with the same names as your example above and it worked fine for me.  Make sure that the Sequence table is in the same database as Faults or fully qualify it.  And the ascending value column in the Sequence table is called N or replace N in the query above with the correct name of the column.

Lastly to get the exact same output as your example, you have to add ORDER BY Total DESC, Word.


You could do something like this.  You need a tally table though.  A tally table is simply a table with ascending sequential numbers.  It's used to avoid looping.  In my example, I called the table Test and the column of text, Textdata.

SELECT SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N) AS Word, COUNT(*) AS Total
	Test b
WHERE N > 0 AND N<LEN(Textdata)+LEN(' ') AND SubString(' ' + Textdata + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N)

Open in new window

barrymulhollandAuthor Commented:
Hi JestersGrind,

Thanks for the prompt response!

I created a tally table called Sequence containing 20000 sequential numbers and then attempted to use your query (copied below), but it doesn't seem to work, I get an error stating that Sequence is an invalid object. Though if I do a simple SELECT * FROM Sequence it works fine.

Any idea's?
SELECT SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N) AS Word, COUNT(*) AS Total
Sequence b
WHERE N > 0 AND N <LEN(Symptom)+LEN(' ') AND SubString(' ' + Symptom + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N)

Open in new window

barrymulhollandAuthor Commented:
Hi Jesters,

Thanks for some reason I had to fully qualify that table (but not the other, even though they're in the same database, go figure!) I also had to cast the symptom field as it was ntext which doesn't work with + operator, changed it to nvarchar. The query is running just now, I'll let you know how I got on when it finishes :-)


barrymulhollandAuthor Commented:
Worked like a charm, thank you very much!
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.