Link to home
Start Free TrialLog in
Avatar of barrymulholland
barrymulholland

asked on

Counting Unique Occurances of Words in a Table

Hey Guys,

I have a table that contains vast quantities of data relating to helpdesk jobs. I'm trying to write a query that will go through a column called Symptom in a table called Faults and count the total number of times each word occurs in the table.

E.g. Say I have five column such as below

Symptoms Column                       |
----------------------------------------
back up reporting failed.
back up reporting failed
back up reporting failed.
drive space c showing 1.45GB
back up reporting failed.

I would like it to step through the table counting each unique occurance of every word it comes across and then produce a view showing this such as below:

back 4
up 4
reporting 4
failed 4
drive 1
space 1
c 1
showing 1

and so on. Anyone got any ideas??
Avatar of JestersGrind
JestersGrind
Flag of United States of America image

You could do something like this.  You need a tally table though.  A tally table is simply a table with ascending sequential numbers.  It's used to avoid looping.  In my example, I called the table Test and the column of text, Textdata.
Greg
 
 

SELECT SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N) AS Word, COUNT(*) AS Total
FROM Tally a (NOLOCK) CROSS JOIN
	Test b
WHERE N > 0 AND N<LEN(Textdata)+LEN(' ') AND SubString(' ' + Textdata + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N)

Open in new window

Avatar of barrymulholland
barrymulholland

ASKER

Hi JestersGrind,

Thanks for the prompt response!

I created a tally table called Sequence containing 20000 sequential numbers and then attempted to use your query (copied below), but it doesn't seem to work, I get an error stating that Sequence is an invalid object. Though if I do a simple SELECT * FROM Sequence it works fine.

Any idea's?
SELECT SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N) AS Word, COUNT(*) AS Total
FROM Faults a (NOLOCK) CROSS JOIN
Sequence b
WHERE N > 0 AND N <LEN(Symptom)+LEN(' ') AND SubString(' ' + Symptom + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N)

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of JestersGrind
JestersGrind
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Jesters,

Thanks for some reason I had to fully qualify that table (but not the other, even though they're in the same database, go figure!) I also had to cast the symptom field as it was ntext which doesn't work with + operator, changed it to nvarchar. The query is running just now, I'll let you know how I got on when it finishes :-)

Thanks,

Barry
Worked like a charm, thank you very much!