barrymulholland
asked on
Counting Unique Occurances of Words in a Table
Hey Guys,
I have a table that contains vast quantities of data relating to helpdesk jobs. I'm trying to write a query that will go through a column called Symptom in a table called Faults and count the total number of times each word occurs in the table.
E.g. Say I have five column such as below
Symptoms Column |
-------------------------- ---------- ----
back up reporting failed.
back up reporting failed
back up reporting failed.
drive space c showing 1.45GB
back up reporting failed.
I would like it to step through the table counting each unique occurance of every word it comes across and then produce a view showing this such as below:
back 4
up 4
reporting 4
failed 4
drive 1
space 1
c 1
showing 1
and so on. Anyone got any ideas??
I have a table that contains vast quantities of data relating to helpdesk jobs. I'm trying to write a query that will go through a column called Symptom in a table called Faults and count the total number of times each word occurs in the table.
E.g. Say I have five column such as below
Symptoms Column |
--------------------------
back up reporting failed.
back up reporting failed
back up reporting failed.
drive space c showing 1.45GB
back up reporting failed.
I would like it to step through the table counting each unique occurance of every word it comes across and then produce a view showing this such as below:
back 4
up 4
reporting 4
failed 4
drive 1
space 1
c 1
showing 1
and so on. Anyone got any ideas??
ASKER
Hi JestersGrind,
Thanks for the prompt response!
I created a tally table called Sequence containing 20000 sequential numbers and then attempted to use your query (copied below), but it doesn't seem to work, I get an error stating that Sequence is an invalid object. Though if I do a simple SELECT * FROM Sequence it works fine.
Any idea's?
Thanks for the prompt response!
I created a tally table called Sequence containing 20000 sequential numbers and then attempted to use your query (copied below), but it doesn't seem to work, I get an error stating that Sequence is an invalid object. Though if I do a simple SELECT * FROM Sequence it works fine.
Any idea's?
SELECT SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N) AS Word, COUNT(*) AS Total
FROM Faults a (NOLOCK) CROSS JOIN
Sequence b
WHERE N > 0 AND N <LEN(Symptom)+LEN(' ') AND SubString(' ' + Symptom + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi Jesters,
Thanks for some reason I had to fully qualify that table (but not the other, even though they're in the same database, go figure!) I also had to cast the symptom field as it was ntext which doesn't work with + operator, changed it to nvarchar. The query is running just now, I'll let you know how I got on when it finishes :-)
Thanks,
Barry
Thanks for some reason I had to fully qualify that table (but not the other, even though they're in the same database, go figure!) I also had to cast the symptom field as it was ntext which doesn't work with + operator, changed it to nvarchar. The query is running just now, I'll let you know how I got on when it finishes :-)
Thanks,
Barry
ASKER
Worked like a charm, thank you very much!
Greg
Open in new window