?
Solved

Counting Unique Occurances of Words in a Table

Posted on 2009-04-07
5
Medium Priority
?
628 Views
Last Modified: 2012-05-06
Hey Guys,

I have a table that contains vast quantities of data relating to helpdesk jobs. I'm trying to write a query that will go through a column called Symptom in a table called Faults and count the total number of times each word occurs in the table.

E.g. Say I have five column such as below

Symptoms Column                       |
----------------------------------------
back up reporting failed.
back up reporting failed
back up reporting failed.
drive space c showing 1.45GB
back up reporting failed.

I would like it to step through the table counting each unique occurance of every word it comes across and then produce a view showing this such as below:

back 4
up 4
reporting 4
failed 4
drive 1
space 1
c 1
showing 1

and so on. Anyone got any ideas??
0
Comment
Question by:barrymulholland
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 
LVL 21

Expert Comment

by:JestersGrind
ID: 24089568
You could do something like this.  You need a tally table though.  A tally table is simply a table with ascending sequential numbers.  It's used to avoid looping.  In my example, I called the table Test and the column of text, Textdata.
Greg
 
 

SELECT SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N) AS Word, COUNT(*) AS Total
FROM Tally a (NOLOCK) CROSS JOIN
	Test b
WHERE N > 0 AND N<LEN(Textdata)+LEN(' ') AND SubString(' ' + Textdata + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Textdata + ' ', N, CHARINDEX(' ', Textdata + ' ', N)-N)

Open in new window

0
 
LVL 4

Author Comment

by:barrymulholland
ID: 24095312
Hi JestersGrind,

Thanks for the prompt response!

I created a tally table called Sequence containing 20000 sequential numbers and then attempted to use your query (copied below), but it doesn't seem to work, I get an error stating that Sequence is an invalid object. Though if I do a simple SELECT * FROM Sequence it works fine.

Any idea's?
SELECT SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N) AS Word, COUNT(*) AS Total
FROM Faults a (NOLOCK) CROSS JOIN
Sequence b
WHERE N > 0 AND N <LEN(Symptom)+LEN(' ') AND SubString(' ' + Symptom + ' ', N, 1) = ' '
GROUP BY SUBSTRING(Symptom + ' ', N, CHARINDEX(' ', Symptom + ' ', N)-N)

Open in new window

0
 
LVL 21

Accepted Solution

by:
JestersGrind earned 2000 total points
ID: 24096218
I just created tables with the same names as your example above and it worked fine for me.  Make sure that the Sequence table is in the same database as Faults or fully qualify it.  And the ascending value column in the Sequence table is called N or replace N in the query above with the correct name of the column.

Lastly to get the exact same output as your example, you have to add ORDER BY Total DESC, Word.

Greg


0
 
LVL 4

Author Comment

by:barrymulholland
ID: 24096863
Hi Jesters,

Thanks for some reason I had to fully qualify that table (but not the other, even though they're in the same database, go figure!) I also had to cast the symptom field as it was ntext which doesn't work with + operator, changed it to nvarchar. The query is running just now, I'll let you know how I got on when it finishes :-)

Thanks,

Barry
0
 
LVL 4

Author Closing Comment

by:barrymulholland
ID: 31567578
Worked like a charm, thank you very much!
0

Featured Post

Moving data to the cloud? Find out if you’re ready

Before moving to the cloud, it is important to carefully define your db needs, plan for the migration & understand prod. environment. This wp explains how to define what you need from a cloud provider, plan for the migration & what putting a cloud solution into practice entails.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A company’s centralized system that manages user data, security, and distributed resources is often a focus of criminal attention. Active Directory (AD) is no exception. In truth, it’s even more likely to be targeted due to the number of companies …
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question