?
Solved

Count distinct words in a text field, group and order by the result

Posted on 2006-05-22
7
Medium Priority
?
390 Views
Last Modified: 2012-06-21
Hi,
I have some articles that I want to examine with regard to frequency of words.  They are held in MS Sql database, and I would be using Query Analyser to get the results, or possibly a stored procedure.

I only need to do this one at a time but I have:
id, article_title, article text  (as int,text,text respectively).  The article_text field can have a long article in of several thousand words.  What I want is to be able to count the number of times each word appears in the article and then rank by the result.

Ideally for bells and whistles I'd like to alos be able to omit a list of common words (eg the, and etc), omit punctuation and have the option to count words of more than 4 characters.  But the answer for the first part only would be excellent.

A word is defined by being separated by a space.

Thanks in advance,

andrew
0
Comment
Question by:andrewh123
7 Comments
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 1200 total points
ID: 16733946
this looks like you wanted to use full-text indexing, which supports noise words.
However, it will not count the words for you, only tell you if a given word is found in a row...

to count the frequency of the words, you should develop a small application that scans row by row, and builds up the list of words and their counts... I don't see a SQL way of doing this efficiently.
0
 
LVL 28

Assisted Solution

by:strickdd
strickdd earned 400 total points
ID: 16733978
This link might be good reference point to start: http://weblogs.asp.net/pleloup/archive/2003/04/14/5569.aspx. I don't know of any good programatic way to do this in SQL, but you could write a piece of VBscript since this is just a one time thing.
0
 
LVL 27

Assisted Solution

by:ptjcb
ptjcb earned 400 total points
ID: 16734094
There is no simple way to do this with t-sql. The language is not created for working with strings like that. You could create something in visual basic/C# probabaly that will do that - export the data to a text file, run this external program and there you have it.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 28

Expert Comment

by:strickdd
ID: 16734132
ptjcb, You don't even need to export the data to a text file. Scripting languages have the ability to connect to a DB and read in a dataset.
0
 
LVL 27

Expert Comment

by:ptjcb
ID: 16734337
True, I just like less load on the server and something that runs through every word could be fairly intensive (depending on how well the app is written).
0
 
LVL 28

Expert Comment

by:strickdd
ID: 16734376
I guess it is a matter of preference, but I think it would be better just making a script call because you only need one transfer of data instead of two. (SQL to VBScript  as opposed to SQL to TextFile To VBScript).
0
 
LVL 1

Author Comment

by:andrewh123
ID: 16734515
Thanks guys - this has saved me the time of struggling with something that was never going to work and scripting an export to a text file would seem like a good way of doing it.  I'll try and split the points up as you've each given me something helpful, but as angelIII was first, I'll give the lions share there.

0

Featured Post

Veeam and MySQL: How to Perform Backup & Recovery

MySQL and the MariaDB variant are among the most used databases in Linux environments, and many critical applications support their data on them. Watch this recorded webinar to find out how Veeam Backup & Replication allows you to get consistent backups of MySQL databases.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
Via a live example, show how to setup several different housekeeping processes for a SQL Server.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question