Count distinct words in a text field, group and order by the result
Posted on 2006-05-22
I have some articles that I want to examine with regard to frequency of words. They are held in MS Sql database, and I would be using Query Analyser to get the results, or possibly a stored procedure.
I only need to do this one at a time but I have:
id, article_title, article text (as int,text,text respectively). The article_text field can have a long article in of several thousand words. What I want is to be able to count the number of times each word appears in the article and then rank by the result.
Ideally for bells and whistles I'd like to alos be able to omit a list of common words (eg the, and etc), omit punctuation and have the option to count words of more than 4 characters. But the answer for the first part only would be excellent.
A word is defined by being separated by a space.
Thanks in advance,