MS SQL Query

I am trying to write a sql query to retrieve all unique words from multiple rows from one column:


Row 1: This is title one.
Row 2: This is title two.



Thanks in advance.
Who is Participating?
Mark WillsTopic AdvisorCommented:
OK, assuming it is MS SQL of at least SQL 2005 or above (please advise)

then try the code below...

it is a single query, but starts by building some test data first - you wont have to do that when you go to test against your actual data source (after playing with the example below).

-- first up lets build a test table and some dummy data so we can play with our query

if object_id('tempdb..#my_table','U') is not null drop table #my_table

create table #my_table (id int identity, comments varchar(200))

insert #my_table (comments)
select 'This is title one.' as comments
union all
select 'This is title two.'
union all
select 'This ! is yet another title two.'
union all
select 'And what about puntuation?'
union all
select 'Or maybe "quotes"'

-- now we have some test data to play with, lets see what kind of query we can come up with...

;with words_cte as
select rtrim(ltrim(substring(comments,n,charindex(' ',comments+' ',n+1)-n))) as The_Word
from ( select replace(replace(replace(replace(comments,'"',' '),'!',' '),'?',' '),'.',' ') as comments from #my_table ) srce
cross join (select number n from master..spt_values where type = 'p')  numbers
where substring(' '+comments,n,1) = ' '
and n <= len(comments+' ') 
select the_word --, count(*) as occurances 
from words_cte
where len(the_word) > 0  -- eliminate noise words as much as possible so make it 1,2 or 3 instead of zero.
group by the_word

-- to test the above query against your own test data, 
-- first replace the column "comments" with the real column name, 
-- and the table "#my_table" with your table name

Open in new window

Now, those punctuation marks might get annoying, so, we might need to do something about those... The example above simply checks for a few of the more obvious, but you could have a lot more (like comma, semicolon or basically any special character).

The master..spt_values is a special table (comes standard and almost a system table)  and has numbers up to 2048. For longer string you might need to build a numbers table.

You can read about spt-values in my Article :

I don't think there is a way to do that you want in one query.
It can be done by a stored procedure.
First you create a new table (in memory) of one column with unique status.
Then you parse each row of your source table and insert  into the new table each word.
Anyway, try to check this one.
Mark WillsTopic AdvisorCommented:
Yep, can be done (I think, or, would at least like to try if MS SQL )...

But is this MySQL or MS SQL ? Your zone says MySQL but your title says MS SQL
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Guy Hengel [angelIII / a3]Billing EngineerCommented:
I presume that you are using MS SQL Server (due to the tags), and not MySQL, please confirm (so we can also move the Q to the correct zone)

if that is correct, please create this function dbo.ParmsToList() in your database:

then, your query goes like this:
select distinct value
 from (select f.value from yourtable t cross apply dbo.ParmsToList(t.your_column_name, ' ') f) sq

Open in new window

MaxKroyAuthor Commented:
MSSQL. For some reason the tags are getting messed up when I submit the question.
MaxKroyAuthor Commented:
You are truly an expert. Thanks for the query. Cheers.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.