SQL Query question - get most common words/string

I am looking to get, say the 30 most commonly used strings/phrases on a column called "item" in my database.  How can I write a query that would not only return the top 30 words/string, but also tell me how many of them there are of each?  I'm not a SQL genius, so any help is appreciated.  Here's some sample data:

ITEM
Mustang
Mustang GT
Fiero
Camaro IROC
Firebird
Firebird
Sport TS
Sunbird GTS
Maxima
Altima
Altima GX
saturationAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

WayneATaylorCommented:
As long as you want to just look at the whole field contents as a whole then the following will work

select top 30 item,count(*) from TABLE group by item order by count(*) desc

Just replace the table name in TABLE

Wayne
0
saturationAuthor Commented:
What if I'm looking for any string more than 2 characters in the column rather than the whole column?
0
WayneATaylorCommented:
Much more complicated!

One way that I have done type of thing in the passed is have some Transact SQL code that goes through the whole table row by row, then goes through each fields contents to pick the words out (over two characters anyway) and then creates a temp table that has two fields a WORD field (Although can't be called word because of naming restrictions!) and a count field (Same naming restriction!) and then either creates an entry if there isn't one with a count of 1, or if there is already an entry it adds one to the count.
At the end of the code you can then just do the same query but without the GROUP so just

select top 30 word,count rom TABLE order by count desc

Does this make sense.  I might have some sample code for the transact SQL if required.

Wayne
0
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

saturationAuthor Commented:
I would love some sample code--I need all the help I can get!  Thanks in advance!
0
WayneATaylorCommented:
As an overview the following code will give you an idea of how to use a cursor to step through the table.

The lines with // are comments...

//Firstly declare a cursor using
declare cursorname CURSOR for select item from TABLE

//Declare a field to use for the field contents.  Set the number to be the max size of the field
declare @fieldcontents as varchar(100)

//Then open the cursor
OPEN cursorname

//Then fetch the first entry
FETCH NEXT FROM cursorname INTO @fieldcontents

WHILE @@fetch_status=0
BEGIN

//You will at this point have the while contents of the field in the @fieldcontents variable
//You will need them to have the code to go through the contents and write to the temp table

      FETCH NEXT FROM cursorname INTO @fieldcontents
END
CLOSE Updategm
DEALLOCATE Updategm


I guess this is about half of what you need, but should give you a good start!

Wayne
0
WayneATaylorCommented:
In fact I forgot to change the cursor name in the last few lines, you need what is your cursor name

Wayne
0
Scott PletcherSenior DBACommented:
A highly efficient split function is below.  Create the function once first for use by the code below.

Then, assuming you want to treat a space as the only "word" delimiter, you can do this:

SELECT TOP (30)
    ss.value AS string, COUNT(*) AS string_count
FROM dbo.your_table_name ytn
CROSS APPLY dbo.SplitStringIntoTable ( ytn.ITEM, SPACE(1) ) ss
WHERE
    LEN(ss.value) > 1
GROUP BY
    ss.value
ORDER BY
    string_count DESC

And/Or you could look into using a full-text index on that column :-) .



SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[SplitStringIntoTable]  (
    @string varchar(8000),
    @delimiter varchar(5)
    )
RETURNS TABLE WITH SCHEMABINDING
AS
--SELECT * FROM (SELECT 'ab/c/def/ghijklm/no/prq/////st/u//' AS string) AS test_values CROSS APPLY DBA.dbo.SplitStringIntoTable(test_values.string, '/')
RETURN
--Inline-CTE-Driven "tally table" produces values from 0 to 9999  ... enough to cover varchar(8000)
WITH cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS num
    FROM cteDigits [1s] --0 thru 9
    CROSS JOIN cteDigits [10s] --thru 99
    CROSS JOIN cteDigits [100s] --thru 999
    CROSS JOIN cteDigits [1000s] --thru 9999
),
cteBase(num) AS ( --Limit the number of rows up front, for both a performance gain and prevention of accidental "overruns"
        SELECT 0 UNION ALL
        SELECT TOP (DATALENGTH(ISNULL(@string, 1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM cteTally
),
cteStart(num1) AS ( --This returns num + 1 (starting position of each "value" just once for each delimiter)
        SELECT b.num + 1
        FROM cteBase b
        WHERE (SUBSTRING(@string, b.num, 1) = @delimiter OR b.num = 0)
)
--Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ROW_NUMBER() OVER(ORDER BY s.num1) AS value_seq,
       LTRIM(RTRIM(SUBSTRING(@string, s.num1, ISNULL(NULLIF(CHARINDEX(@delimiter, @string, s.num1), 0) - s.num1, 8000)))) AS value
FROM cteStart s;

GO
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
saturationAuthor Commented:
ScottPletcher,

I understand all of this except for the "ss.value"--what is that supposed to represent (by the way, thanks!)?
0
Scott PletcherSenior DBACommented:
That's the column name for the values returned by the function.

For example, if the original table had string:

'ab,cde,fghij,k'

then the function would return a table that looks like this:

value_seq, value:
1, ab
2, cde
3, fghij
4, k

Just run the function by itself against the table, with a SELECT * instead of a GROUP BY, and you'll see the new column "value" coming from the function, with a single value from the string.
0
saturationAuthor Commented:
Oh--this is perfect.  Exactly what I needed--thanks!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.