Link to home
Start Free TrialLog in
Avatar of Chris Michalczuk
Chris MichalczukFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Stripping individual words from a string into a file with counts

I have a column in a database called Description and I want to strip out all the individual words that exist in this column into a separate table and count the occurances.
As part of this process the column with be turned to UPPER(Decsription)
so for example

Description
magento integration
integration magento
amazon and ebay exist
magento/integration

returns temp table with 2 fields

Words                    count
magento                   2
integration               2
amazon                     1
and                             1  
ebay                            1
magento/integration   1
exists                            1

how and is this possible
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Hmm, if we are to use UPPER() on the description, why would you expect the result to come back as lower case?!
hmm, I would guess that you should go for a CLR solution. Otherwise you need a split function:

 
DECLARE @Sample TABLE
    (
      [Description] NVARCHAR(MAX)
    );

INSERT  INTO @Sample
VALUES  ( 'magento integration' ),
        ( 'integration magento' ),
        ( 'amazon and ebay exist' ),
        ( 'magento/integration' );

SELECT  R.Value ,
        COUNT(*)
FROM    @Sample S
        CROSS APPLY dbo.fn_inline_split(S.[Description], ' ') R
GROUP BY R.Value;

Open in new window


with the split function

 
CREATE FUNCTION dbo.fn_inline_split
    (
      @param NVARCHAR(MAX) ,
      @splitchar NCHAR(1) = ','
    )
RETURNS TABLE
AS
RETURN
    ( SELECT    LTRIM(RTRIM(CONVERT(NVARCHAR(4000), SUBSTRING(@param, Number,
                                                              CHARINDEX(@splitchar COLLATE Slovenian_BIN2, @param + CONVERT(NVARCHAR(MAX), @splitchar), Number)
                                                              - Number)))) AS Value
      FROM      dbo.Numbers
      WHERE     Number <= CONVERT(INT, LEN(@param))
                AND SUBSTRING(CONVERT(NVARCHAR(MAX), @splitchar) + @param, Number, 1) = @splitchar COLLATE Slovenian_BIN2
    );

Open in new window


and the number table

CREATE TABLE dbo.Numbers
    (
      Number INT NOT NULL ,
      PRIMARY KEY CLUSTERED ( number ASC )
    );
GO

DECLARE @i INT = 1;

WHILE ( @i < 1000000 )
    BEGIN
        INSERT  INTO dbo.Numbers
        VALUES  ( @i );
        SET @i += 1;
    END;
GO

Open in new window


Caveat: The numbers table must contain as much numbers as the maximum of chars which can occur in on of your values.
Avatar of Chris Michalczuk

ASKER

Scott the Upper was a red herring I assumed I'd turn everything into the same case to make the counts easier.
My table is call AllOpportunities and the data is held in AllOpportunities.description

How do I use the above to achieve a seperate table of words and counts
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
thanks