Function for word by word string comparison in T-SQL

I need to write a SQL Server function that will take in two fields and perform a logical operation in this manner
 - take first field value, split it into words
 - make sure all of these words exist in the second field.

For example: the first field of "le chien" and the second is "le chien joue avec la balle" The function would return true in this case, since all the words from the first one are in the second.

I could simply use SUBSTRING:

SUBSTRING(field1, 0, 5)=SUBSTRING(field2, 0,5)

But in some cases it would not work; if I have field1 as la maison (1) and field2 as la maison (2) SUBSTRING(field1, 0, 5)=SUBSTRING(field2, 0,5) would return true even though the strings do not match.

CHARINDEX(field1, field2) > 0 works but not for all the cases, since it only works for those strings where words are in the same order.
for example it works for "le chien" and "le chien joue avec la balle" but does not for "le chien" and "chien le joue avec la balle"

Increasing number of characters in some cases would work, but would not in others. So it all boils down to writing a function that will split the first string into separate words and ensure that all of them exist in the second string.

Can someone help me build a function or point me in the right direction? I'd really appreciate any help

P.S> I need something that I can just plug in to the select statement as a condition in a where clause
LVL 35
YZlatAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dsackerContract ERP Admin/ConsultantCommented:
I wrote a function sometime back called udf_SplitFn. You might like it. It is zero-based, so for the first and second words, you would call it as follows:

SELECT  *
FROM   (SELECT  KeyFields,
                dbo.udf_SplitFn(' ', FirstField, 0) AS FirstWord,
                dbo.udf_SplitFn(' ', FirstField, 1) AS SecondWord,
                SecondField
        FROM    YourTable) t
WHERE   CHARINDEX(FirstWord, SecondField) > 0
AND     CHARINDEX(SecondWord, SecondField) > 0

Open in new window

If you want both words to be there together, you can replace the above WHERE statement with:

WHERE   CHARINDEX(FirstWord + ' ' + SecondWord, SecondField) > 0
dsackerContract ERP Admin/ConsultantCommented:
The code for the function is here:

CREATE FUNCTION udf_SplitFn (
    @sep    varchar(80)   = ',',
    @str    varchar(4000) = null,
    @ndx    smallint      = 0)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE	@piece	varchar(4000);
    SET	@ndx = @ndx + 1;

    WITH Pieces(pn, start, stop) AS
    (
        SELECT 1, 1, CHARINDEX(@sep, @str)
        UNION ALL
        SELECT pn + 1, stop + LEN(@sep), CHARINDEX(@sep, @str, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
        SELECT	@piece = SUBSTRING(@str, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END)
        FROM	Pieces
        WHERE	pn = @ndx;

    RETURN @piece;
END

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
ste5anSenior DeveloperCommented:
The cleanest solution is imho a CLR function. Otherwise use a T-SQL function using a split function:

 
USE [tempdb];
GO

DECLARE @Sample TABLE
    (
      String1 NVARCHAR(255) ,
      String2 NVARCHAR(255)
    );

INSERT  INTO @Sample
        ( String1, String2 )
VALUES  ( N'3 1 2 3', N'1 2' ),
        ( N'3 2 1', N'1 2 3' ),
        ( N'3 1 2', N'1 2 4' ),
        ( N'le chien', N'le chien joue avec la balle' ),
        ( N'le chien', N'chien le joue avec la balle' );		

SELECT  * ,
        dbo.fn_CompareToken(S.String1, S.String2)
FROM    @Sample S;

SELECT  *
FROM    @Sample S
WHERE   dbo.fn_CompareToken(S.String1, S.String2) = 1;

Open in new window


with

USE tempdb;
GO

SET ANSI_NULLS ON;
GO
SET QUOTED_IDENTIFIER ON;
GO

CREATE FUNCTION dbo.fn_Split ( @String NVARCHAR(MAX) )
RETURNS TABLE
AS
RETURN
    (	
        WITH    Positions ( stpos, endpos )
                  AS ( SELECT   CAST(0 AS BIGINT) AS stpos ,
                                CHARINDEX(' ', @String) AS endpos
                       UNION ALL
                       SELECT   endpos + 1 ,
                                CHARINDEX(' ', @String, endpos + 1)
                       FROM     Positions
                       WHERE    endpos > 0
                     )
    SELECT  SUBSTRING(@String, stpos, COALESCE(NULLIF(endpos, 0), LEN(@String) + 1) - stpos) AS Token
    FROM    Positions
	);

Open in new window


and

USE tempdb;

GO

CREATE FUNCTION dbo.fn_CompareToken
    (
      @String1 NVARCHAR(255) ,
      @String2 NVARCHAR(255)
    )
RETURNS BIT
AS
    BEGIN
        DECLARE @Result BIT = 0;
        DECLARE @Tokens1 TABLE ( Token NVARCHAR(MAX) );
        DECLARE @Tokens2 TABLE ( Token NVARCHAR(MAX) );

        INSERT  INTO @Tokens1
                SELECT  Token
                FROM    dbo.fn_Split(@String1);

        INSERT  INTO @Tokens2
                SELECT  Token
                FROM    dbo.fn_Split(@String2);

        IF ( NOT EXISTS ( SELECT    Token
                          FROM      @Tokens1
                          EXCEPT
                          SELECT    Token
                          FROM      @Tokens2 )
             OR NOT EXISTS ( SELECT Token
                             FROM   @Tokens2
                             EXCEPT
                             SELECT Token
                             FROM   @Tokens1 )
           )
            BEGIN
                SET @Result = 1;
            END;
        
        RETURN @Result;
    END;

Open in new window

Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

Scott PletcherSenior DBACommented:
Much easier if you have a key value with original strings, some type of identifier.  Then you just need the "standard" splitter, a CROSS APPLY and a GROUP BY.  

I'll assume for now that space is enough of a word delimiter.  That is, that you do not have to worry about:
"le chien, ..."

But, if necessary, you could run each Item returned from the second string through a function that would remove commas and periods and/or other chars, such as colons, dashes, etc.

SELECT id, MAX(string1) AS string1, MAX(string2) AS string2
FROM (
    SELECT 1 AS id, 'le chien' AS string1, 'le chien joue avec la balle' AS string2 UNION ALL
    SELECT 2, 'le chien', 'chien le joue avec la balle' UNION ALL
    SELECT 3, 'le chat', 'le chien joue avec la balle blanche' UNION ALL
    SELECT 4, 'das hund', 'le chien joue avec la balle blanche'
) AS test_data
CROSS APPLY (
    SELECT s1.Item, s2.Item AS Item2
    FROM dbo.DelimitedSplit8K ( string1, ' ' ) s1
    LEFT OUTER JOIN dbo.DelimitedSplit8K ( string2, ' ' ) AS s2 ON
        s2.Item = s1.Item
) AS ca
GROUP BY id
HAVING SUM(CASE WHEN Item2 IS NULL THEN 1 ELSE 0 END) = 0
Vitor MontalvãoMSSQL Senior EngineerCommented:
YZlat, do you still need help with this question
YZlatAuthor Commented:
sorry, I was pulled off to something else. Will test the solutions offered today
YZlatAuthor Commented:
Didn't get a chance to test it out but won't in a while, so I am just going to accept those answers that could work out - functions. Other answers won't work because I need to use that function in a query join
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.