asked on

Comparing two column strings for differences and updating table

given two tables with columns like so:
TableA
id Col1
1 blue,green,pink
2 fat,ugly,stupid,smart
3 larry,james,philip,john

TableB
id Col2
3 larry,james,tih,john,ann,,
2 fat,krank,ugly,stupid,smart
1 yellow

I want to be able to find and add (append) to TableB (col2) all the missing words found in TableA, Col1 using the id as the relationship that link both tables. The end result should be Col1 a subset of Col2. words are comma seperated.

I found a function which was very handy in the comparism of strings but i could not figure how to implement it to update the table with missing values;

using the DelimitedSplit8K function, found here
http://www.sqlservercentral.com/Forums/Topic1518039-8-1.aspx

DECLARE @String1 varchar(30) = '356,840',
@String2 varchar(30) = '840,356'

IF NOT EXISTS(
SELECT fn.Item from dbo.DelimitedSplit8K(@String1,',') fn
EXCEPT
SELECT fn.Item from dbo.DelimitedSplit8K(@String2,',') fn
)
PRINT 'No Differences found'
ELSE
PRINT 'they are not the same, regardless of order'

--*******************************************************************
I ll like to put an update statement in the place of
PRINT 'they are not the same, regardless of order'

update TableB
set Col2 = Col2 + Col1
where 'the camparism above shows a mismatch'

so the result of the select statement will be like this;
select col2 from TableB where id = 1

results;
blue,green,pink,yellow

ASKER CERTIFIED SOLUTION

Brian Crowe

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

OriNetworks

You should be able to do something similar to

UPDATE TableA
FROM 
TableA a inner join TableB b on a.ID=b.ID set a.Col1 = (
select item + ',' from
    (select distinct t1.item from
      (Select item from dbo.DelimitedSplit8K(a.Col1) UNION ALL Select item from dbo.DelimitedSplit8K(a.Col2) ) t1
    ) t2 FOR XML PATH ('')
)

Open in new window

This is saying update the field by joining the comma separated records from each field and get distinct values.

peterslove53

ASKER

I get syntax error:
Incorrect syntax near '.'
at this row:
CROSS APPLY Kepler.dbo.SplitText(A.Col1, ',') AS SplitA

on further investigation,, i read online that, this error may likely be as a result of;

database not running in Compatibility mode 90 or above

peterslove53

ASKER

how can i just select all the IDs of the columns having a mismatch?

Brian Crowe

You would need to replace the SplitText function with one that you have available.

I'm not sure I understand what you're asking for in the latest comment. Can you give an example?

peterslove53

ASKER

i ll like to output the column Ids of table B which has a mismatch with tableA (i.e records of tableA where col1 in TableA is not a subset of col2 in TableB. result wil be those with id (1 and 3) since they are not subsets of TableB
1
3

Brian Crowe

DECLARE @TableA TABLE
(
	ID		INT IDENTITY(1,1) NOT NULL,
	Col1	VARCHAR(MAX) NOT NULL
);

DECLARE @TableB TABLE
(
	ID		INT IDENTITY(1,1) NOT NULL,
	Col2	VARCHAR(MAX) NOT NULL
);

INSERT INTO @TableA (Col1)
VALUES ('blue,green,pink'),
	('fat,ugly,stupid,smart'),
	('larry,james,philip,john');

INSERT INTO @TableB (Col2)
VALUES ('yellow'),
	('fat,krank,ugly,stupid,smart'),
	('larry,james,tih,john,ann');

--SELECT * FROM @TableA;
--SELECT * FROM @TableB;

WITH cteTableA AS
(
	SELECT A.ID, SplitA.Value
	FROM @TableA AS A
	CROSS APPLY Kepler.dbo.SplitText(A.Col1, ',') AS SplitA
),
cteTableB AS
(
	SELECT B.ID, SplitB.Value
	FROM @TableB AS B
	CROSS APPLY Kepler.dbo.SplitText(B.Col2, ',') AS SplitB
)
SELECT ID
FROM
(
	SELECT ISNULL(A.ID, B.ID) AS ID,
		SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) AS AOnlyCount,
		SUM(CASE WHEN A.Value IS NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS BOnlyCount,
		SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS Matches
	FROM cteTableA AS A
	FULL OUTER JOIN cteTableB AS B
		ON A.ID = B.ID
		AND A.Value = B.Value
	GROUP BY ISNULL(A.ID, B.ID)
) AS Summary
WHERE AOnlyCount > 0	-- Not a subset

Open in new window

Scott Pletcher

On the last query, you don't need a FULL JOIN nor a separate outer query, just add HAVING to the main query:

cteTableB AS
)
...
)
SELECT A.ID,
            SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) AS AOnlyCount,
            SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS Matches
FROM cteTableA AS A
LEFT OUTER JOIN cteTableB AS B
      ON A.ID = B.ID
      AND A.Value = B.Value
GROUP BY A.ID
HAVING SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) > 0