peterslove53
asked on
Comparing two column strings for differences and updating table
given two tables with columns like so:
TableA
id Col1
1 blue,green,pink
2 fat,ugly,stupid,smart
3 larry,james,philip,john
TableB
id Col2
3 larry,james,tih,john,ann,,
2 fat,krank,ugly,stupid,smar t
1 yellow
I want to be able to find and add (append) to TableB (col2) all the missing words found in TableA, Col1 using the id as the relationship that link both tables. The end result should be Col1 a subset of Col2. words are comma seperated.
I found a function which was very handy in the comparism of strings but i could not figure how to implement it to update the table with missing values;
using the DelimitedSplit8K function, found here
http://www.sqlservercentral.com/Forums/Topic1518039-8-1.aspx
DECLARE @String1 varchar(30) = '356,840',
@String2 varchar(30) = '840,356'
IF NOT EXISTS(
SELECT fn.Item from dbo.DelimitedSplit8K(@Stri ng1,',') fn
EXCEPT
SELECT fn.Item from dbo.DelimitedSplit8K(@Stri ng2,',') fn
)
PRINT 'No Differences found'
ELSE
PRINT 'they are not the same, regardless of order'
--************************ ********** ********** ********** ********** ***
I ll like to put an update statement in the place of
PRINT 'they are not the same, regardless of order'
update TableB
set Col2 = Col2 + Col1
where 'the camparism above shows a mismatch'
so the result of the select statement will be like this;
select col2 from TableB where id = 1
results;
blue,green,pink,yellow
TableA
id Col1
1 blue,green,pink
2 fat,ugly,stupid,smart
3 larry,james,philip,john
TableB
id Col2
3 larry,james,tih,john,ann,,
2 fat,krank,ugly,stupid,smar
1 yellow
I want to be able to find and add (append) to TableB (col2) all the missing words found in TableA, Col1 using the id as the relationship that link both tables. The end result should be Col1 a subset of Col2. words are comma seperated.
I found a function which was very handy in the comparism of strings but i could not figure how to implement it to update the table with missing values;
using the DelimitedSplit8K function, found here
http://www.sqlservercentral.com/Forums/Topic1518039-8-1.aspx
DECLARE @String1 varchar(30) = '356,840',
@String2 varchar(30) = '840,356'
IF NOT EXISTS(
SELECT fn.Item from dbo.DelimitedSplit8K(@Stri
EXCEPT
SELECT fn.Item from dbo.DelimitedSplit8K(@Stri
)
PRINT 'No Differences found'
ELSE
PRINT 'they are not the same, regardless of order'
--************************
I ll like to put an update statement in the place of
PRINT 'they are not the same, regardless of order'
update TableB
set Col2 = Col2 + Col1
where 'the camparism above shows a mismatch'
so the result of the select statement will be like this;
select col2 from TableB where id = 1
results;
blue,green,pink,yellow
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I get syntax error:
Incorrect syntax near '.'
at this row:
CROSS APPLY Kepler.dbo.SplitText(A.Col 1, ',') AS SplitA
on further investigation,, i read online that, this error may likely be as a result of;
database not running in Compatibility mode 90 or above
Incorrect syntax near '.'
at this row:
CROSS APPLY Kepler.dbo.SplitText(A.Col
on further investigation,, i read online that, this error may likely be as a result of;
database not running in Compatibility mode 90 or above
ASKER
how can i just select all the IDs of the columns having a mismatch?
You would need to replace the SplitText function with one that you have available.
I'm not sure I understand what you're asking for in the latest comment. Can you give an example?
I'm not sure I understand what you're asking for in the latest comment. Can you give an example?
ASKER
i ll like to output the column Ids of table B which has a mismatch with tableA (i.e records of tableA where col1 in TableA is not a subset of col2 in TableB. result wil be those with id (1 and 3) since they are not subsets of TableB
1
3
1
3
DECLARE @TableA TABLE
(
ID INT IDENTITY(1,1) NOT NULL,
Col1 VARCHAR(MAX) NOT NULL
);
DECLARE @TableB TABLE
(
ID INT IDENTITY(1,1) NOT NULL,
Col2 VARCHAR(MAX) NOT NULL
);
INSERT INTO @TableA (Col1)
VALUES ('blue,green,pink'),
('fat,ugly,stupid,smart'),
('larry,james,philip,john');
INSERT INTO @TableB (Col2)
VALUES ('yellow'),
('fat,krank,ugly,stupid,smart'),
('larry,james,tih,john,ann');
--SELECT * FROM @TableA;
--SELECT * FROM @TableB;
WITH cteTableA AS
(
SELECT A.ID, SplitA.Value
FROM @TableA AS A
CROSS APPLY Kepler.dbo.SplitText(A.Col1, ',') AS SplitA
),
cteTableB AS
(
SELECT B.ID, SplitB.Value
FROM @TableB AS B
CROSS APPLY Kepler.dbo.SplitText(B.Col2, ',') AS SplitB
)
SELECT ID
FROM
(
SELECT ISNULL(A.ID, B.ID) AS ID,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) AS AOnlyCount,
SUM(CASE WHEN A.Value IS NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS BOnlyCount,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS Matches
FROM cteTableA AS A
FULL OUTER JOIN cteTableB AS B
ON A.ID = B.ID
AND A.Value = B.Value
GROUP BY ISNULL(A.ID, B.ID)
) AS Summary
WHERE AOnlyCount > 0 -- Not a subset
On the last query, you don't need a FULL JOIN nor a separate outer query, just add HAVING to the main query:
cteTableB AS
)
...
)
SELECT A.ID,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) AS AOnlyCount,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS Matches
FROM cteTableA AS A
LEFT OUTER JOIN cteTableB AS B
ON A.ID = B.ID
AND A.Value = B.Value
GROUP BY A.ID
HAVING SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) > 0
cteTableB AS
)
...
)
SELECT A.ID,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) AS AOnlyCount,
SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NOT NULL THEN 1 ELSE 0 END) AS Matches
FROM cteTableA AS A
LEFT OUTER JOIN cteTableB AS B
ON A.ID = B.ID
AND A.Value = B.Value
GROUP BY A.ID
HAVING SUM(CASE WHEN A.Value IS NOT NULL AND B.Value IS NULL THEN 1 ELSE 0 END) > 0
Open in new window
This is saying update the field by joining the comma separated records from each field and get distinct values.