AJS_Developer
asked on
Use SQL to find unique sets
In a SQL Server 2005 database there is a table called Table_1. It's columns are ID, SetName, and SetValue. I'm after an SQL statement (without the help of iterating with a cursor) that will return all unique combinations of values in SetValue.
For example:
Running the statement against this data:
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
would return
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
All rows with SetName 3 would be left out, because really set 3 is just a duplicate of set 1.
Thanks in advance!
For example:
Running the statement against this data:
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
would return
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
All rows with SetName 3 would be left out, because really set 3 is just a duplicate of set 1.
Thanks in advance!
sorry
try this one
with a as (
select t1.id, t1.setName, t1.value,
sum (case when t1.setName is not null and t2.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set1
sum (case when t2.setName is not null and t1.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set2
from yourtable t1 full outer join yourtable t2
on t1.setName < t2.setName
and t1.value = t2.value
)
select id,setName,value
from a t3
where only_in_set1 = 0
and only_in_set2 = 0
try this one
with a as (
select t1.id, t1.setName, t1.value,
sum (case when t1.setName is not null and t2.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set1
sum (case when t2.setName is not null and t1.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set2
from yourtable t1 full outer join yourtable t2
on t1.setName < t2.setName
and t1.value = t2.value
)
select id,setName,value
from a t3
where only_in_set1 = 0
and only_in_set2 = 0
have a look.
select distinct t1.id,t1.setname,t1.setval ue from tab1 t1 full outer join tab1 t2
on t1.setName
select distinct t1.id,t1.setname,t1.setval
on t1.setName
BTW, the table and data I have used in above query is like:
create table tab1
(
id int ,
setname int,
setvalue varchar(1)
)
insert into tab1
select 1, 1, 'A' union all
select 2 , 1, 'B' union all
select 3 , 1, 'C' union all
select 4 , 2, 'A' union all
select 5 , 2, 'B' union all
select 6, 3, 'A' union all
select 7, 3, 'B' union all
select 8 , 3, 'C'
create table tab1
(
id int ,
setname int,
setvalue varchar(1)
)
insert into tab1
select 1, 1, 'A' union all
select 2 , 1, 'B' union all
select 3 , 1, 'C' union all
select 4 , 2, 'A' union all
select 5 , 2, 'B' union all
select 6, 3, 'A' union all
select 7, 3, 'B' union all
select 8 , 3, 'C'
ASKER
Thanks RiteshShah!
I made some small changes to your statement (see attached) and I get the correct result.
However (this may be because I didn't phrase the question correctly) if I use the following test data:
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
then what I would expect is
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
because in this case each of the 3 sets are unique, but you're statement returns
1 1 A
2 1 B
I made some small changes to your statement (see attached) and I get the correct result.
However (this may be because I didn't phrase the question correctly) if I use the following test data:
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
then what I would expect is
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
because in this case each of the 3 sets are unique, but you're statement returns
1 1 A
2 1 B
Select distinct t1.id,
t1.setname,
t1.setvalue
from table_1 t1 full outer join table_1 t2
on t1.setName <t2.setName
and t1.setvalue = t2.setvalue
where t2.id is not null and t1.id is not null order by t1.id
ASKER
momi_sabaq,
Using your second suggestion (which I made some syntax changes to, code attached) I get the following incorrect result:
1 1 A
1 1 A
2 1 B
2 1 B
3 1 C
4 2 A
5 2 B
Using your second suggestion (which I made some syntax changes to, code attached) I get the following incorrect result:
1 1 A
1 1 A
2 1 B
2 1 B
3 1 C
4 2 A
5 2 B
with a as (
select t1.id, t1.setName, t1.Setvalue,
sum (case when t1.setName is not null and t2.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set1,
sum (case when t2.setName is not null and t1.setName is null then 1 else 0 end) over(partition by t1.setName) as only_in_set2
from table_1 t1 full outer join table_1 t2
on t1.setName < t2.setName
and t1.Setvalue = t2.Setvalue
)
select id,
setName,
Setvalue
from a t3
where only_in_set1 = 0
and only_in_set2 = 0
ASKER
momi_sabaq,
Using your first suggestion (which I also made some changes to, code attached) gave me the same incorrect result
1 1 A
1 1 A
2 1 B
2 1 B
3 1 C
4 2 A
5 2 B
Using your first suggestion (which I also made some changes to, code attached) gave me the same incorrect result
1 1 A
1 1 A
2 1 B
2 1 B
3 1 C
4 2 A
5 2 B
with a as (
select t1.id, t1.setName, t1.setvalue,
count(*) over(partition by t1.setName) as set_count,
sum (case when t2.setName is not null then 1 else 0 end) over(partition by t1.setName) as set2_count
from table_1 t1 left outer join table_1 t2
on t1.setName < t2.setName
and t1.setvalue = t2.setvalue
)
select id,setName,setvalue
from a t3
where set_count = set2_count
are you still looking for the solution?
ASKER
As a solution to this problem I'd also accept a proof as to why it is impossible to accomplish without using a cursor. Thanks for all your help so far.
If you have the data like this, your query will not work. let me know if you still looking for solution.
select * from #temp
id setname setvalue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
8 2 C
6 3 A
7 3 B
8 3 C
Select distinct t1.id, t1.setname, t1.setvalue
from #temp t1 full outer join #temp t2
on t1.setName <t2.setName
and t1.setvalue = t2.setvalue
where t2.id is not null and t1.id is not null order by t1.id
id setname setvalue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
8 2 C
ASKER
See code attached for a possible solution. Here are my test results (Have I missed a special case somewhere?)
Case 1: (3 sets total, 3rd is duplicate of first)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
returns
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
Case 2: (3 sets total, all the same)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
9 2 C
returns
1 1 A
2 1 B
3 1 C
Case 4: (5 sets total, 1st and 4th are the same, 3rd and 5th are the same, 2nd unique, sets 1, 3, 4, 5 share at least one common element)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 C
7 3 D
8 3 E
9 4 A
10 4 B
11 4 C
12 5 C
13 5 D
14 5 E
returns
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 C
7 3 D
8 3 E
Thanks for all the suggestions!
Case 1: (3 sets total, 3rd is duplicate of first)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
returns
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
Case 2: (3 sets total, all the same)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
9 2 C
returns
1 1 A
2 1 B
3 1 C
Case 4: (5 sets total, 1st and 4th are the same, 3rd and 5th are the same, 2nd unique, sets 1, 3, 4, 5 share at least one common element)
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 C
7 3 D
8 3 E
9 4 A
10 4 B
11 4 C
12 5 C
13 5 D
14 5 E
returns
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 C
7 3 D
8 3 E
Thanks for all the suggestions!
WITH t2 AS (
SELECT t1.ID,
t1.SetName,
t1.SetValue,
COUNT(*) OVER(PARTITION BY t1.SetName) AS counter
FROM Table_1 t1
)
SELECT *
FROM Table_1 t7
WHERE t7.SetName NOT IN (
SELECT DISTINCT t3_SetName AS Duplicate_SetName
FROM (
SELECT *,
COUNT(*) OVER(PARTITION BY t3_SetName, t4_SetName) AS counter
FROM (
SELECT t3.ID AS t3_ID,
t3.SetName AS t3_SetName,
t3.SetValue AS t3_SetValue,
t3.counter AS t3_counter,
t4.ID AS t4_ID,
t4.SetName AS t4_SetName,
t4.SetValue AS t4_SetValue,
t4.counter AS t4_counter
FROM t2 t3 LEFT JOIN t2 t4
ON t4.counter = t3.counter
AND t4.SetValue = t3.SetValue
AND t4.SetName < t3.SetName
) t5
) t6
WHERE counter = t3_counter
AND NOT t4_ID IS NULL
)
ASKER
Hi All,
I couldn't help myself - I've tidied up the code a bit so that it should be easier to transpose to whatever table you are trying to apply this too. Now you can just substitute field / table names in the top WITH clause to get going.
I'm not sure how practical this is going to be either - running on real world data seems to be taking a LONG time.
I couldn't help myself - I've tidied up the code a bit so that it should be easier to transpose to whatever table you are trying to apply this too. Now you can just substitute field / table names in the top WITH clause to get going.
I'm not sure how practical this is going to be either - running on real world data seems to be taking a LONG time.
WITH t2 AS (
SELECT t1.ID,
t1.SetName,
t1.SetValue,
COUNT(*) OVER(PARTITION BY t1.SetName) AS counter
FROM Table_1 t1
)
SELECT ID,
SetName,
SetValue
FROM t2 t7
WHERE t7.SetName NOT IN (
SELECT DISTINCT t3_SetName AS Duplicate_SetName
FROM (
SELECT *,
COUNT(*) OVER(PARTITION BY t3_SetName, t4_SetName) AS counter
FROM (
SELECT t3.SetName AS t3_SetName,
t3.counter AS t3_counter,
t4.ID AS t4_ID,
t4.SetName AS t4_SetName
FROM t2 t3 LEFT JOIN t2 t4
ON t4.counter = t3.counter
AND t4.SetValue = t3.SetValue
AND t4.SetName < t3.SetName
) t5
) t6
WHERE counter = t3_counter
AND NOT t4_ID IS NULL
)
I tried like this. also tested with the below examples and working fine.
select t4.*
from Table_1 t4
join (select min(SetName) as SetName
from (select distinct SetName,
(select rtrim(substring(isnull((se
from Table_1 t1 where t1.SetName = t2.SetName for xml path('')),''),2,2000))) Grp
from Table_1 t2) t3 group by Grp)t5
on t4.SetName = t5.SetName
order by t4.id
create table #temp(id int,SetName int,SetValue varchar(2))
insert into #temp values (1,1,'A'),(2,1,'B'),(3,1,'C'),(4,2,'A'),(5,2,'B'),(6,3,'A'),(7,3,'B'),(8,3,'C')
select * from #temp
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 A
7 3 B
8 3 C
select t4.*
from #temp t4
join (select min(SetName) as SetName
from (select distinct SetName,
(select rtrim(substring(isnull((select top 5 ','+SetValue
from #temp t1 where t1.SetName = t2.SetName for xml path('')),''),2,2000))) Grp
from #temp t2) t3 group by Grp)t5
on t4.SetName = t5.SetName
order by t4.id
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
drop table #temp
go
--------------------------------------------------------------------------------------------------------
create table #temp(id int,SetName int,SetValue varchar(2))
insert into #temp values (1,1,'A'),(2,1,'B'),(3,1,'C'),(4,2,'A'),(5,2,'B'),(6,3,'D'),(7,3,'E'),(8,3,'F')
select * from #temp
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
select t4.*
from #temp t4
join (select min(SetName) as SetName
from (select distinct SetName,
(select rtrim(substring(isnull((select top 5 ','+SetValue
from #temp t1 where t1.SetName = t2.SetName for xml path('')),''),2,2000))) Grp
from #temp t2) t3 group by Grp)t5
on t4.SetName = t5.SetName
order by t4.id
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 D
7 3 E
8 3 F
drop table #temp
go
--------------------------------------------------------------------------------------------------------
create table #temp(id int,SetName int,SetValue varchar(2))
insert into #temp values (1,1,'A'),(2,1,'B'),(3,1,'C'),(4,2,'A'),(5,2,'B'),(6,2,'C'),(7,3,'A'),(8,3,'B'),(9,3,'C')
select * from #temp
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 2 C
7 3 A
8 3 B
9 3 C
select t4.*
from #temp t4
join (select min(SetName) as SetName
from (select distinct SetName,
(select rtrim(substring(isnull((select top 5 ','+SetValue
from #temp t1 where t1.SetName = t2.SetName for xml path('')),''),2,2000))) Grp
from #temp t2) t3 group by Grp)t5
on t4.SetName = t5.SetName
order by t4.id
id SetName SetValue
1 1 A
2 1 B
3 1 C
drop table #temp
go
if you have data like the attached sample set, my above query won't work. An ORDER BY clause is required in this case.
create table #temp(id int,SetName int,SetValue varchar(2))
insert into #temp values (1,1,'A'),(2,1,'B'),(3,1,'C'),(4,2,'A'),(5,2,'B'),(6,3,'C'),(7,3,'B'),(8,3,'A')
select * from #temp
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
6 3 C
7 3 B
8 3 A
select t4.*
from #temp t4
join (select min(SetName) as SetName
from (select distinct SetName,
(select rtrim(substring(isnull((select top 5 ','+SetValue
from #temp t1 where t1.SetName = t2.SetName order by t1.SetValue for xml path('')),''),2,2000))) Grp
from #temp t2) t3 group by Grp)t5
on t4.SetName = t5.SetName
order by t4.id
id SetName SetValue
1 1 A
2 1 B
3 1 C
4 2 A
5 2 B
drop table #temp
go
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Sharath_123,
Your solution is brilliant. Using FOR XML as a concatenation technique proved the fastest solution.
On a sample of 5,000 records, with SetValue's of roughly 15 - 20 characters, your script performed consistently at 5 to 8 seconds, ahead of my solution which took 30 - 40 seconds, and a solution that used temp tables to build up a unique string per set that took hours.
Your solution is brilliant. Using FOR XML as a concatenation technique proved the fastest solution.
On a sample of 5,000 records, with SetValue's of roughly 15 - 20 characters, your script performed consistently at 5 to 8 seconds, ahead of my solution which took 30 - 40 seconds, and a solution that used temp tables to build up a unique string per set that took hours.
Thanks AJS_Developer, Glad to help you.
with a as (
select t1.id, t1.setName, t1.value,
count(*) over(partition by t1.setName) as set_count
sum (case when t2.setName is not null then 1 else 0 end) over(partition by t1.setName) as set2_count
from yourtable t1 left outer join yourtable t2
on t1.setName < t2.setName
and t1.value = t2.value
)
select id,setName,value
from a t3
where set_count = set2_count