johnnyg123
asked on
Find set of duplicates within columns in same row and across rows
Here is some sample data for a sql server 2014 table
CREATE TABLE #License(
[customer_id] [int] NULL,
[license_key1] [nchar](10) NULL,
[license_state1] [nchar](10) NULL,
[license_key2] [nchar](10) NULL,
[license_state2] [nchar](10) NULL,
[license_key3] [nchar](10) NULL,
[license_state3] [nchar](10) NULL
) ON [PRIMARY]
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(11111111
,12345
,'IL'
,34567
,'IN'
,12345
,'IL')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(22222222
,66666
,'MI'
,999999
,'NY'
,87654
,'CO')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(33333333
,908763
,'MI'
,22263
,'NY'
,135798
,'IL')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(44444444
,98765
,'MI'
,4576
,'NY'
,135798
,'IL')
As you can see there are 3 sets of license key and state combinations
I want one query that will return any rows that have a duplicate set in the 3 sets across row along with the combination that is the duplicate
Another query to identify a match in set values in any of the sets in any rows along with the combination that is the duplicate
(Note: it is possible that the license key could be alpha or alpha numeric)
Given above data
The query looking at combinations across columns in a row should return
customer_id licensekey licensestate
11111111 12345 IL
The query looking at combinations across rows should return
customer_id licensekey licensestate
33333333 135798 IL
44444444 135798 IL
CREATE TABLE #License(
[customer_id] [int] NULL,
[license_key1] [nchar](10) NULL,
[license_state1] [nchar](10) NULL,
[license_key2] [nchar](10) NULL,
[license_state2] [nchar](10) NULL,
[license_key3] [nchar](10) NULL,
[license_state3] [nchar](10) NULL
) ON [PRIMARY]
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(11111111
,12345
,'IL'
,34567
,'IN'
,12345
,'IL')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(22222222
,66666
,'MI'
,999999
,'NY'
,87654
,'CO')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(33333333
,908763
,'MI'
,22263
,'NY'
,135798
,'IL')
INSERT INTO #License
([customer_id]
,[license_key1]
,[license_state1]
,[license_key2]
,[license_state2]
,[license_key3]
,[license_state3])
VALUES
(44444444
,98765
,'MI'
,4576
,'NY'
,135798
,'IL')
As you can see there are 3 sets of license key and state combinations
I want one query that will return any rows that have a duplicate set in the 3 sets across row along with the combination that is the duplicate
Another query to identify a match in set values in any of the sets in any rows along with the combination that is the duplicate
(Note: it is possible that the license key could be alpha or alpha numeric)
Given above data
The query looking at combinations across columns in a row should return
customer_id licensekey licensestate
11111111 12345 IL
The query looking at combinations across rows should return
customer_id licensekey licensestate
33333333 135798 IL
44444444 135798 IL
ASKER
Hi Tony
Thanks so much for your response!
I really so not want to sound unappreciative because I am
Query 1 works just fine
However
Query 2 assumes the duplicates are in the same license_key and license_state group which is not the case
For example,
customer_id license_key1 license_state1 license_key2 license_state2 license_key3 license_state3
111111111 12345 IL 67897 NY 77777 IL
222222222 12345 IL 6789 NY 777777 IL
The duplicate would be found across rows because for both customer ids the dups are in license_key1 and license_state1
However
The query would not find the dup given the following data
customer_id license_key1 license_state1 license_key2 license_state2 license_key3 license_state3
111111111 12345 IL 67897 NY 77777 IL
222222222 44444 IL 12345 IL 88888 IL
The duplicate set of data is in license_key1 and license_state1 for customer id 111111111
The duplicate set of data is in license_key2 and license_state2 for customer id 222222222
Also, I only included 3 groups in the sample data to try and make it a little cleaner. However, there are actually 5 groups
Trying to account for all the combinations in the first where clause is a bit confusing
Thanks so much for your response!
I really so not want to sound unappreciative because I am
Query 1 works just fine
However
Query 2 assumes the duplicates are in the same license_key and license_state group which is not the case
For example,
customer_id license_key1 license_state1 license_key2 license_state2 license_key3 license_state3
111111111 12345 IL 67897 NY 77777 IL
222222222 12345 IL 6789 NY 777777 IL
The duplicate would be found across rows because for both customer ids the dups are in license_key1 and license_state1
However
The query would not find the dup given the following data
customer_id license_key1 license_state1 license_key2 license_state2 license_key3 license_state3
111111111 12345 IL 67897 NY 77777 IL
222222222 44444 IL 12345 IL 88888 IL
The duplicate set of data is in license_key1 and license_state1 for customer id 111111111
The duplicate set of data is in license_key2 and license_state2 for customer id 222222222
Also, I only included 3 groups in the sample data to try and make it a little cleaner. However, there are actually 5 groups
Trying to account for all the combinations in the first where clause is a bit confusing
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks for all the responses!
Tony,
Thanks so much for taking the time for your response
Hi Scott
Couldn't agree more about normalizing the vendor file
We are trying to help them figure out some issues so for now kinda stuck
Thanks so much for the solution!
Tony,
Thanks so much for taking the time for your response
Hi Scott
Couldn't agree more about normalizing the vendor file
We are trying to help them figure out some issues so for now kinda stuck
Thanks so much for the solution!
ASKER
Hi Scott
Couldn't agree more about normalizing the vendor file
We are trying to help them figure out some issues so for now kinda stuck
Thanks so much for the solution!
Couldn't agree more about normalizing the vendor file
We are trying to help them figure out some issues so for now kinda stuck
Thanks so much for the solution!
I have 2 queries below. It would be good to test them over more data than just the 4 rows of license data provided.
Have a go and see if it is close.
Thanks
Tony
Open in new window