DaveChoiceTech
asked on
sql duplicate rows from concatenated columns
I have a table with 2 columns, code and country. Code is 9 characters. I would like to find all the rows in the table where the first 8 char plus the country are duplicates.
e.g.
123456781 AG
123456782 AG
123456781 BB
987654321 JM
would have concatentated values
12345678AG
12345678AG
12345678BB
98765432JM
which has the duplicate value 12345678AG
so I want to get only the records where the concatenated values are duplicated.
i.e.
123456781 AG
123456782 AG
I have used concatenation only to describe the problem.
I am fine if the solution can select these same record without concatenation.
e.g.
123456781 AG
123456782 AG
123456781 BB
987654321 JM
would have concatentated values
12345678AG
12345678AG
12345678BB
98765432JM
which has the duplicate value 12345678AG
so I want to get only the records where the concatenated values are duplicated.
i.e.
123456781 AG
123456782 AG
I have used concatenation only to describe the problem.
I am fine if the solution can select these same record without concatenation.
More accurate solution could be this one:
MS SQL Server is OK with it. DB2? Who knows?
SELECT * FROM YourTable
WHERE LEFT(code, 8)+country IN (
SELECT LEFT(code, 8)+country
FROM YourTable
GROUP BY LEFT(code, 8)+country
HAVING COUNT(*) > 1)
MS SQL Server is OK with it. DB2? Who knows?
I think DB2 does not use the left function but you should be able to use the substr function in its stead.
select substr(code,1,8), country, count(*)
from yourtable
group by substr(code,1,8), country
having count(*) > 1
select substr(code,1,8), country, count(*)
from yourtable
group by substr(code,1,8), country
having count(*) > 1
I missed where youi wanted to retrieve the complete code column
select y.code, y.country
from yourtable y,
(select substr(code,1,8), country, count(*)
from yourtable
group by substr(code,1,8), country
having count(*) > 1) x
where x.country = y.country
and x.code = substr(y.code,1,8);
select y.code, y.country
from yourtable y,
(select substr(code,1,8), country, count(*)
from yourtable
group by substr(code,1,8), country
having count(*) > 1) x
where x.country = y.country
and x.code = substr(y.code,1,8);
ASKER
Perhaps I wasn't clear. The issue is to find the rows with duplicates of the combined columns LEFT(code,8) and country.
i.e. given
123456781 AG
123456782 AG
123456781 BB
987654321 JM
987654321 BB
987654322 BB
now adding a column for the first 8 digits of the code gives.
12345678 123456781 AG
12345678 123456782 AG
12345678 123456781 BB
98765432 987654321 JM
98765432 987654321 BB
98765432 987654322 BB
Wherever the combined value of the 8 digit code plus the country are duplicated the row from the original table should be provided.
So what I want is
123456781 AG
123456782 AG
987654321 BB
987654322 BB
Note that rows 123456781 BB, 987654321 BB and 987654321 JM are omitted since the first 8 digits of the code combined with the country code are unique.
i.e. given
123456781 AG
123456782 AG
123456781 BB
987654321 JM
987654321 BB
987654322 BB
now adding a column for the first 8 digits of the code gives.
12345678 123456781 AG
12345678 123456782 AG
12345678 123456781 BB
98765432 987654321 JM
98765432 987654321 BB
98765432 987654322 BB
Wherever the combined value of the 8 digit code plus the country are duplicated the row from the original table should be provided.
So what I want is
123456781 AG
123456782 AG
987654321 BB
987654322 BB
Note that rows 123456781 BB, 987654321 BB and 987654321 JM are omitted since the first 8 digits of the code combined with the country code are unique.
ASKER
Oops that final note should have been
Note that rows 123456781 BB and 987654321 JM are omitted since the first 8 digits of the code combined with the country code are unique.
Note that rows 123456781 BB and 987654321 JM are omitted since the first 8 digits of the code combined with the country code are unique.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
What query provided this result:
12345678 123456781 AG
12345678 123456782 AG
12345678 123456781 BB
98765432 987654321 JM
98765432 987654321 BB
98765432 987654322 BB
Both my and awking00's code should work without problems on MS SQL.
12345678 123456781 AG
12345678 123456782 AG
12345678 123456781 BB
98765432 987654321 JM
98765432 987654321 BB
98765432 987654322 BB
Both my and awking00's code should work without problems on MS SQL.
Actually, my code won't work on MS SQL as is, but could be easily modified to do so by using the substring function (i.e. not substr which is DB2) and adding the AS keyword for the alias of the table and subquery.
Open in new window
If you'd like some more reading on GROUP BY I have an image and code-heavy tutorial out there called SQL Server GROUP BY Solutions