Link to home
Start Free TrialLog in
Avatar of rporter45
rporter45

asked on

SQL: Identifying Duplicates In A Sub-Query

SQL:  In an SQL sub-query, the DISTINCT clause is ineffective.  How do I eliminate duplicates from my returns.  eg. 3 FirstName, LastName fields contain 'Charles Smith', but I only want 1 to return.

Avatar of Aneesh
Aneesh
Flag of Canada image

Can we know the query you are trying
select firstname, lastname
from table
group by firstname, lastname
Avatar of rporter45
rporter45

ASKER

hi aneeshattingal

here's an example of a query i'm manipulating to try to return results without dupluctates.
I am trying to utilize the COUNT function as this seems the most efficient and reliable way of doing it.

SELECT [dbo].[SmallDupe].* FROM [dbo].[SmallDupe]
 inner JOIN

 (SELECT [ConstituentID],[NewTransaction], [LastName], COUNT(*)
 ,MAX([TransactionID]) as TransactionID
 ,MAX(COALESCE([PaymentDate],[RegistrationDate], cast ('12/31/2051' as
 datetime))) as Date

 FROM [dbo].[SmallDupe]
 GROUP BY ConstituentID, NewTransaction, LastName) as A
 ON A.ConstituentID = SmallDupe.ConstituentID
 AND A.TransactionID = SmallDupe.TransactionID
 AND A.Date = (COALESCE([PaymentDate],[RegistrationDate],cast('12/31/2051'
 as datetime)))
 AND A.NewTransaction = SmallDupe.NewTransaction

 WHERE A.NewTransaction <> 'No'

 HAVING COUNT(*) > 1

 ORDER BY LastName
There is no group by clause used in the query.
Please try using group by clause and see if that helps you.

Hope this helps

Aash.
Looking at the subquery, you are right, it should return only 1 record for
(ConstituentID, NewTransaction, LastName) even there is no firstname as you mentioned.

This group by is correct and because the distinct is not helping you either, are you sure these names are exactly the same ? Maybe give it a try with rtrim(lastname) as that field is a char() in stead of a varchar() ... ?

When the problem reside in the join clause between the inner and outer query, are you sure all relevant fields are mentioned in the join clauses ? Maybe the lastname should be in there as well ?

If you can give some more info on the table and/or the data, we might be able to crack this one ...

Hope this helps ...
SQL

The data below is an example of duplicated fields in a database.
Following that is the SQL query I'm working with.
I'm looking to eliminate the duplicated records.
The query is only partially accurate.
Rows in which more than one column contains duplicates are not removed.
The criteria for duplication is only 1 PersonID with max SaleID and max (date1 or date2)


PersonID, SaleID, Date1, Date2

1, 100, 2006-08-10T07:35:29.023, 2006-08-11T07:35:29.023
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 104,2006-08-10T07:35:35.023,2006-08-10T07:35:29.023
4, 105, ,2006-08-10T07:35:29.023



SELECT [dbo].[DupData].* FROM [dbo].[DupData]
 inner JOIN
 (SELECT [PersonID],[SaleID]
 ,MAX([SaleID]) as SaleID
 ,MAX(COALESCE([Date1],[Date2], cast ('12/31/2051' as
 datetime))) as Date
 FROM [dbo].[DupData]
 GROUP BY PersonID, SaleID) as A
 ON A.PersonID = DupData.PersonID
 AND A.SaleID = DupData.SaleID
 AND A.Date = (COALESCE([Date1],[Date2],cast('12/31/2051'
 as datetime)))
 AND A.SaleID = DupData.SaleID
 ORDER BY PersonID, SaleID


The returns from the query follows:

PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023



Without the date criteria, the query returned:

PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 105, ,2006-08-10T07:35:29.023




... echo ???
I was struggling for you with this one: http:/Q_22935635.html

Can you please give the output as you would like to have it, or describe it more clearly so we can test the code before we start flooding you with alternatives of the code ?

Hope this helps ...
Hello Yveau.
Thankyou.  This is what should be returned:

PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
4, 105, ,2006-08-10T07:35:29.023

I need the highest SaleID with the latest date of either Date 1 or 2
I need all the fields returned so that they can be manipulated further.




ASKER CERTIFIED SOLUTION
Avatar of Yveau
Yveau
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
note that hte datediff is in seconds, so any time difference smaller than that will not be noticed by the 'function' and a time difference of > 2,147,483,647 seconds (just over 68 years) will cause an overflow ...

Hope this helps ...
As you accepted my answer for Q http:/Q_22935635.html, could you please also close this Q ?

Thanks,
Yveau
Glad I could be of any help and thanks for the grade !