rporter45
asked on
SQL: Identifying Duplicates In A Sub-Query
SQL: In an SQL sub-query, the DISTINCT clause is ineffective. How do I eliminate duplicates from my returns. eg. 3 FirstName, LastName fields contain 'Charles Smith', but I only want 1 to return.
Can we know the query you are trying
select firstname, lastname
from table
group by firstname, lastname
from table
group by firstname, lastname
ASKER
hi aneeshattingal
here's an example of a query i'm manipulating to try to return results without dupluctates.
I am trying to utilize the COUNT function as this seems the most efficient and reliable way of doing it.
SELECT [dbo].[SmallDupe].* FROM [dbo].[SmallDupe]
inner JOIN
(SELECT [ConstituentID],[NewTransa ction], [LastName], COUNT(*)
,MAX([TransactionID]) as TransactionID
,MAX(COALESCE([PaymentDate ],[Registr ationDate] , cast ('12/31/2051' as
datetime))) as Date
FROM [dbo].[SmallDupe]
GROUP BY ConstituentID, NewTransaction, LastName) as A
ON A.ConstituentID = SmallDupe.ConstituentID
AND A.TransactionID = SmallDupe.TransactionID
AND A.Date = (COALESCE([PaymentDate],[R egistratio nDate],cas t('12/31/2 051'
as datetime)))
AND A.NewTransaction = SmallDupe.NewTransaction
WHERE A.NewTransaction <> 'No'
HAVING COUNT(*) > 1
ORDER BY LastName
here's an example of a query i'm manipulating to try to return results without dupluctates.
I am trying to utilize the COUNT function as this seems the most efficient and reliable way of doing it.
SELECT [dbo].[SmallDupe].* FROM [dbo].[SmallDupe]
inner JOIN
(SELECT [ConstituentID],[NewTransa
,MAX([TransactionID]) as TransactionID
,MAX(COALESCE([PaymentDate
datetime))) as Date
FROM [dbo].[SmallDupe]
GROUP BY ConstituentID, NewTransaction, LastName) as A
ON A.ConstituentID = SmallDupe.ConstituentID
AND A.TransactionID = SmallDupe.TransactionID
AND A.Date = (COALESCE([PaymentDate],[R
as datetime)))
AND A.NewTransaction = SmallDupe.NewTransaction
WHERE A.NewTransaction <> 'No'
HAVING COUNT(*) > 1
ORDER BY LastName
There is no group by clause used in the query.
Please try using group by clause and see if that helps you.
Hope this helps
Aash.
Please try using group by clause and see if that helps you.
Hope this helps
Aash.
Looking at the subquery, you are right, it should return only 1 record for
(ConstituentID, NewTransaction, LastName) even there is no firstname as you mentioned.
This group by is correct and because the distinct is not helping you either, are you sure these names are exactly the same ? Maybe give it a try with rtrim(lastname) as that field is a char() in stead of a varchar() ... ?
When the problem reside in the join clause between the inner and outer query, are you sure all relevant fields are mentioned in the join clauses ? Maybe the lastname should be in there as well ?
If you can give some more info on the table and/or the data, we might be able to crack this one ...
Hope this helps ...
(ConstituentID, NewTransaction, LastName) even there is no firstname as you mentioned.
This group by is correct and because the distinct is not helping you either, are you sure these names are exactly the same ? Maybe give it a try with rtrim(lastname) as that field is a char() in stead of a varchar() ... ?
When the problem reside in the join clause between the inner and outer query, are you sure all relevant fields are mentioned in the join clauses ? Maybe the lastname should be in there as well ?
If you can give some more info on the table and/or the data, we might be able to crack this one ...
Hope this helps ...
ASKER
SQL
The data below is an example of duplicated fields in a database.
Following that is the SQL query I'm working with.
I'm looking to eliminate the duplicated records.
The query is only partially accurate.
Rows in which more than one column contains duplicates are not removed.
The criteria for duplication is only 1 PersonID with max SaleID and max (date1 or date2)
PersonID, SaleID, Date1, Date2
1, 100, 2006-08-10T07:35:29.023, 2006-08-11T07:35:29.023
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 104,2006-08-10T07:35:35.02 3,2006-08- 10T07:35:2 9.023
4, 105, ,2006-08-10T07:35:29.023
SELECT [dbo].[DupData].* FROM [dbo].[DupData]
inner JOIN
(SELECT [PersonID],[SaleID]
,MAX([SaleID]) as SaleID
,MAX(COALESCE([Date1],[Dat e2], cast ('12/31/2051' as
datetime))) as Date
FROM [dbo].[DupData]
GROUP BY PersonID, SaleID) as A
ON A.PersonID = DupData.PersonID
AND A.SaleID = DupData.SaleID
AND A.Date = (COALESCE([Date1],[Date2], cast('12/3 1/2051'
as datetime)))
AND A.SaleID = DupData.SaleID
ORDER BY PersonID, SaleID
The returns from the query follows:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
Without the date criteria, the query returned:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 105, ,2006-08-10T07:35:29.023
The data below is an example of duplicated fields in a database.
Following that is the SQL query I'm working with.
I'm looking to eliminate the duplicated records.
The query is only partially accurate.
Rows in which more than one column contains duplicates are not removed.
The criteria for duplication is only 1 PersonID with max SaleID and max (date1 or date2)
PersonID, SaleID, Date1, Date2
1, 100, 2006-08-10T07:35:29.023, 2006-08-11T07:35:29.023
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 104,2006-08-10T07:35:35.02
4, 105, ,2006-08-10T07:35:29.023
SELECT [dbo].[DupData].* FROM [dbo].[DupData]
inner JOIN
(SELECT [PersonID],[SaleID]
,MAX([SaleID]) as SaleID
,MAX(COALESCE([Date1],[Dat
datetime))) as Date
FROM [dbo].[DupData]
GROUP BY PersonID, SaleID) as A
ON A.PersonID = DupData.PersonID
AND A.SaleID = DupData.SaleID
AND A.Date = (COALESCE([Date1],[Date2],
as datetime)))
AND A.SaleID = DupData.SaleID
ORDER BY PersonID, SaleID
The returns from the query follows:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
Without the date criteria, the query returned:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
3, 103, , 2006-08-10T07:35:29.023
4, 105, ,2006-08-10T07:35:29.023
... echo ???
I was struggling for you with this one: http:/Q_22935635.html
Can you please give the output as you would like to have it, or describe it more clearly so we can test the code before we start flooding you with alternatives of the code ?
Hope this helps ...
I was struggling for you with this one: http:/Q_22935635.html
Can you please give the output as you would like to have it, or describe it more clearly so we can test the code before we start flooding you with alternatives of the code ?
Hope this helps ...
ASKER
Hello Yveau.
Thankyou. This is what should be returned:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
4, 105, ,2006-08-10T07:35:29.023
I need the highest SaleID with the latest date of either Date 1 or 2
I need all the fields returned so that they can be manipulated further.
Thankyou. This is what should be returned:
PersonID, SaleID, Date1, Date2
1, 102, 2006-08-12T07:35:29.023, 2006-08-12T08:35:29.023
2, 101, 2006-08-10T07:35:29.023, 2006-08-10T07:35:29.023
3, 103, 2006-08-10T07:35:29.023, 2006-08-10T07:35:35.023
4, 105, ,2006-08-10T07:35:29.023
I need the highest SaleID with the latest date of either Date 1 or 2
I need all the fields returned so that they can be manipulated further.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
note that hte datediff is in seconds, so any time difference smaller than that will not be noticed by the 'function' and a time difference of > 2,147,483,647 seconds (just over 68 years) will cause an overflow ...
Hope this helps ...
Hope this helps ...
As you accepted my answer for Q http:/Q_22935635.html, could you please also close this Q ?
Thanks,
Yveau
Thanks,
Yveau
Glad I could be of any help and thanks for the grade !