Link to home
Start Free TrialLog in
Avatar of Grayson Hampton
Grayson Hampton

asked on

Find Duplicate Values in SQL

I have the following query and I need to alter it so it only returns the USERIDs" that appear more than once. I would like to return a Count along with the max record by date.

SELECT userID, firstName, placementID, dateBegin, 
status, Office, 
              clientCorporationID
FROM (SELECT Candidate.userID, Candidate.firstName, Placement_1.placementID, Placement_1.dateBegin, Placement_1.dateEnd, Placement_1.payRate, Placement_1.status, 
Placement_1.customInt3, 
(SELECT CASE WHEN primaryDeptName = 'HIM - HOU' THEN 'NATIONAL' ELSE CorporateUser_1.office END AS Expr1

FROM CorporateUser
WHERE (userID = CorporateUser_1.userID)) AS Office, JobOrder_4.clientCorporationID


FROM    Placement AS Placement_1 INNER JOIN
                           JobOrder AS JobOrder_4 ON JobOrder_4.jobOrderID = Placement_1.jobOrderID INNER JOIN
                           Candidate ON Placement_1.candidateID = Candidate.userID INNER JOIN
                           vw1stCommissionsRecruiter ON Placement_1.placementID = vw1stCommissionsRecruiter.placementID INNER JOIN
                           PAM00701 AS PAM00701 ON CAST(vw1stCommissionsRecruiter.userID AS varchar(20)) = PAM00701.Consultant_ID INNER JOIN
                           PAM00302 AS PAM00302 ON PAM00701.SBU = PAM00302.SBU INNER JOIN
                           PAM00301 AS PAM00301 ON PAM00701.Office = PAM00301.Office_ID INNER JOIN
                           CorporateUser AS CorporateUser_1 ON vw1stCommissionsRecruiter.userID = CorporateUser_1.userID

WHERE  (CONVERT(DATE, Placement_1.dateBegin, 102) BETWEEN '01/01/2018' AND '05/31/2020') 
AND (NOT (Placement_1.isDeleted = 1))) AS report

WHERE (Office IN ('Chicago'))
ORDER BY userID

Open in new window


The examples in red are duplicates that would appear in the new query.
 







 
userIDfirstNameplacementIDdateBeginstatusOfficeclientCorporationID
1842Michael1481086/11/2018EndedCHICAGO6905
2093Ronald1746664/29/2019EndedCHICAGO373
4000Tim16250212/3/2018EndedCHICAGO12964
4000Tim1428471/29/2018EndedCHICAGO12964
4709Jon1493336/27/2018EndedCHICAGO172124
10893Rich1592929/24/2018EndedCHICAGO14492
16905Eric16054810/18/2018EndedCHICAGO1698
19983Harry16172911/27/2018EndedCHICAGO12741
20917Jeffrey1740515/1/2019EndedCHICAGO88587
21395Sam1457943/29/2018EndedCHICAGO89889
37187Matthew1581349/4/2018EndedCHICAGO154903
37632Krishna1478305/29/2018EndedCHICAGO1458
39375Adeola1486816/11/2018EndedCHICAGO172124
39375Adeola1637631/14/2019EndedCHICAGO25518
39375Adeola1769936/17/2019EndedCHICAGO170614
40715Rashid1634471/7/2019ApprovedCHICAGO171
42459Bhavna1882331/23/2020ApprovedCHICAGO25412
44172Will1462274/16/2018EndedCHICAGO12778
45716Rodney16113511/7/2018EndedCHICAGO86098
48257Victor1478965/24/2018EndedCHICAGO71457
49264Paul1440232/8/2018EndedCHICAGO89889
50277Mike1447313/19/2018EndedCHICAGO89889
52164Theodore1438942/20/2018EndedCHICAGO14801
57228Frank1473815/21/2018EndedCHICAGO172124
57243Joy1925375/1/2020ApprovedCHICAGO6755
59073Taimur1465474/11/2018EndedCHICAGO6905
77068Mitali1458644/9/2018EndedCHICAGO89889
78327Vincent1442622/19/2018EndedCHICAGO106184
83559Chris1434982/7/2018EndedCHICAGO1520
106013Rama1874751/20/2020ApprovedCHICAGO2054
108628Paul1646692/19/2019ApprovedCHICAGO86863
112437Ruth1899533/9/2020ApprovedCHICAGO1458
274598Jessica1644361/29/2019EndedCHICAGO12297
280454Lavette1438232/12/2018EndedCHICAGO13984
280454Lavette18264410/23/2019ApprovedCHICAGO13984
281554Makea18419711/4/2019EndedCHICAGO80913
286131Tony D.1869741/6/2020EndedCHICAGO153313
287444Patricia16089710/26/2018EndedCHICAGO27722
287495Christal1897902/20/2020ApprovedCHICAGO14373
287495Christal1790467/22/2019EndedCHICAGO14373
287566Maria1666523/4/2019EndedCHICAGO13984
287566Maria1777787/1/2019EndedCHICAGO13042
287566Maria16120011/12/2018EndedCHICAGO15202
290584Michael1512598/9/2018EndedCHICAGO87710
291121Veronica1633871/7/2019EndedCHICAGO13042
291121Veronica1575088/22/2018EndedCHICAGO5198
292403Travis1576338/15/2018EndedCHICAGO13769
292403Travis1817549/10/2019EndedCHICAGO1458
292403Travis1753655/6/2019EndedCHICAGO14801
292403Travis1750684/29/2019EndedCHICAGO32266
292403Travis1579378/22/2018EndedCHICAGO13769
292403Travis16101810/29/2018EndedCHICAGO13769
292403Travis16222711/28/2018EndedCHICAGO223583
292403Travis16314012/19/2018EndedCHICAGO13769
292403Travis1636791/9/2019EndedCHICAGO85861
292403Travis1825489/26/2019EndedCHICAGO245381
292403Travis18336910/10/2019EndedCHICAGO1458
292403Travis1512708/7/2018EndedCHICAGO190922
302107Angel1667173/11/2019EndedCHICAGO14287
302107Angel16246212/10/2018EndedCHICAGO2054
306421Regina1452113/13/2018EndedCHICAGO13807
306463Nimesh1482405/29/2018EndedCHICAGO180170
306463Nimesh16158711/12/2018EndedCHICAGO142424
306463Nimesh1590839/18/2018EndedCHICAGO13542
308985Sharia1478285/29/2018EndedCHICAGO81792
310108Marilyn1793927/29/2019EndedCHICAGO12707
310962Mary1817769/3/2019EndedCHICAGO243762
312546Jataun1762656/3/2019EndedCHICAGO15142
316698Dora1798998/6/2019EndedCHICAGO22877
317168Rolonde1726473/13/2019EndedCHICAGO14361
320141Sharon16059710/29/2018EndedCHICAGO2054
321076Nicole1815729/16/2019EndedCHICAGO83481
323827Stanley1781117/2/2019EndedCHICAGO14373
323858Brenda E1896292/18/2020ApprovedCHICAGO15743








ASKER CERTIFIED SOLUTION
Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This should return just the userID, count, and max date:
;WITH cte AS 
(
SELECT userID, firstName, placementID, dateBegin, 
status, Office, 
              clientCorporationID
FROM (SELECT Candidate.userID, Candidate.firstName, Placement_1.placementID, Placement_1.dateBegin, Placement_1.dateEnd, Placement_1.payRate, Placement_1.status, 
Placement_1.customInt3, 
(SELECT CASE WHEN primaryDeptName = 'HIM - HOU' THEN 'NATIONAL' ELSE CorporateUser_1.office END AS Expr1
FROM CorporateUser
WHERE (userID = CorporateUser_1.userID)) AS Office, JobOrder_4.clientCorporationID


FROM    Placement AS Placement_1 INNER JOIN
                           JobOrder AS JobOrder_4 ON JobOrder_4.jobOrderID = Placement_1.jobOrderID INNER JOIN
                           Candidate ON Placement_1.candidateID = Candidate.userID INNER JOIN
                           vw1stCommissionsRecruiter ON Placement_1.placementID = vw1stCommissionsRecruiter.placementID INNER JOIN
                           PAM00701 AS PAM00701 ON CAST(vw1stCommissionsRecruiter.userID AS varchar(20)) = PAM00701.Consultant_ID INNER JOIN
                           PAM00302 AS PAM00302 ON PAM00701.SBU = PAM00302.SBU INNER JOIN
                           PAM00301 AS PAM00301 ON PAM00701.Office = PAM00301.Office_ID INNER JOIN
                           CorporateUser AS CorporateUser_1 ON vw1stCommissionsRecruiter.userID = CorporateUser_1.userID

WHERE  (CONVERT(DATE, Placement_1.dateBegin, 102) BETWEEN '01/01/2018' AND '05/31/2020') 
AND (NOT (Placement_1.isDeleted = 1))) AS report
WHERE (Office IN ('Chicago'))
)
SELECT cte.userID, j.cnt, j.maxDate FROM cte 
  JOIN (SELECT userID, COUNT(*) cnt, MAX(dateBegin) maxDate FROM cte GROUP BY userID HAVING COUNT(*) > 1) j
    ON j.userID = cte.userID AND cte.dateBegin = j.maxDate
ORDER BY cte.userID

Open in new window

It seems the previous solution can be simpler:
;WITH cte AS (
SELECT userID, firstName, placementID, dateBegin, status, Office,               clientCorporationID
FROM (SELECT Candidate.userID, Candidate.firstName, Placement_1.placementID, Placement_1.dateBegin, Placement_1.dateEnd, Placement_1.payRate, Placement_1.status, Placement_1.customInt3, (SELECT CASE WHEN primaryDeptName = 'HIM - HOU' THEN 'NATIONAL' ELSE CorporateUser_1.office END AS Expr1
FROM CorporateUser
WHERE (userID = CorporateUser_1.userID)) AS Office, JobOrder_4.clientCorporationID


FROM    Placement AS Placement_1 INNER JOIN
                           JobOrder AS JobOrder_4 ON JobOrder_4.jobOrderID = Placement_1.jobOrderID INNER JOIN
                           Candidate ON Placement_1.candidateID = Candidate.userID INNER JOIN
                           vw1stCommissionsRecruiter ON Placement_1.placementID = vw1stCommissionsRecruiter.placementID INNER JOIN
                           PAM00701 AS PAM00701 ON CAST(vw1stCommissionsRecruiter.userID AS varchar(20)) = PAM00701.Consultant_ID INNER JOIN
                           PAM00302 AS PAM00302 ON PAM00701.SBU = PAM00302.SBU INNER JOIN
                           PAM00301 AS PAM00301 ON PAM00701.Office = PAM00301.Office_ID INNER JOIN
                           CorporateUser AS CorporateUser_1 ON vw1stCommissionsRecruiter.userID = CorporateUser_1.userID

WHERE  (CONVERT(DATE, Placement_1.dateBegin, 102) BETWEEN '01/01/2018' AND '05/31/2020') AND (NOT (Placement_1.isDeleted = 1))) AS report
WHERE (Office IN ('Chicago'))
)
SELECT userID, COUNT(*) cnt, MAX(dateBegin) maxDate FROM cte GROUP BY userID HAVING COUNT(*) > 1
ORDER BY cte.user

Open in new window

Important here is the fact you still may use your original query as a base for cte.
Although I'm not sure where the reference to the row_number() should go, you might try -
SELECT userID, firstName, placementID, dateBegin, 
status, Office, 
              clientCorporationID
FROM (SELECT Candidate.userID, Candidate.firstName, Placement_1.placementID, Placement_1.dateBegin, Placement_1.dateEnd, Placement_1.payRate, Placement_1.status, 
Placement_1.customInt3,
row_number() over (partition by userID, firstName order by Placement_1.dateBegin desc) rn,
 count(*) over (partition by userID, firstName order by Placement_1.dateBegin desc) cnt,
(SELECT CASE WHEN primaryDeptName = 'HIM - HOU' THEN 'NATIONAL' ELSE  CorporateUser_1.office END AS Expr1
 FROM CorporateUser
 WHERE (userID = CorporateUser_1.userID)) AS Office,
 JobOrder_4.clientCorporationID
FROM    Placement AS Placement_1 INNER JOIN
                           JobOrder AS JobOrder_4 ON JobOrder_4.jobOrderID = Placement_1.jobOrderID INNER JOIN
                           Candidate ON Placement_1.candidateID = Candidate.userID INNER JOIN
                           vw1stCommissionsRecruiter ON Placement_1.placementID = vw1stCommissionsRecruiter.placementID INNER JOIN
                           PAM00701 AS PAM00701 ON CAST(vw1stCommissionsRecruiter.userID AS varchar(20)) = PAM00701.Consultant_ID INNER JOIN
                           PAM00302 AS PAM00302 ON PAM00701.SBU = PAM00302.SBU INNER JOIN
                           PAM00301 AS PAM00301 ON PAM00701.Office = PAM00301.Office_ID INNER JOIN
                           CorporateUser AS CorporateUser_1 ON vw1stCommissionsRecruiter.userID = CorporateUser_1.userID
 
WHERE  (CONVERT(DATE, Placement_1.dateBegin, 102) BETWEEN '01/01/2018' AND '05/31/2020') 
AND (NOT (Placement_1.isDeleted = 1))
AND rn = 1) AS report
 
WHERE (Office IN ('Chicago'))
ORDER BY userID

Open in new window

Sorry, I left out the count -
SELECT userID, firstName, placementID, dateBegin, 
status, Office,
              clientCorporationID, cnt
FROM (SELECT ...
Another perhaps simpler query would be
with cte as
(your original query),
cte2 as
(select cte.*, count(*) over (partition by userid order by office) as dupe_count,
 row_number() over (partition by userid order by date_begin desc) as rn
 from cte)
SELECT userID, firstName, placementID, dateBegin, status, Office, clientCorporationID, dupe_count
from cte2 where rn = 1;