• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 544
  • Last Modified:

Count number of duplicate records

I am trying to get a count of the number of duplicate records from a table based on three fields, Fname, Lname, and socsec number.  I have this query but it isn't working.  

Can you tell me the proper syntax to get a result and possible "clean up" my logic?

SELECT Count(*)
FROM TPatient
WHERE (((TPatient.FirstName) In (SELECT [FirstName] FROM [TPatient] As Tmp GROUP BY [FirstName],[LastName],[SocSec] HAVING Count(*)>1  And [LastName] = [TPatient].[LastName] And [SocSec] = [TPatient].[SocSec])));


Thank you.
0
thandel
Asked:
thandel
  • 10
  • 7
  • 5
  • +1
1 Solution
 
ldunscombeCommented:
SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS Counter
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC;

Leigh
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
Try this SQL:

SELECT Sum([%$##@_Alias].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY [FirstName],[LastName],[SocSec]
HAVING (((Count(TPatient.FirstName))>1))]. AS [%$##@_Alias];
0
 
thandelAuthor Commented:
Hmmm I gave you the wrong field names but when I use the code I'm getting names instead of a value.  Here is what I tried:

SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS Counter
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec;
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
thandelAuthor Commented:
DatabaseMX.... what is [%$##@_Alias]?

I tried that solution and got a error for invavlid braket name.
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
Sorry, try this(I'll explain later) ...


SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];
0
 
thandelAuthor Commented:
That solution returned a value of 6, however there are only two records in the DB that are duplicates.  So my test case should have a result of 2 not 6.
0
 
ldunscombeCommented:
thandel,

Do you not get a "count" field in addition to the names with my solution. ???
0
 
thandelAuthor Commented:
I do when used as a query, however I am trying to simply use this as data in list box only as the data sort.  Sorry I didn't think it made a difference where I used a query.
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
" So my test case should have a result of 2 not 6."

Using which solution ?

The SQL I posted is basically Summing the SQL of the query that does the dup counting ... and gives the total.

mx
0
 
ldunscombeCommented:
You can, Just add the order by clause as below.
SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC
ORDER BY Count([FNAME] & [LNAME] & [SOCSEC]) DESC;
0
 
thandelAuthor Commented:
DatabaseMX: Using this solution:

SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];
0
 
thandelAuthor Commented:
ldunscombe, thank you but I don't need a single value in an order so I would rather not use the overhead to sort a single value.
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
OK ... well, I don't have a WHERE clause ... you probably need to add that in.  I'm kind of showing the 'structure'.

You may need to tweak the part starting after the FROM:

FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];

mx
0
 
thandelAuthor Commented:
I am not familiar, would the Where replace the Having?
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
Basically you need to integrate this:

 HAVING Count(*)>1  And [LastName] = [TPatient].[LastName] And [SocSec] = [TPatient].[SocSec]

into the SQL below ...

SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];

0
 
ldunscombeCommented:
Sorry thandel,  I don't think I fully understood what you were trying to achieve.  I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it with the following you should get the result you are after.

SELECT Sum(DupCnt.Counter) AS SumOfCounter
FROM DupCnt
HAVING (((Sum(DupCnt.Counter))>1));
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
"I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it "

Well, he (we) are trying to avoid a separate save query, thus the reason for the sub query.  

mx
0
 
ldunscombeCommented:
OK Sorry,  One last try.

SELECT Sum([Counter]) AS DuplicateRecords
FROM (SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec)
HAVING (((Sum([Counter]))>1));
0
 
DatabaseMX (Joe Anderson - Microsoft MVP, Access and Data Platform)Commented:
lol ... well, that's basically what I  posted.  This issue is ... he has some additional criteria in the original SQL in the Q, which is why I get a different count ...

mx
0
 
harfangCommented:
Hello thandel,

> my test case should have a result of 2 not 6.

Do you meant you have only two identical records (one duplicate), or two groups of identical records (which could represent 6 records)?

mx's solution counts records -- Sum([DupSQL].DupCnt) --, not groups -- Count(*).

(°v°)
0
 
thandelAuthor Commented:
I have two identical records based on firstname, lastname and socsec.  For whatever reason the query is returning 4 more than actaul.  When there are not any duplicates his query returns a value of 4.
0
 
harfangCommented:
You might want to look at these four records... Take the inner query and add the three grouped fields to the select clause:

SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1

Verify that it works (it should show you at least one row). Then include it back:

SELECT Count(*) As NbGroups, Sum(Nb) As NbRecs
FROM (
    SELECT FirstName, LastName, SocSec, Count(*) AS Nb
    FROM TPatient
    GROUP BY FirstName, LastName, SocSec
    HAVING Count(*)>1
    ) As Dups

Note that this is exactly mx's query, only with different names.

Alternative: try the "find duplicate query" wizard and compare your results.

(°v°)
0
 
thandelAuthor Commented:
Very stange.... when I run the first query I am getting two completely different people, different first and last name and both don't have a soc (not required to be entered).

When as I the record ID to be displayed on the query then I only have one record coming back from the query.

When I run the dup query wizard, it comes up with nothing as I would expect.
0
 
thandelAuthor Commented:
Further investigation from the two records found are that, one person does in fact have two records and the other person has two records with the same key number.

So the question is why didn't MS query wizard pick it up?

Seems that I have a little database corruption even AFTER running the repair option.
0
 
harfangCommented:
You are right, the duplicate record wizard does not pick up Null fields at all. This is because it's attempting to show all records in an editable query -- using a subquery -- instead of just showing the duplicated values. Due to the implicit link, there is no way to show Null duplicates. Also, the query is meant as a debugging tool when you can't create a unique index on one or several fields, and unique indexes likewise skip over Null fields.

By the way, I didn't recognize your initial query as being just that: a slightly modified result of the duplicate query wizard. Sorry if that created additional confusion.

All in all, this is a good argument in favour of mx's query: it's clearly superior in that regard.

Also, have no fear: this isn't a symptom of database corruption.

(°v°)
0

Featured Post

Prep for the ITIL® Foundation Certification Exam

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.

  • 10
  • 7
  • 5
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now