Link to home
Start Free TrialLog in
Avatar of thandel
thandel

asked on

Count number of duplicate records

I am trying to get a count of the number of duplicate records from a table based on three fields, Fname, Lname, and socsec number.  I have this query but it isn't working.  

Can you tell me the proper syntax to get a result and possible "clean up" my logic?

SELECT Count(*)
FROM TPatient
WHERE (((TPatient.FirstName) In (SELECT [FirstName] FROM [TPatient] As Tmp GROUP BY [FirstName],[LastName],[SocSec] HAVING Count(*)>1  And [LastName] = [TPatient].[LastName] And [SocSec] = [TPatient].[SocSec])));


Thank you.
Avatar of ldunscombe
ldunscombe
Flag of Australia image

SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS Counter
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC;

Leigh
Avatar of DatabaseMX (Joe Anderson - Former Microsoft Access MVP)
Try this SQL:

SELECT Sum([%$##@_Alias].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY [FirstName],[LastName],[SocSec]
HAVING (((Count(TPatient.FirstName))>1))]. AS [%$##@_Alias];
Avatar of thandel
thandel

ASKER

Hmmm I gave you the wrong field names but when I use the code I'm getting names instead of a value.  Here is what I tried:

SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS Counter
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec;
Avatar of thandel

ASKER

DatabaseMX.... what is [%$##@_Alias]?

I tried that solution and got a error for invavlid braket name.
Sorry, try this(I'll explain later) ...


SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];
Avatar of thandel

ASKER

That solution returned a value of 6, however there are only two records in the DB that are duplicates.  So my test case should have a result of 2 not 6.
thandel,

Do you not get a "count" field in addition to the names with my solution. ???
Avatar of thandel

ASKER

I do when used as a query, however I am trying to simply use this as data in list box only as the data sort.  Sorry I didn't think it made a difference where I used a query.
" So my test case should have a result of 2 not 6."

Using which solution ?

The SQL I posted is basically Summing the SQL of the query that does the dup counting ... and gives the total.

mx
You can, Just add the order by clause as below.
SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC
ORDER BY Count([FNAME] & [LNAME] & [SOCSEC]) DESC;
Avatar of thandel

ASKER

DatabaseMX: Using this solution:

SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];
Avatar of thandel

ASKER

ldunscombe, thank you but I don't need a single value in an order so I would rather not use the overhead to sort a single value.
OK ... well, I don't have a WHERE clause ... you probably need to add that in.  I'm kind of showing the 'structure'.

You may need to tweak the part starting after the FROM:

FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatient.LastName,TPatient.SocSec
HAVING (((Count(TPatient.FirstName))>1))]. AS [DupSQL];

mx
Avatar of thandel

ASKER

I am not familiar, would the Where replace the Having?
ASKER CERTIFIED SOLUTION
Avatar of DatabaseMX (Joe Anderson - Former Microsoft Access MVP)
DatabaseMX (Joe Anderson - Former Microsoft Access MVP)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Sorry thandel,  I don't think I fully understood what you were trying to achieve.  I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it with the following you should get the result you are after.

SELECT Sum(DupCnt.Counter) AS SumOfCounter
FROM DupCnt
HAVING (((Sum(DupCnt.Counter))>1));
"I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it "

Well, he (we) are trying to avoid a separate save query, thus the reason for the sub query.  

mx
OK Sorry,  One last try.

SELECT Sum([Counter]) AS DuplicateRecords
FROM (SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec)
HAVING (((Sum([Counter]))>1));
lol ... well, that's basically what I  posted.  This issue is ... he has some additional criteria in the original SQL in the Q, which is why I get a different count ...

mx
Hello thandel,

> my test case should have a result of 2 not 6.

Do you meant you have only two identical records (one duplicate), or two groups of identical records (which could represent 6 records)?

mx's solution counts records -- Sum([DupSQL].DupCnt) --, not groups -- Count(*).

(°v°)
Avatar of thandel

ASKER

I have two identical records based on firstname, lastname and socsec.  For whatever reason the query is returning 4 more than actaul.  When there are not any duplicates his query returns a value of 4.
You might want to look at these four records... Take the inner query and add the three grouped fields to the select clause:

SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1

Verify that it works (it should show you at least one row). Then include it back:

SELECT Count(*) As NbGroups, Sum(Nb) As NbRecs
FROM (
    SELECT FirstName, LastName, SocSec, Count(*) AS Nb
    FROM TPatient
    GROUP BY FirstName, LastName, SocSec
    HAVING Count(*)>1
    ) As Dups

Note that this is exactly mx's query, only with different names.

Alternative: try the "find duplicate query" wizard and compare your results.

(°v°)
Avatar of thandel

ASKER

Very stange.... when I run the first query I am getting two completely different people, different first and last name and both don't have a soc (not required to be entered).

When as I the record ID to be displayed on the query then I only have one record coming back from the query.

When I run the dup query wizard, it comes up with nothing as I would expect.
Avatar of thandel

ASKER

Further investigation from the two records found are that, one person does in fact have two records and the other person has two records with the same key number.

So the question is why didn't MS query wizard pick it up?

Seems that I have a little database corruption even AFTER running the repair option.
You are right, the duplicate record wizard does not pick up Null fields at all. This is because it's attempting to show all records in an editable query -- using a subquery -- instead of just showing the duplicated values. Due to the implicit link, there is no way to show Null duplicates. Also, the query is meant as a debugging tool when you can't create a unique index on one or several fields, and unique indexes likewise skip over Null fields.

By the way, I didn't recognize your initial query as being just that: a slightly modified result of the duplicate query wizard. Sorry if that created additional confusion.

All in all, this is a good argument in favour of mx's query: it's clearly superior in that regard.

Also, have no fear: this isn't a symptom of database corruption.

(°v°)