thandel
asked on
Count number of duplicate records
I am trying to get a count of the number of duplicate records from a table based on three fields, Fname, Lname, and socsec number. I have this query but it isn't working.
Can you tell me the proper syntax to get a result and possible "clean up" my logic?
SELECT Count(*)
FROM TPatient
WHERE (((TPatient.FirstName) In (SELECT [FirstName] FROM [TPatient] As Tmp GROUP BY [FirstName],[LastName],[So cSec] HAVING Count(*)>1 And [LastName] = [TPatient].[LastName] And [SocSec] = [TPatient].[SocSec])));
Thank you.
Can you tell me the proper syntax to get a result and possible "clean up" my logic?
SELECT Count(*)
FROM TPatient
WHERE (((TPatient.FirstName) In (SELECT [FirstName] FROM [TPatient] As Tmp GROUP BY [FirstName],[LastName],[So
Thank you.
Try this SQL:
SELECT Sum([%$##@_Alias].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY [FirstName],[LastName],[So cSec]
HAVING (((Count(TPatient.FirstNam e))>1))]. AS [%$##@_Alias];
SELECT Sum([%$##@_Alias].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY [FirstName],[LastName],[So
HAVING (((Count(TPatient.FirstNam
ASKER
Hmmm I gave you the wrong field names but when I use the code I'm getting names instead of a value. Here is what I tried:
SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS Counter
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec;
SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS Counter
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec;
ASKER
DatabaseMX.... what is [%$##@_Alias]?
I tried that solution and got a error for invavlid braket name.
I tried that solution and got a error for invavlid braket name.
Sorry, try this(I'll explain later) ...
SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien t.LastName ,TPatient. SocSec
HAVING (((Count(TPatient.FirstNam e))>1))]. AS [DupSQL];
SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien
HAVING (((Count(TPatient.FirstNam
ASKER
That solution returned a value of 6, however there are only two records in the DB that are duplicates. So my test case should have a result of 2 not 6.
thandel,
Do you not get a "count" field in addition to the names with my solution. ???
Do you not get a "count" field in addition to the names with my solution. ???
ASKER
I do when used as a query, however I am trying to simply use this as data in list box only as the data sort. Sorry I didn't think it made a difference where I used a query.
" So my test case should have a result of 2 not 6."
Using which solution ?
The SQL I posted is basically Summing the SQL of the query that does the dup counting ... and gives the total.
mx
Using which solution ?
The SQL I posted is basically Summing the SQL of the query that does the dup counting ... and gives the total.
mx
You can, Just add the order by clause as below.
SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC
ORDER BY Count([FNAME] & [LNAME] & [SOCSEC]) DESC;
SELECT TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC, Count([FNAME] & [LNAME] & [SOCSEC]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC
ORDER BY Count([FNAME] & [LNAME] & [SOCSEC]) DESC;
ASKER
DatabaseMX: Using this solution:
SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien t.LastName ,TPatient. SocSec
HAVING (((Count(TPatient.FirstNam e))>1))]. AS [DupSQL];
SELECT Sum([DupSQL].DupCnt) AS SumOfDupCnt
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien
HAVING (((Count(TPatient.FirstNam
ASKER
ldunscombe, thank you but I don't need a single value in an order so I would rather not use the overhead to sort a single value.
OK ... well, I don't have a WHERE clause ... you probably need to add that in. I'm kind of showing the 'structure'.
You may need to tweak the part starting after the FROM:
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien t.LastName ,TPatient. SocSec
HAVING (((Count(TPatient.FirstNam e))>1))]. AS [DupSQL];
mx
You may need to tweak the part starting after the FROM:
FROM [SELECT Count(TPatient.FirstName) AS DupCnt
FROM TPatient
GROUP BY TPatient.FirstName,TPatien
HAVING (((Count(TPatient.FirstNam
mx
ASKER
I am not familiar, would the Where replace the Having?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Sorry thandel, I don't think I fully understood what you were trying to achieve. I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it with the following you should get the result you are after.
SELECT Sum(DupCnt.Counter) AS SumOfCounter
FROM DupCnt
HAVING (((Sum(DupCnt.Counter))>1) );
SELECT Sum(DupCnt.Counter) AS SumOfCounter
FROM DupCnt
HAVING (((Sum(DupCnt.Counter))>1)
"I'm still not sure I do, but if you save my solution as a seperate query "DupCnt" then query it "
Well, he (we) are trying to avoid a separate save query, thus the reason for the sub query.
mx
Well, he (we) are trying to avoid a separate save query, thus the reason for the sub query.
mx
OK Sorry, One last try.
SELECT Sum([Counter]) AS DuplicateRecords
FROM (SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec)
HAVING (((Sum([Counter]))>1));
SELECT Sum([Counter]) AS DuplicateRecords
FROM (SELECT TPatient.FirstName, TPatient.LastName, TPatient.SocSec, Count([FirstName] & [LastName] & [SocSec]) AS [Counter]
FROM TPatient
GROUP BY TPatient.FirstName, TPatient.LastName, TPatient.SocSec)
HAVING (((Sum([Counter]))>1));
lol ... well, that's basically what I posted. This issue is ... he has some additional criteria in the original SQL in the Q, which is why I get a different count ...
mx
mx
Hello thandel,
> my test case should have a result of 2 not 6.
Do you meant you have only two identical records (one duplicate), or two groups of identical records (which could represent 6 records)?
mx's solution counts records -- Sum([DupSQL].DupCnt) --, not groups -- Count(*).
(°v°)
> my test case should have a result of 2 not 6.
Do you meant you have only two identical records (one duplicate), or two groups of identical records (which could represent 6 records)?
mx's solution counts records -- Sum([DupSQL].DupCnt) --, not groups -- Count(*).
(°v°)
ASKER
I have two identical records based on firstname, lastname and socsec. For whatever reason the query is returning 4 more than actaul. When there are not any duplicates his query returns a value of 4.
You might want to look at these four records... Take the inner query and add the three grouped fields to the select clause:
SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1
Verify that it works (it should show you at least one row). Then include it back:
SELECT Count(*) As NbGroups, Sum(Nb) As NbRecs
FROM (
SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1
) As Dups
Note that this is exactly mx's query, only with different names.
Alternative: try the "find duplicate query" wizard and compare your results.
(°v°)
SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1
Verify that it works (it should show you at least one row). Then include it back:
SELECT Count(*) As NbGroups, Sum(Nb) As NbRecs
FROM (
SELECT FirstName, LastName, SocSec, Count(*) AS Nb
FROM TPatient
GROUP BY FirstName, LastName, SocSec
HAVING Count(*)>1
) As Dups
Note that this is exactly mx's query, only with different names.
Alternative: try the "find duplicate query" wizard and compare your results.
(°v°)
ASKER
Very stange.... when I run the first query I am getting two completely different people, different first and last name and both don't have a soc (not required to be entered).
When as I the record ID to be displayed on the query then I only have one record coming back from the query.
When I run the dup query wizard, it comes up with nothing as I would expect.
When as I the record ID to be displayed on the query then I only have one record coming back from the query.
When I run the dup query wizard, it comes up with nothing as I would expect.
ASKER
Further investigation from the two records found are that, one person does in fact have two records and the other person has two records with the same key number.
So the question is why didn't MS query wizard pick it up?
Seems that I have a little database corruption even AFTER running the repair option.
So the question is why didn't MS query wizard pick it up?
Seems that I have a little database corruption even AFTER running the repair option.
You are right, the duplicate record wizard does not pick up Null fields at all. This is because it's attempting to show all records in an editable query -- using a subquery -- instead of just showing the duplicated values. Due to the implicit link, there is no way to show Null duplicates. Also, the query is meant as a debugging tool when you can't create a unique index on one or several fields, and unique indexes likewise skip over Null fields.
By the way, I didn't recognize your initial query as being just that: a slightly modified result of the duplicate query wizard. Sorry if that created additional confusion.
All in all, this is a good argument in favour of mx's query: it's clearly superior in that regard.
Also, have no fear: this isn't a symptom of database corruption.
(°v°)
By the way, I didn't recognize your initial query as being just that: a slightly modified result of the duplicate query wizard. Sorry if that created additional confusion.
All in all, this is a good argument in favour of mx's query: it's clearly superior in that regard.
Also, have no fear: this isn't a symptom of database corruption.
(°v°)
FROM TPatient
GROUP BY TPatient.FNAME, TPatient.LNAME, TPatient.SOCSEC;
Leigh