Solved

SQL query that sums the number of duplicates

Posted on 2014-09-25
11
213 Views
Last Modified: 2014-09-26
I want to return the number of duplicate records per institution. My query looks something like the snippet below. I'm getting an error message that there is a missing right parenthesis, even though the number of left and right parentheses are equal. Even if the parenthesis issue is resolved, I'm not sure that query below will give me the results I want.

SELECT year, institution, SUM(COUNT(student_ID) > 1) AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
Comment
Question by:mariita
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
  • 2
  • +1
11 Comments
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344876
>>SUM(COUNT(student_ID) > 1)

Remove the '>1'

SUM(COUNT(student_ID))


Why do you think you need to SUM the COUNT?
0
 

Author Comment

by:mariita
ID: 40344887
My original query had HAVING COUNT(*) > 1 and this did not give an error.  But I don't want to list duplicate records individually; I want to know how many sets of duplicates each institution has.
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344891
Try this:
SELECT year, institution, duplicate_record_count
FROM (
   SELECT year, institution, COUNT(student_ID) duplicate_record_count
    FROM enrolment
    GROUP BY year, institution
)
where duplicate_record_count > 1
/
0
MS Dynamics Made Instantly Simpler

Make Your Microsoft Dynamics Investment Count  & Drastically Decrease Training Time by Providing Intuitive Step-By-Step WalkThru Tutorials.

 

Author Comment

by:mariita
ID: 40344931
That doesn't return an error, but I don't think that will return the number of duplicate records. To use the example below, I want to show that Institution A has 2 duplicates, Institution B has 1, while Institution C has none.

Duplicates
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40344981
SELECT year, institution, COUNT(student_ID) - 1 AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
 
LVL 77

Accepted Solution

by:
slightwv (䄆 Netminder) earned 250 total points
ID: 40345017
Do you need 'c'?  Since it doesn't have dupes, the having count > 1 won't return it.

If 'C' is optional try:
select year, institution, count(duplicate_record_count) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
having count(*) > 1
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345026
If you need 'C' try this:
select year, institution, sum( case when duplicate_record_count > 1 then 1 else 0 end) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345038
This should also work and may be more efficient:
select year, institution, sum(case when student_id=next_student_id then 1 else 0 end)
from (
	select year, institution, student_id, lead(student_id) over(partition by year, institution order by year, institution) next_student_id
	from enrollment
)
group by year, institution
/

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40345197
| INSTITUTION | ALL_IDS | UNIQUE_IDS | DIFFERENCE |
|-------------|---------|------------|------------|
|           A |       4 |          2 |          2 |
|           B |       2 |          1 |          1 |

Open in new window

Produced by comparing count(*) to count(distinct ...)
Like this:
SELECT
      Institution
    , COUNT(Student_ID)                              ALL_IDS
    , COUNT(DISTINCT Student_ID)                     UNIQUE_IDS
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) DIFFERENCE
FROM enrolment
GROUP BY 
      Institution
HAVING
       COUNT(Student_ID) <> COUNT(DISTINCT Student_ID)
;

Open in new window

Further details:
CREATE TABLE ENROLMENT
	(INSTITUTION VARCHAR2(1), STUDENT_ID INT)
;

INSERT ALL 
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('C', 147)
SELECT * FROM DUAL
;

Open in new window

see: http://sqlfiddle.com/#!4/048b4/3
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 250 total points
ID: 40345199
For this output:
| INSTITUTION | DUPLICATE_RECORD_COUNT |
|-------------|------------------------|
|           A |                      2 |
|           B |                      1 |
|           C |                      0 |

Open in new window

I used this query:
SELECT
      Institution
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) duplicate_record_count
FROM enrolment
GROUP BY 
      Institution
ORDER BY 
      Institution
;

Open in new window

also see: http://sqlfiddle.com/#!4/048b4/5
0
 

Author Comment

by:mariita
ID: 40345924
Thanks for all these responses! It's going to take me a while to go through them and figure out which ones work best.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
Confronted with some SQL you don't know can be a daunting task. It can be even more daunting if that SQL carries some of the old secret codes used in the Ye Olde query syntax, such as: (+)     as used in Oracle;     *=     =*    as used in Sybase …
This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question