Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

SQL query that sums the number of duplicates

Posted on 2014-09-25
11
Medium Priority
?
215 Views
Last Modified: 2014-09-26
I want to return the number of duplicate records per institution. My query looks something like the snippet below. I'm getting an error message that there is a missing right parenthesis, even though the number of left and right parentheses are equal. Even if the parenthesis issue is resolved, I'm not sure that query below will give me the results I want.

SELECT year, institution, SUM(COUNT(student_ID) > 1) AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
Comment
Question by:mariita
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
  • 2
  • +1
11 Comments
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344876
>>SUM(COUNT(student_ID) > 1)

Remove the '>1'

SUM(COUNT(student_ID))


Why do you think you need to SUM the COUNT?
0
 

Author Comment

by:mariita
ID: 40344887
My original query had HAVING COUNT(*) > 1 and this did not give an error.  But I don't want to list duplicate records individually; I want to know how many sets of duplicates each institution has.
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344891
Try this:
SELECT year, institution, duplicate_record_count
FROM (
   SELECT year, institution, COUNT(student_ID) duplicate_record_count
    FROM enrolment
    GROUP BY year, institution
)
where duplicate_record_count > 1
/
0
Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

 

Author Comment

by:mariita
ID: 40344931
That doesn't return an error, but I don't think that will return the number of duplicate records. To use the example below, I want to show that Institution A has 2 duplicates, Institution B has 1, while Institution C has none.

Duplicates
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40344981
SELECT year, institution, COUNT(student_ID) - 1 AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
 
LVL 77

Accepted Solution

by:
slightwv (䄆 Netminder) earned 1000 total points
ID: 40345017
Do you need 'c'?  Since it doesn't have dupes, the having count > 1 won't return it.

If 'C' is optional try:
select year, institution, count(duplicate_record_count) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
having count(*) > 1
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345026
If you need 'C' try this:
select year, institution, sum( case when duplicate_record_count > 1 then 1 else 0 end) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345038
This should also work and may be more efficient:
select year, institution, sum(case when student_id=next_student_id then 1 else 0 end)
from (
	select year, institution, student_id, lead(student_id) over(partition by year, institution order by year, institution) next_student_id
	from enrollment
)
group by year, institution
/

Open in new window

0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40345197
| INSTITUTION | ALL_IDS | UNIQUE_IDS | DIFFERENCE |
|-------------|---------|------------|------------|
|           A |       4 |          2 |          2 |
|           B |       2 |          1 |          1 |

Open in new window

Produced by comparing count(*) to count(distinct ...)
Like this:
SELECT
      Institution
    , COUNT(Student_ID)                              ALL_IDS
    , COUNT(DISTINCT Student_ID)                     UNIQUE_IDS
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) DIFFERENCE
FROM enrolment
GROUP BY 
      Institution
HAVING
       COUNT(Student_ID) <> COUNT(DISTINCT Student_ID)
;

Open in new window

Further details:
CREATE TABLE ENROLMENT
	(INSTITUTION VARCHAR2(1), STUDENT_ID INT)
;

INSERT ALL 
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('C', 147)
SELECT * FROM DUAL
;

Open in new window

see: http://sqlfiddle.com/#!4/048b4/3
0
 
LVL 49

Assisted Solution

by:PortletPaul
PortletPaul earned 1000 total points
ID: 40345199
For this output:
| INSTITUTION | DUPLICATE_RECORD_COUNT |
|-------------|------------------------|
|           A |                      2 |
|           B |                      1 |
|           C |                      0 |

Open in new window

I used this query:
SELECT
      Institution
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) duplicate_record_count
FROM enrolment
GROUP BY 
      Institution
ORDER BY 
      Institution
;

Open in new window

also see: http://sqlfiddle.com/#!4/048b4/5
0
 

Author Comment

by:mariita
ID: 40345924
Thanks for all these responses! It's going to take me a while to go through them and figure out which ones work best.
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

715 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question