Solved

SQL query that sums the number of duplicates

Posted on 2014-09-25
11
204 Views
Last Modified: 2014-09-26
I want to return the number of duplicate records per institution. My query looks something like the snippet below. I'm getting an error message that there is a missing right parenthesis, even though the number of left and right parentheses are equal. Even if the parenthesis issue is resolved, I'm not sure that query below will give me the results I want.

SELECT year, institution, SUM(COUNT(student_ID) > 1) AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
Comment
Question by:mariita
  • 5
  • 3
  • 2
  • +1
11 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344876
>>SUM(COUNT(student_ID) > 1)

Remove the '>1'

SUM(COUNT(student_ID))


Why do you think you need to SUM the COUNT?
0
 

Author Comment

by:mariita
ID: 40344887
My original query had HAVING COUNT(*) > 1 and this did not give an error.  But I don't want to list duplicate records individually; I want to know how many sets of duplicates each institution has.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344891
Try this:
SELECT year, institution, duplicate_record_count
FROM (
   SELECT year, institution, COUNT(student_ID) duplicate_record_count
    FROM enrolment
    GROUP BY year, institution
)
where duplicate_record_count > 1
/
0
 

Author Comment

by:mariita
ID: 40344931
That doesn't return an error, but I don't think that will return the number of duplicate records. To use the example below, I want to show that Institution A has 2 duplicates, Institution B has 1, while Institution C has none.

Duplicates
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40344981
SELECT year, institution, COUNT(student_ID) - 1 AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 250 total points
ID: 40345017
Do you need 'c'?  Since it doesn't have dupes, the having count > 1 won't return it.

If 'C' is optional try:
select year, institution, count(duplicate_record_count) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
having count(*) > 1
)
group by year, institution
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345026
If you need 'C' try this:
select year, institution, sum( case when duplicate_record_count > 1 then 1 else 0 end) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
)
group by year, institution
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345038
This should also work and may be more efficient:
select year, institution, sum(case when student_id=next_student_id then 1 else 0 end)
from (
	select year, institution, student_id, lead(student_id) over(partition by year, institution order by year, institution) next_student_id
	from enrollment
)
group by year, institution
/

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40345197
| INSTITUTION | ALL_IDS | UNIQUE_IDS | DIFFERENCE |
|-------------|---------|------------|------------|
|           A |       4 |          2 |          2 |
|           B |       2 |          1 |          1 |

Open in new window

Produced by comparing count(*) to count(distinct ...)
Like this:
SELECT
      Institution
    , COUNT(Student_ID)                              ALL_IDS
    , COUNT(DISTINCT Student_ID)                     UNIQUE_IDS
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) DIFFERENCE
FROM enrolment
GROUP BY 
      Institution
HAVING
       COUNT(Student_ID) <> COUNT(DISTINCT Student_ID)
;

Open in new window

Further details:
CREATE TABLE ENROLMENT
	(INSTITUTION VARCHAR2(1), STUDENT_ID INT)
;

INSERT ALL 
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('C', 147)
SELECT * FROM DUAL
;

Open in new window

see: http://sqlfiddle.com/#!4/048b4/3
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 250 total points
ID: 40345199
For this output:
| INSTITUTION | DUPLICATE_RECORD_COUNT |
|-------------|------------------------|
|           A |                      2 |
|           B |                      1 |
|           C |                      0 |

Open in new window

I used this query:
SELECT
      Institution
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) duplicate_record_count
FROM enrolment
GROUP BY 
      Institution
ORDER BY 
      Institution
;

Open in new window

also see: http://sqlfiddle.com/#!4/048b4/5
0
 

Author Comment

by:mariita
ID: 40345924
Thanks for all these responses! It's going to take me a while to go through them and figure out which ones work best.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines

932 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now