Solved

SQL query that sums the number of duplicates

Posted on 2014-09-25
11
211 Views
Last Modified: 2014-09-26
I want to return the number of duplicate records per institution. My query looks something like the snippet below. I'm getting an error message that there is a missing right parenthesis, even though the number of left and right parentheses are equal. Even if the parenthesis issue is resolved, I'm not sure that query below will give me the results I want.

SELECT year, institution, SUM(COUNT(student_ID) > 1) AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
Comment
Question by:mariita
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
  • 2
  • +1
11 Comments
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344876
>>SUM(COUNT(student_ID) > 1)

Remove the '>1'

SUM(COUNT(student_ID))


Why do you think you need to SUM the COUNT?
0
 

Author Comment

by:mariita
ID: 40344887
My original query had HAVING COUNT(*) > 1 and this did not give an error.  But I don't want to list duplicate records individually; I want to know how many sets of duplicates each institution has.
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40344891
Try this:
SELECT year, institution, duplicate_record_count
FROM (
   SELECT year, institution, COUNT(student_ID) duplicate_record_count
    FROM enrolment
    GROUP BY year, institution
)
where duplicate_record_count > 1
/
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:mariita
ID: 40344931
That doesn't return an error, but I don't think that will return the number of duplicate records. To use the example below, I want to show that Institution A has 2 duplicates, Institution B has 1, while Institution C has none.

Duplicates
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40344981
SELECT year, institution, COUNT(student_ID) - 1 AS duplicate_record_count
FROM enrolment
GROUP BY year, institution
HAVING COUNT(student_ID) > 1
0
 
LVL 77

Accepted Solution

by:
slightwv (䄆 Netminder) earned 250 total points
ID: 40345017
Do you need 'c'?  Since it doesn't have dupes, the having count > 1 won't return it.

If 'C' is optional try:
select year, institution, count(duplicate_record_count) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
having count(*) > 1
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345026
If you need 'C' try this:
select year, institution, sum( case when duplicate_record_count > 1 then 1 else 0 end) total_dupes
from
(
select year, institution, student_id,  count(*) duplicate_record_count
from enrollment
group by year, institution, student_id
)
group by year, institution
/

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40345038
This should also work and may be more efficient:
select year, institution, sum(case when student_id=next_student_id then 1 else 0 end)
from (
	select year, institution, student_id, lead(student_id) over(partition by year, institution order by year, institution) next_student_id
	from enrollment
)
group by year, institution
/

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40345197
| INSTITUTION | ALL_IDS | UNIQUE_IDS | DIFFERENCE |
|-------------|---------|------------|------------|
|           A |       4 |          2 |          2 |
|           B |       2 |          1 |          1 |

Open in new window

Produced by comparing count(*) to count(distinct ...)
Like this:
SELECT
      Institution
    , COUNT(Student_ID)                              ALL_IDS
    , COUNT(DISTINCT Student_ID)                     UNIQUE_IDS
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) DIFFERENCE
FROM enrolment
GROUP BY 
      Institution
HAVING
       COUNT(Student_ID) <> COUNT(DISTINCT Student_ID)
;

Open in new window

Further details:
CREATE TABLE ENROLMENT
	(INSTITUTION VARCHAR2(1), STUDENT_ID INT)
;

INSERT ALL 
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 123)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('A', 456)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('B', 789)
	INTO ENROLMENT (INSTITUTION, STUDENT_ID)
		 VALUES ('C', 147)
SELECT * FROM DUAL
;

Open in new window

see: http://sqlfiddle.com/#!4/048b4/3
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 250 total points
ID: 40345199
For this output:
| INSTITUTION | DUPLICATE_RECORD_COUNT |
|-------------|------------------------|
|           A |                      2 |
|           B |                      1 |
|           C |                      0 |

Open in new window

I used this query:
SELECT
      Institution
    , COUNT(Student_ID) - COUNT(DISTINCT Student_ID) duplicate_record_count
FROM enrolment
GROUP BY 
      Institution
ORDER BY 
      Institution
;

Open in new window

also see: http://sqlfiddle.com/#!4/048b4/5
0
 

Author Comment

by:mariita
ID: 40345924
Thanks for all these responses! It's going to take me a while to go through them and figure out which ones work best.
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

740 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question