Solved

Select distinct statement SQL

Posted on 2010-09-01
6
369 Views
Last Modified: 2012-05-10
This SQL statement returns multiple rows that are duplicate, where I want it to only list values once:

"SELECT * FROM Instrument Left Outer Join Schedule On Instrument.InstrumentID = Schedule.InstrumentID Where Schedule.InstrumentID IN (SELECT MAX(Schedule.InstrumentID) FROM Schedule WHERE Instrument.InstrumentID = Schedule.InstrumentID Group by Schedule.InstrumentID )"

The 2 tables are:
Instrument:
InstrumentID    InstrumentName
1                         Cello
2                         Violin
3                         Piano
4                         Guitar


Schedule
ScheduleID  InstrumentID
1                         2
2                         2
3                         1
4                         2
5                         3
6                         3

I want the SQL to return Instrument name from Instrument table, only when it occurs in the Schedule table. And if it occurs multiple times in Schedule table, only return it once, and not more than once.

So cello, violin and piano will be selected only once, and guitar will not be selected at all. Currently, the above SQL selects the instrument more than once, for example it selects violin 3 times, cello once, and piano 2 times.

Thank you
0
Comment
Question by:mlg101
  • 4
  • 2
6 Comments
 
LVL 40

Accepted Solution

by:
Richard Quadling earned 250 total points
ID: 33582726
SELECT DISTINCT TOP 100 PERCENT
 Instrument.InstrumentName
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID

The inner join will only include rows that exist in both tables.

The DISTINCT selects distinct rows, in this case Instrument Names.
0
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33582734
A variation ...

SELECT TOP 100 PERCENT
 Instrument.InstrumentName,
 COUNT(*) AS Schedules
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID
GROUP BY
 Instrument.InstrumentName
ORDER BY
 Instrument.InstrumentName

Still only showing 1 row per instrument, but also how many times it is scheduled.
0
 
LVL 3

Assisted Solution

by:Gjslick
Gjslick earned 250 total points
ID: 33582871
A much simpler way would be this:

SELECT
      instrumentName
FROM
      Instrument
WHERE
      instrumentID IN (
            SELECT instrumentID FROM Schedule
      )
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33582884
The subselect will work just fine.

I'd be interested in seeing the differences in performance / execution plan.

I honestly don't know which is better, but I've been led to believe that sub-queries are sub-optimal. But probably not in all cases.

And it may be dependent upon pre-existing statistics.
0
 
LVL 3

Expert Comment

by:Gjslick
ID: 33583094
In this case, the subquery performs much better than the inner join.  Subqueries are usually only "bad for performance" if they become a "correlated subquery," because then the subquery must be executed once for each row of data that is returned in the outer query.  So for example, if the subquery was in the SELECT clause for a computed column, and the outer query returned 5 rows, then the subquery would have to be executed 5 times.  This is not the case in this situation.

In this case with the subquery in the WHERE clause, it is only executed once, and a list of values are returned for use with the outer query.  This is much less costly than the inner join.

However, you are correct in that this can depend on other factors, such as table size, indexes, what the optimizer can do to make the query perform better, etc.  Just with some simple testing with these two tables on sql2005: as the tables are right now with 4/6 rows respectively, the query with the subquery is about 1/3rd of the cost of the inner join queries.  I then tried adding 10,000 rows to each table, and the subquery query became only 50% of the cost of the inner join queries.  I then tried adding 100,000 rows to each table, and the subquery query actually became 4% higher in cost.
0
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33584091
Thanks for the analysis, Gjslick.
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

by Mark Wills Attending one of Rob Farley's seminars the other day, I heard the phrase "The Accidental DBA" and fell in love with it. It got me thinking about the plight of the newcomer to SQL Server...  So if you are the accidental DBA, or, simp…
INTRODUCTION: While tying your database objects into builds and your enterprise source control system takes a third-party product (like Visual Studio Database Edition or Red-Gate's SQL Source Control), you can achieve some protection using a sing…
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

786 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question