Solved

Select distinct statement SQL

Posted on 2010-09-01
6
364 Views
Last Modified: 2012-05-10
This SQL statement returns multiple rows that are duplicate, where I want it to only list values once:

"SELECT * FROM Instrument Left Outer Join Schedule On Instrument.InstrumentID = Schedule.InstrumentID Where Schedule.InstrumentID IN (SELECT MAX(Schedule.InstrumentID) FROM Schedule WHERE Instrument.InstrumentID = Schedule.InstrumentID Group by Schedule.InstrumentID )"

The 2 tables are:
Instrument:
InstrumentID    InstrumentName
1                         Cello
2                         Violin
3                         Piano
4                         Guitar


Schedule
ScheduleID  InstrumentID
1                         2
2                         2
3                         1
4                         2
5                         3
6                         3

I want the SQL to return Instrument name from Instrument table, only when it occurs in the Schedule table. And if it occurs multiple times in Schedule table, only return it once, and not more than once.

So cello, violin and piano will be selected only once, and guitar will not be selected at all. Currently, the above SQL selects the instrument more than once, for example it selects violin 3 times, cello once, and piano 2 times.

Thank you
0
Comment
Question by:mlg101
  • 4
  • 2
6 Comments
 
LVL 40

Accepted Solution

by:
RQuadling earned 250 total points
Comment Utility
SELECT DISTINCT TOP 100 PERCENT
 Instrument.InstrumentName
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID

The inner join will only include rows that exist in both tables.

The DISTINCT selects distinct rows, in this case Instrument Names.
0
 
LVL 40

Expert Comment

by:RQuadling
Comment Utility
A variation ...

SELECT TOP 100 PERCENT
 Instrument.InstrumentName,
 COUNT(*) AS Schedules
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID
GROUP BY
 Instrument.InstrumentName
ORDER BY
 Instrument.InstrumentName

Still only showing 1 row per instrument, but also how many times it is scheduled.
0
 
LVL 3

Assisted Solution

by:Gjslick
Gjslick earned 250 total points
Comment Utility
A much simpler way would be this:

SELECT
      instrumentName
FROM
      Instrument
WHERE
      instrumentID IN (
            SELECT instrumentID FROM Schedule
      )
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 40

Expert Comment

by:RQuadling
Comment Utility
The subselect will work just fine.

I'd be interested in seeing the differences in performance / execution plan.

I honestly don't know which is better, but I've been led to believe that sub-queries are sub-optimal. But probably not in all cases.

And it may be dependent upon pre-existing statistics.
0
 
LVL 3

Expert Comment

by:Gjslick
Comment Utility
In this case, the subquery performs much better than the inner join.  Subqueries are usually only "bad for performance" if they become a "correlated subquery," because then the subquery must be executed once for each row of data that is returned in the outer query.  So for example, if the subquery was in the SELECT clause for a computed column, and the outer query returned 5 rows, then the subquery would have to be executed 5 times.  This is not the case in this situation.

In this case with the subquery in the WHERE clause, it is only executed once, and a list of values are returned for use with the outer query.  This is much less costly than the inner join.

However, you are correct in that this can depend on other factors, such as table size, indexes, what the optimizer can do to make the query perform better, etc.  Just with some simple testing with these two tables on sql2005: as the tables are right now with 4/6 rows respectively, the query with the subquery is about 1/3rd of the cost of the inner join queries.  I then tried adding 10,000 rows to each table, and the subquery query became only 50% of the cost of the inner join queries.  I then tried adding 100,000 rows to each table, and the subquery query actually became 4% higher in cost.
0
 
LVL 40

Expert Comment

by:RQuadling
Comment Utility
Thanks for the analysis, Gjslick.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Data architecture is an important aspect in Software as a Service (SaaS) delivery model. This article is a study on the database of a single-tenant application that could be extended to support multiple tenants. The application is web-based develope…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now