Solved

Select distinct statement SQL

Posted on 2010-09-01
6
372 Views
Last Modified: 2012-05-10
This SQL statement returns multiple rows that are duplicate, where I want it to only list values once:

"SELECT * FROM Instrument Left Outer Join Schedule On Instrument.InstrumentID = Schedule.InstrumentID Where Schedule.InstrumentID IN (SELECT MAX(Schedule.InstrumentID) FROM Schedule WHERE Instrument.InstrumentID = Schedule.InstrumentID Group by Schedule.InstrumentID )"

The 2 tables are:
Instrument:
InstrumentID    InstrumentName
1                         Cello
2                         Violin
3                         Piano
4                         Guitar


Schedule
ScheduleID  InstrumentID
1                         2
2                         2
3                         1
4                         2
5                         3
6                         3

I want the SQL to return Instrument name from Instrument table, only when it occurs in the Schedule table. And if it occurs multiple times in Schedule table, only return it once, and not more than once.

So cello, violin and piano will be selected only once, and guitar will not be selected at all. Currently, the above SQL selects the instrument more than once, for example it selects violin 3 times, cello once, and piano 2 times.

Thank you
0
Comment
Question by:mlg101
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
6 Comments
 
LVL 40

Accepted Solution

by:
Richard Quadling earned 250 total points
ID: 33582726
SELECT DISTINCT TOP 100 PERCENT
 Instrument.InstrumentName
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID

The inner join will only include rows that exist in both tables.

The DISTINCT selects distinct rows, in this case Instrument Names.
0
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33582734
A variation ...

SELECT TOP 100 PERCENT
 Instrument.InstrumentName,
 COUNT(*) AS Schedules
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID
GROUP BY
 Instrument.InstrumentName
ORDER BY
 Instrument.InstrumentName

Still only showing 1 row per instrument, but also how many times it is scheduled.
0
 
LVL 3

Assisted Solution

by:Gjslick
Gjslick earned 250 total points
ID: 33582871
A much simpler way would be this:

SELECT
      instrumentName
FROM
      Instrument
WHERE
      instrumentID IN (
            SELECT instrumentID FROM Schedule
      )
0
Free eBook: Backup on AWS

Everything you need to know about backup and disaster recovery with AWS, for FREE!

 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33582884
The subselect will work just fine.

I'd be interested in seeing the differences in performance / execution plan.

I honestly don't know which is better, but I've been led to believe that sub-queries are sub-optimal. But probably not in all cases.

And it may be dependent upon pre-existing statistics.
0
 
LVL 3

Expert Comment

by:Gjslick
ID: 33583094
In this case, the subquery performs much better than the inner join.  Subqueries are usually only "bad for performance" if they become a "correlated subquery," because then the subquery must be executed once for each row of data that is returned in the outer query.  So for example, if the subquery was in the SELECT clause for a computed column, and the outer query returned 5 rows, then the subquery would have to be executed 5 times.  This is not the case in this situation.

In this case with the subquery in the WHERE clause, it is only executed once, and a list of values are returned for use with the outer query.  This is much less costly than the inner join.

However, you are correct in that this can depend on other factors, such as table size, indexes, what the optimizer can do to make the query perform better, etc.  Just with some simple testing with these two tables on sql2005: as the tables are right now with 4/6 rows respectively, the query with the subquery is about 1/3rd of the cost of the inner join queries.  I then tried adding 10,000 rows to each table, and the subquery query became only 50% of the cost of the inner join queries.  I then tried adding 100,000 rows to each table, and the subquery query actually became 4% higher in cost.
0
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33584091
Thanks for the analysis, Gjslick.
0

Featured Post

Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In SQL Server, when rows are selected from a table, does it retrieve data in the order in which it is inserted?  Many believe this is the case. Let us try to examine for ourselves with an example. To get started, use the following script, wh…
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
Are you ready to implement Active Directory best practices without reading 300+ pages? You're in luck. In this webinar hosted by Skyport Systems, you gain insight into Microsoft's latest comprehensive guide, with tips on the best and easiest way…

736 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question