Solved

Select distinct statement SQL

Posted on 2010-09-01
6
365 Views
Last Modified: 2012-05-10
This SQL statement returns multiple rows that are duplicate, where I want it to only list values once:

"SELECT * FROM Instrument Left Outer Join Schedule On Instrument.InstrumentID = Schedule.InstrumentID Where Schedule.InstrumentID IN (SELECT MAX(Schedule.InstrumentID) FROM Schedule WHERE Instrument.InstrumentID = Schedule.InstrumentID Group by Schedule.InstrumentID )"

The 2 tables are:
Instrument:
InstrumentID    InstrumentName
1                         Cello
2                         Violin
3                         Piano
4                         Guitar


Schedule
ScheduleID  InstrumentID
1                         2
2                         2
3                         1
4                         2
5                         3
6                         3

I want the SQL to return Instrument name from Instrument table, only when it occurs in the Schedule table. And if it occurs multiple times in Schedule table, only return it once, and not more than once.

So cello, violin and piano will be selected only once, and guitar will not be selected at all. Currently, the above SQL selects the instrument more than once, for example it selects violin 3 times, cello once, and piano 2 times.

Thank you
0
Comment
Question by:mlg101
  • 4
  • 2
6 Comments
 
LVL 40

Accepted Solution

by:
RQuadling earned 250 total points
ID: 33582726
SELECT DISTINCT TOP 100 PERCENT
 Instrument.InstrumentName
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID

The inner join will only include rows that exist in both tables.

The DISTINCT selects distinct rows, in this case Instrument Names.
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 33582734
A variation ...

SELECT TOP 100 PERCENT
 Instrument.InstrumentName,
 COUNT(*) AS Schedules
FROM
 Instrument
 INNER JOIN
 Schedule ON
  Instrument.InstrumentID = Schedule.InstrumentID
GROUP BY
 Instrument.InstrumentName
ORDER BY
 Instrument.InstrumentName

Still only showing 1 row per instrument, but also how many times it is scheduled.
0
 
LVL 3

Assisted Solution

by:Gjslick
Gjslick earned 250 total points
ID: 33582871
A much simpler way would be this:

SELECT
      instrumentName
FROM
      Instrument
WHERE
      instrumentID IN (
            SELECT instrumentID FROM Schedule
      )
0
VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

 
LVL 40

Expert Comment

by:RQuadling
ID: 33582884
The subselect will work just fine.

I'd be interested in seeing the differences in performance / execution plan.

I honestly don't know which is better, but I've been led to believe that sub-queries are sub-optimal. But probably not in all cases.

And it may be dependent upon pre-existing statistics.
0
 
LVL 3

Expert Comment

by:Gjslick
ID: 33583094
In this case, the subquery performs much better than the inner join.  Subqueries are usually only "bad for performance" if they become a "correlated subquery," because then the subquery must be executed once for each row of data that is returned in the outer query.  So for example, if the subquery was in the SELECT clause for a computed column, and the outer query returned 5 rows, then the subquery would have to be executed 5 times.  This is not the case in this situation.

In this case with the subquery in the WHERE clause, it is only executed once, and a list of values are returned for use with the outer query.  This is much less costly than the inner join.

However, you are correct in that this can depend on other factors, such as table size, indexes, what the optimizer can do to make the query perform better, etc.  Just with some simple testing with these two tables on sql2005: as the tables are right now with 4/6 rows respectively, the query with the subquery is about 1/3rd of the cost of the inner join queries.  I then tried adding 10,000 rows to each table, and the subquery query became only 50% of the cost of the inner join queries.  I then tried adding 100,000 rows to each table, and the subquery query actually became 4% higher in cost.
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 33584091
Thanks for the analysis, Gjslick.
0

Featured Post

Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
In SQL Server, when rows are selected from a table, does it retrieve data in the order in which it is inserted?  Many believe this is the case. Let us try to examine for ourselves with an example. To get started, use the following script, wh…
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
This video demonstrates how to create an example email signature rule for a department in a company using CodeTwo Exchange Rules. The signature will be inserted beneath users' latest emails in conversations and will be displayed in users' Sent Items…

937 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

4 Experts available now in Live!

Get 1:1 Help Now