sql server 2012 Understanding DISTINCT

Hi experts:
i am reading about Understanding DISTINCT, but i do not understand

Some queries may improve performance by filtering out duplicates prior to execution of SELECT clause
enrique_aeoAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Paul JacksonSoftware EngineerCommented:
Performance maybe improved because the use of the distinct keyword will stop duplicates being returned in the dataset so if there happens to be duplicate row values returned by the query, using distinct will reduce the size of the dataset returned.
0
HuaMin ChenProblem resolverCommented:
"Distinct" will ensure that no duplicated rows will appear in the resulted recordset, as every row is distinct one!
0
PortletPaulfreelancerCommented:
Please be careful; "select distinct" can be awful for performance, so that sentence must be read "in context".

I believe the author is trying to say:
>> do not just add "select distinct" at the top of every query!
>> to remove unwanted repetition in a result, look deeper. It may be necessary to change from a simple join of a table, and instead join to a "derived table" which removes the cause of the repetition.

see

select distinct” is a rather dull animal with minimal decision making powers.
from Select Distinct is returning duplicates ...

Some Simple SQL Rules to Live By and look for "DISTINCT is *usually* bad"
A good rule of thumb -- if you need a distinct list of values in a single column or two, DISTINCT is the way to go.  But if your results "don't look right" or you see some duplicate rows and can't figure out why, do NOT just add DISTINCT to your SELECT to "fix it" !!  Step back, look at your joins, and re-write your query properly.  Even worse, I've seen people simply add DISTINCT to all their SELECT's right from the start, to preemptively "avoid duplicates".  Not good.  (Side note: If you are a DISTINCT abuser, try adding meaningful primary keys to your tables).
Why I Hate DISTINCT
The effects of DISTINCT in a SQL query
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

enrique_aeoAuthor Commented:
can you give me an example with code t-sql
0
PortletPaulfreelancerCommented:
There is a nice example by Adam Machanic, under the article "Why I hate DISTINCT"

I hate DISTINCT, too, at least improperly used. My rule is to use it at the lowest possible level. Example:

"Which products did each customer buy?"

--Bad:
SELECT DISTINCT p.ProductName, c.CustomerName
FROM Customers c, Products p, Orders o
WHERE o.CustomerId = c.CustomerId AND o.ProductId = p.ProductId

--Much Better:
SELECT
p.ProductName,
c.CustomerName
FROM
(
SELECT DISTINCT
o.CustomerId,
o.ProductId
FROM Orders o
) x
JOIN Customers c ON c.CustomerId = x.CustomerId
JOIN Products p ON p.CustomerId = x.CustomerId

But even in that example it is acknowledged that a "group by" could be even more useful, e.g.
--Much Much Better:
SELECT
      p.ProductName
    , c.CustomerName
    , x.num_of
FROM (
            SELECT
                  o.CustomerId
                , o.ProductId
                , COUNT(*) AS num_of
            FROM Orders o
            GROUP BY
                  o.CustomerId
                , o.ProductId
      ) x
      INNER JOIN Customers c
                  ON c.CustomerId = x.CustomerId
      INNER JOIN Products p
                  ON p.CustomerId = x.CustomerId

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Vikas GargBusiness Intelligence DeveloperCommented:
Hi,

Using distinct means if you use Distinct against any select statement it means that it will return you unique record.

It means combination of all column which is a row will be unique (distinct)

Since using distinct removes duplicate records hence number of record reduces which leads toward performance increase.

We can also remove duplicate by Group by.
0
enrique_aeoAuthor Commented:
use Northwind
go

set statistics io on
SELECT
 p.ProductName,
 c.ContactName
 FROM
 (
 SELECT DISTINCT
 o.CustomerId,
 od.ProductId
 FROM Orders o JOIN [Order Details] od ON o.OrderID = od.OrderID
 ) x
 JOIN Customers c ON c.CustomerId = x.CustomerId
 JOIN Products p ON p.ProductID = x.ProductID
 order by p.ProductName, c.ContactName

 -------
 SELECT DISTINCT p.ProductName, c.ContactName
 from Products p join [Order Details] od on p.ProductID = od.ProductID
join Orders o on od.OrderID = o.OrderID
join Customers c on o.CustomerID = c.CustomerID

set statistics io off
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.