• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 383
  • Last Modified:

sql server 2012 Understanding DISTINCT

Hi experts:
i am reading about Understanding DISTINCT, but i do not understand

Some queries may improve performance by filtering out duplicates prior to execution of SELECT clause
0
enrique_aeo
Asked:
enrique_aeo
3 Solutions
 
Paul JacksonCommented:
Performance maybe improved because the use of the distinct keyword will stop duplicates being returned in the dataset so if there happens to be duplicate row values returned by the query, using distinct will reduce the size of the dataset returned.
0
 
HuaMinChenBusiness AnalystCommented:
"Distinct" will ensure that no duplicated rows will appear in the resulted recordset, as every row is distinct one!
0
 
PortletPaulCommented:
Please be careful; "select distinct" can be awful for performance, so that sentence must be read "in context".

I believe the author is trying to say:
>> do not just add "select distinct" at the top of every query!
>> to remove unwanted repetition in a result, look deeper. It may be necessary to change from a simple join of a table, and instead join to a "derived table" which removes the cause of the repetition.

see

select distinct” is a rather dull animal with minimal decision making powers.
from Select Distinct is returning duplicates ...

Some Simple SQL Rules to Live By and look for "DISTINCT is *usually* bad"
A good rule of thumb -- if you need a distinct list of values in a single column or two, DISTINCT is the way to go.  But if your results "don't look right" or you see some duplicate rows and can't figure out why, do NOT just add DISTINCT to your SELECT to "fix it" !!  Step back, look at your joins, and re-write your query properly.  Even worse, I've seen people simply add DISTINCT to all their SELECT's right from the start, to preemptively "avoid duplicates".  Not good.  (Side note: If you are a DISTINCT abuser, try adding meaningful primary keys to your tables).
Why I Hate DISTINCT
The effects of DISTINCT in a SQL query
0
Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

 
enrique_aeoAuthor Commented:
can you give me an example with code t-sql
0
 
PortletPaulCommented:
There is a nice example by Adam Machanic, under the article "Why I hate DISTINCT"

I hate DISTINCT, too, at least improperly used. My rule is to use it at the lowest possible level. Example:

"Which products did each customer buy?"

--Bad:
SELECT DISTINCT p.ProductName, c.CustomerName
FROM Customers c, Products p, Orders o
WHERE o.CustomerId = c.CustomerId AND o.ProductId = p.ProductId

--Much Better:
SELECT
p.ProductName,
c.CustomerName
FROM
(
SELECT DISTINCT
o.CustomerId,
o.ProductId
FROM Orders o
) x
JOIN Customers c ON c.CustomerId = x.CustomerId
JOIN Products p ON p.CustomerId = x.CustomerId

But even in that example it is acknowledged that a "group by" could be even more useful, e.g.
--Much Much Better:
SELECT
      p.ProductName
    , c.CustomerName
    , x.num_of
FROM (
            SELECT
                  o.CustomerId
                , o.ProductId
                , COUNT(*) AS num_of
            FROM Orders o
            GROUP BY
                  o.CustomerId
                , o.ProductId
      ) x
      INNER JOIN Customers c
                  ON c.CustomerId = x.CustomerId
      INNER JOIN Products p
                  ON p.CustomerId = x.CustomerId

Open in new window

0
 
Vikas GargBusiness Intelligence DeveloperCommented:
Hi,

Using distinct means if you use Distinct against any select statement it means that it will return you unique record.

It means combination of all column which is a row will be unique (distinct)

Since using distinct removes duplicate records hence number of record reduces which leads toward performance increase.

We can also remove duplicate by Group by.
0
 
enrique_aeoAuthor Commented:
use Northwind
go

set statistics io on
SELECT
 p.ProductName,
 c.ContactName
 FROM
 (
 SELECT DISTINCT
 o.CustomerId,
 od.ProductId
 FROM Orders o JOIN [Order Details] od ON o.OrderID = od.OrderID
 ) x
 JOIN Customers c ON c.CustomerId = x.CustomerId
 JOIN Products p ON p.ProductID = x.ProductID
 order by p.ProductName, c.ContactName

 -------
 SELECT DISTINCT p.ProductName, c.ContactName
 from Products p join [Order Details] od on p.ProductID = od.ProductID
join Orders o on od.OrderID = o.OrderID
join Customers c on o.CustomerID = c.CustomerID

set statistics io off
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now