asked on

Weeding out duplicate values (using DISTINCT)

Environment: MS SQL Server 2000

Say I have a table like the following:

identity,userid,colour,flavour
1,1,blue,vanilla
2,1,blue,strawberry
3,2,red,chocolate
4,2,red,chocolate
5,3,black,licorice

'SELECT DISTINCT userid, colour, flavour' blocks the display of row 4 (since every element in that row was already returned with row 3), but not row 2 (since the flavour value differs). How do I return only one row per userid value?

I haven't found any working examples of DISTINCT ORDER BY (which sounds like it would be useful).

Thanks in advance,

C

ASKER CERTIFIED SOLUTION

rafrancisco

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Brian Crowe

It sounds like is that you want a single entry per userid/color/flavor combo

SELECT Min(identity), userid, colour, flavour
FROM myTable
GROUP BY userid, colour, flavour

Brian Crowe

oops like i misread...

SELECT *
FROM myTable
WHERE myTable.identity IN (SELECT TOP 1 identity FROM myTable B
WHERE B.userid = myTable.userid
ORDER BY {insert your order here to determine which record is listed first})

stevetheski

for performance reasons i would use the following
SELECT identity, userid, colour, flavour
FROM
(SELECT Min(identity) as identity, userid, colour, flavour
FROM myTable
GROUP BY userid, colour, flavour) as innerTable
order By 1

stevetheski

The GROUP BY clause can be sped up if you follow these suggestion:

Keep the number of rows returned by the query as small as possible.

Keep the number of groupings as few as possible.

reason for my previous post
snippet from http://www.sql-server-performance.com/transact_sql.asp

Don't group redundant columns.

If there is a JOIN in the same SELECT statement that has a GROUP BY, try to rewrite the query to use a subquery instead of using a JOIN. If this is possible, performance will be faster. If you have to use a JOIN, try to make the GROUP BY column from the same table as the column or columns on which the set function is used.