Link to home
Start Free TrialLog in
Avatar of knobbylowboy
knobbylowboy

asked on

ORDER BY items must appear in the select list if SELECT DISTINCT is specified

I have a table where I store a history of what the user has looked at and I want to retrieve the 5 most recent things the user has looked at, without any duplicates. I tried the query below (table/column names changed for clarity) but get an error: "ORDER BY items must appear in the select list if SELECT DISTINCT is specified". If I add the datstamp column to the select list then that row is no longer distinct and I get duplicates of course. How can I rewrite this query so that it does what I want, and as this query will be run very frequently I am looking for a solution that is efficient.

SELECT DISTINCT TOP 5
table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid
ORDER BY table1.datestamp

(Running on SQL Server 2005 Express Edition)
Avatar of vinurajr
vinurajr

IF Distinct is there Need to add the selected Item in the Order by thats it.... Clear from the Error
Remove the Distinct Use Group by and Try ...
Did you try this query ? If not, try and let me know.

SELECT DISTINCT TOP 5
table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid
ORDER BY table1.id, table2.name, table3.name

~Raj
hi..
try this..


SELECT  table1.id AS t1ID, table2.name AS t2Name, table3.name AS t3Name
INTO	#Temp
FROM	table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE	table1.userid = @userid
ORDER BY table1.datestamp

SELECT	DISTINCT TOP 5 t1ID, t2Name, t3Name
FROM	#Temp

DROP TABLE #Temp

Open in new window

Have a try.

~Raj
SELECT TOP 5
table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid
group by table1.id, table2.name, table3.name
ORDER BY table1.id, table2.name, table3.name

Open in new window

Avatar of knobbylowboy

ASKER

vinurajr: Sorry but I didn't understand your suggested solution, perhaps a sample might help?

RajkumarGS: Problem is that I need the results sorted by the 'datestamp' field.

rushShah: This solution should work but I was looking for a more efficient method than using temporary tables.
just try that query..it should be efficient..
Match the Excecution plans..
Avatar of Sharath S
This will do what you are looking for.

SELECT TOP 5 table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid
GROUP BY table1.id, table2.name, table3.name
ORDER BY min(table1.datestamp)

Open in new window

This should be efficient:
SELECT id, t2name, t3name
FROM (
SELECT table1.id, table2.name t2name, table3.name t3name, ROW_NUMBER() over ( PARTITION by table1.id, table2.name, table3.name ORDER BY table1.datestamp) rnum
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid) temp
WHERE rnum <=5

Open in new window

you please try this code:

SELECT TOP 5
table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = @userid
group by table1.id, table2.name, table3.name
ORDER BY table1.datestamp
rushShah: I tried this method but as soon as I add DISTINCT I lose the sort order, eg:
SELECT      AgencyID
INTO    #Temp
FROM    tblRecent
WHERE   AdminUserID = @AdminUserID
ORDER BY DateTimeStamp DESC
SELECT  DISTINCT TOP 5 AgencyID
FROM    #Temp
DROP TABLE #Temp

Sharath_123/vinurajr: I tried your method but the sort order is also not applied, eg (changed to MAX as wanted newest to oldest, but MIN doesn't work either):
SELECT TOP 5 AgencyID
FROM tblRecent
WHERE AdminUserID = @AdminUserID
GROUP BY AgencyID
ORDER BY MAX(DateTimeStamp)

rrjegan17: I also tried your method but the sort order is not being applied, eg:
SELECT AgencyID
FROM (
SELECT AgencyID, ROW_NUMBER() over ( PARTITION by AgencyID ORDER BY DateTimeStamp DESC) rnum
FROM tblRecent
WHERE AdminUserID = @AdminUserID) temp
WHERE rnum <=5

I have simplified my code and table for ease of testing. tblRecent contains four columns: RecentID, DateTimeStamp, AdminUserID and AgencyID.
My recordset contains four rows:
1      16/02/2010 12:27:03 PM      2      1
2      16/02/2010 12:27:08 PM      2      24
3      16/02/2010 12:27:12 PM      2      25
4      19/02/2010 2:44:03 PM      2      22

I would expect the order to be newest to oldest so would return AgencyID in order 22, 25, 24, 1, but all of the above test code returns 1, 22, 24, 25 regardless of min/max or asc/desc.
hi..try this..


SELECT  AgencyID, Rank() over ( PARTITION by AdminUserID ORDER BY DateTimeStamp DESC) rank
INTO    #Temp
FROM    @t
WHERE   AdminUserID = @AdminUserID
ORDER BY DateTimeStamp DESC

SELECT	* FROM #Temp

SELECT  DISTINCT TOP 5 AgencyID,  Rank() over ( ORDER BY rank DESC) num
FROM    #Temp
Order by num DESC

DROP TABLE #Temp

Open in new window

rushShah: the sort order is correct but once rows are added with the same AgencyID then duplicate rows are still included in the recordset as DISTINCT does not remove the duplicate rows because of the unique Rank().
ok..

try this..
I don't know whether it is Efficient or not..
There should be some other way to do this, but this one gives required result..
SELECT  AgencyID, Rank() over ( PARTITION by AdminUserID ORDER BY DateTimeStamp DESC) rank
INTO    #Temp
FROM    tblRecent
WHERE   AdminUserID = @AdminUserID
ORDER BY DateTimeStamp DESC

SELECT	DISTINCT TOP 5 AgencyID, sortOrder
FROM	(
SELECT  AgencyID, Rank() over ( ORDER BY rank DESC) sortOrder, Rank() over (PARTITION by AgencyID ORDER BY rank DESC) num
FROM    #Temp
) t
WHERE	num=1
Order by sortOrder DESC

DROP TABLE #Temp

Open in new window

Hi knobbylowboy,

I am not sure whether I understood your question and I am doing right.

Anyway check this script.

Attached the table structure, sample data, full data view and the suggested result.

Check the last query, whether it is right ?

~Raj
create table table1
(
	id	int,
	table2id	int,
	table3id	int,
	userid		int,
	datestamp	datetime
)

create table table2
(
	id	int,
	name	varchar(20),
)

create table table3
(
	id	int,
	name	varchar(20),
)

-- SAMPLE DATA
insert into table2 values(1, 'autoriksha')
insert into table2 values(2, 'boat')
insert into table2 values(3, 'plane')
insert into table2 values(4, 'cycle')
insert into table2 values(5, 'bus')
insert into table2 values(6, 'car')

insert into table3 values(1, 'acdsee')
insert into table3 values(2, 'avant')
insert into table3 values(3, 'bitdefender')
insert into table3 values(4, 'norton')
insert into table3 values(5, 'windows xp')
insert into table3 values(6, 'vista')

insert into table1 values(1, 1, 1, 1, '1/3/2010 6:44 AM')
insert into table1 values(1, 1, 1, 1, '1/3/2010 7:00 AM')
insert into table1 values(1, 2, 3, 1, '1/3/2010 7:46 AM')
insert into table1 values(1, 3, 2, 1, '1/3/2010 8:14 AM')
insert into table1 values(1, 1, 1, 1, '1/3/2010 9:33 AM')
insert into table1 values(1, 3, 2, 1, '1/4/2010 9:01 AM')
insert into table1 values(1, 2, 3, 1, '1/4/2010 10:01 AM')
insert into table1 values(1, 1, 2, 1, '1/4/2010 10:01 AM')
insert into table1 values(1, 1, 2, 1, '1/4/2010 10:41 AM')
insert into table1 values(1, 2, 2, 1, '1/4/2010 10:01 AM')
insert into table1 values(1, 2, 2, 1, '1/5/2010 9:13 AM')
insert into table1 values(1, 3, 3, 1, '1/6/2010 11:11 AM')
insert into table1 values(1, 3, 3, 1, '1/7/2010 12:41 AM')

-- COMPLETE DATA
SELECT DISTINCT
table1.datestamp, table1.id, table2.name, table3.name
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = 1
ORDER BY table1.id, table2.name, table3.name

-- FILTERED VIEW - TOP 5 ONLY
SELECT 
table1.id, table2.name, table3.name, MAX(table1.datestamp)
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = 1
GROUP BY table1.id, table2.name, table3.name
ORDER BY MAX(table1.datestamp)

Open in new window

For Top 5 records only, I missed it.
~Raj
SELECT TOP 5
table1.id, table2.name, table3.name, MAX(table1.datestamp)
FROM table1
LEFT OUTER JOIN table2 ON table2.id = table1.table2id
LEFT OUTER JOIN table3 ON table3.id = table1.table3id
WHERE table1.userid = 1
GROUP BY table1.id, table2.name, table3.name
ORDER BY MAX(table1.datestamp)

Open in new window

>> I also tried your method but the sort order is not being applied, eg:

This would do the sorting.
SELECT AgencyID
FROM ( 
SELECT AgencyID, ROW_NUMBER() over ( PARTITION by AgencyID ORDER BY DateTimeStamp DESC) rnum 
FROM tblRecent 
WHERE AdminUserID = @AdminUserID) temp 
WHERE rnum <=5
ORDER BY AgencyID

Open in new window

>> Sharath_123/vinurajr: I tried your method but the sort order is also not applied, eg (changed to MAX as wanted newest to oldest, but MIN doesn't work either):

What do you mean by it does not work? I tested it with sample data before posting. Better provide some sample data and the expected result.
Ok here is a simplified sample table and data to test with:

create table testtable
(
id int,
datestamp datetime
)
insert into testtable values(1, '2010-02-11')
insert into testtable values(2, '2010-02-12')
insert into testtable values(3, '2010-02-13')
insert into testtable values(4, '2010-02-14')
insert into testtable values(5, '2010-02-15')
insert into testtable values(5, '2010-02-16')
insert into testtable values(4, '2010-02-17')

So the intention is to return a recordset of a maximum of 5 records (id only), with no duplicates, ordered by newest to oldest. The desired result is 4, 5, 3, 2, 1.

Ideally the query would be "select distinct top 5 id from testtable order by datestamp desc" but that doesn't work of course.

I tried all of the above suggested solutions and none produced the desired result.
try this.

select id from (
select id,max(datestamp) as datestamp
  from testable
 group by id) as t1
 order by datestamp
Try this:
SELECT AgencyID 
FROM (  
SELECT AgencyID, DateTimeStamp, ROW_NUMBER() over ( PARTITION by AgencyID ORDER BY DateTimeStamp DESC) rnum  
FROM tblRecent  
WHERE AdminUserID = @AdminUserID) temp  
WHERE rnum <=5 
ORDER BY DateTimeStamp desc

-- For your modified table run the one below 

SELECT ID 
FROM (  
SELECT ID, datestamp, ROW_NUMBER() over ( PARTITION by ID ORDER BY datestamp DESC) rnum  
FROM testtable ) temp  
WHERE rnum <=5 
ORDER BY datestamp desc

Open in new window

Just to confirm... Did you check my sample table & data with the query I provided. (my previous two posts)

Here is the query for you sample data. I tested it against your table and it returns the suggested result.

~Raj
SELECT ID, DATESTAMP FROM
(SELECT ID, MAX(datestamp) AS DATESTAMP
	FROM testtable
		GROUP BY ID) AS A
ORDER BY DATESTAMP DESC

Open in new window

If we select the ID alone (4, 5, 3, 2, 1), it will lose the order (1, 2, 3, 4, 5) which was selected inside the subquery. So better to select datetime field also with ID. (Previous Query)

~Raj
ASKER CERTIFIED SOLUTION
Avatar of Rajkumar Gs
Rajkumar Gs
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Sharath_123: your solution returned 1, 2, 3, 4, 5 - wrong sort order.
rrjegan17: your solution returned 4, 5, 5, 4, 3, 2, 1 - duplication and more than 5 records.
RajkumarGS: your solution works, thank you.
Needed to add "TOP 5" to limit recordset to 5 rows.
yes, I missed DESC.
Small Mistake:

SELECT ID
FROM (  
SELECT ID, datestamp, ROW_NUMBER() over ( PARTITION by ID ORDER BY datestamp DESC) rnum  
FROM testtable ) temp  
WHERE rnum =1
ORDER BY datestamp desc