Link to home
Start Free TrialLog in
Avatar of engineroom
engineroom

asked on

Help with MS SQL statement using TOP and DISTINCT

Hey all, I have a sql statement that selects TOP 7 records from a table but they are not unique. I do not want to have duplicate people. Let me post the sql and then explain.

I have 2 tables, 1 called artists and 1 called images. I'm trying to get the TOP 7 random Artists to show up. Then i joined the IMAGES table to grab one of the artists' images (randomly) and the corresponding ITEMID for that image that was grabbed. With the code provided i get random records but I get duplicate Artists. How can i make it so the TOP 7 artists are unique and randomly chosen with each artists random piece of art and the art's corresponding ITEMID? thanks all!

er
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM Artists
INNER JOIN Images
ON (Images.artistID=artists.artistID)
GROUP BY artists.artistID, artist, imageID, itemID
ORDER BY NEWID()

Open in new window

Avatar of ralmada
ralmada
Flag of Canada image

Try like this:
Please note that I'm assuming that ImageID is unique in Images table. If not change it for Item ID. I have attached both versions:

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID 
FROM Artists t1
INNER JOIN (
			select * from Images a
			where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
			) t2 on t1.artistID=t2.artistID


-- OR

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID 
FROM Artists t1
INNER JOIN (
			select * from Images a
			where ItemID = (select top 1 ItemID from Images where artistID = a.ArtistID order by newid())
			) t2 on t1.artistID=t2.artistID

Open in new window

Avatar of engineroom
engineroom

ASKER

hey ralmada.... i just ran the query and i'm getting even more duplicates now. You are correct, the imageID is unique in the images table. Don't know what's going on there.

er
Try this:
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = TmpArtist.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist 
FROM (SELECT DISTINCT artistID, artist
FROM Artists) TmpArtist) TMP) TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();

Open in new window

Avatar of Goodangel Matope
Hmmm....

Havent tested this, but try give it a shot
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM(
     SELECT distinct artists.artistID, artist, imageID, itemID
     FROM Artists
     INNER JOIN Images
     ON (Images.artistID=artists.artistID)
     GROUP BY artists.artistID, artist, imageID, itemID
     ORDER BY NEWID()
) As ArtistData

Open in new window

@sameer
I got this error:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "TmpArtist.artistID" could not be bound.

@goodangel
I got this error:
Msg 1033, Level 15, State 1, Line 9
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

Please check the aliases. This query should definitively work. If not please post the exact query you're running.

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID  
FROM Artists t1 
INNER JOIN ( 
                        select * from Images a 
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid()) 
                        ) t2 on t1.artistID=t2.artistID

Open in new window


Aside from the exact query you're running, please post some data that you're getting from the query indicating where the duplicates are.
In the mean time here's an example with AdventureWorks.

If you run
select * from Sales.SalesOrderHeader
select * from Sales.SalesOrderDetail

you will get 31465 rows in SalesOrderHeader and 121317 rows in the details one.
So if you just join them you will get 121317 rows (one to many relationship):

select * from Sales.SalesOrderHeader a
inner join Sales.SalesOrderDetail b on a.SalesOrderID = b.SalesOrderID

However if you do like this:
select * from Sales.SalesOrderHeader t1
inner join (select * from Sales.SalesOrderDetail a
			where SalesOrderDetailID = (select top 1 SalesOrderDetailID from Sales.SalesOrderDetail where SalesOrderID = a.SalesOrderID order by newid())
) t2 on t1.SalesOrderID = t2.SalesOrderID

You will get 31465 rows, which is the number of rows in SalesOrderHeader. So basically you're forcing a one-to-one join by selecting distinct values in SalesOrderDetails for each SalesOrderID.

Open in new window

I ran the exact query that you gave me and it worked but gave me duplicate results.. Attached is the sql that i pasted.

Here are the results:

231      HENNING JORGENSEN      23      3
232      MIKE RUBENDALL      335      28
232      MIKE RUBENDALL      49      18
233      TREVOR MCSTAY      51      34
233      TREVOR MCSTAY      55      34
233      TREVOR MCSTAY      56      34
234      TIM LEHI      64      35
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID  
FROM Artists t1 
INNER JOIN ( 
                        select * from Images a 
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid()) 
                        ) t2 on t1.artistID=t2.artistID

Open in new window

Ok, so you have duplicates in the Artists table. See attached:
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID   
FROM (select distinct ArtistID, Artist from Artists) t1  
INNER JOIN (  
                        select * from Images a  
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())  
                        ) t2 on t1.artistID=t2.artistID

Open in new window

Still returns duplicate artists :(

er
Can you post your tables structure? And some sample data from both tables as well as the expected result. Please include samples where duplicates are found in both tables.
Also, please advise which are unique columns on both tables.
Maybe you need to join both tables with another column aside from Artist.
So far with the information I have I can only assume that you have duplicates Artist IDs with different names in the artist table, so I would try like this:

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID    
FROM (select ArtistID, max(Artist) Artist from Artists group by ArtistID) t1   
INNER JOIN (   
                        select * from Images a   
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())   
                        ) t2 on t1.artistID=t2.artistID

Open in new window

ok.

Artists Table
-----------------------
ArtistID (unique)
Artist

Images Table
-----------------------
ImageID (unique)
artistID
itemID

Sample Data From Artists
-----------------------------------
231      HENNING JORGENSEN
232      MIKE RUBENDALL
233      TREVOR MCSTAY
234      TIM LEHI

Sample Data From Images
-----------------------------------
1      231      1
2      231      1
3      231      1
4      231      1
5      231      1
6      231      1

There can be multiple images for 1 Artist. For example, in the images table, imageID 1-43 is all from Artist 231 (HENNING JORGENSEN)
Ok, it seems that there's a bug with newid() in combination with TOP in SQL server. Check this thread for details.
http://www.sqlmag.com/Article/ArticleID/93895/sql_server_93895.html 
So I would change the query to this:
 
 

select * from artists a
inner join (	select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1
			where rn = 1
			) b on a.artistID = b.artistID

Open in new window

oops, missed one part:
select TOP 7 a.artistID, a.artist, b.imageID, b.itemID      from artists a 
inner join (    select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1 
                        where rn = 1 
                        ) b on a.artistID = b.artistID

Open in new window

Try this, there was a typo.
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = Tmp.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist 
FROM (SELECT DISTINCT artistID, artist
FROM Artists) as TmpArtist) as TMP) as TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();

Open in new window

Now i'm getting the same guys over and over with the Same image. When i execute the query it just puts them in a different order. i.e.:

231      HENNING JORGENSEN      1      1
237      CHRIS TREVINO      179      48
234      TIM LEHI      58      35
235      PAUL JEFFRIES      87      38
233      TREVOR MCSTAY      50      34
232      MIKE RUBENDALL      44      18
236      BRETT SCHWINDT      139      44
-----------------------------------------------
232      MIKE RUBENDALL      44      18
235      PAUL JEFFRIES      87      38
236      BRETT SCHWINDT      139      44
233      TREVOR MCSTAY      50      34
231      HENNING JORGENSEN      1      1
237      CHRIS TREVINO      179      48
234      TIM LEHI      58      35

You see that the imageID is always the same now.
Have you tried my solution?
see comment http:a#26093273 . I have recreated it in my test environment and is working properly.
ASKER CERTIFIED SOLUTION
Avatar of ralmada
ralmada
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@ralmada:

I didn't even see this post! Give me a sec. So far it seems to work perfectly. i'll get back to you in a couple of minutes on it. Can you please explain the code to me?
Sure basically I'm using the row_number() function which will create a temporary "random" id (... order by newid()) for each image. This temporary id will start over for ImagesID belonging to a different ArtistID (...partition by ArtistID ...). So basically we are generating a random number for each image withing a partition of ArtistID.
then by just filtering it (where rn = 1) we are making sure that only one ImageID per ArtistID is returned in the subquery.
With that you can join it with the Artists table.
More info on the row_number
http://msdn.microsoft.com/en-us/library/ms186734.aspx 
 Hope this helps,
ralmada
Thanks ralmada. This does seem to work. If something goes wrong i'll let  you know.
Although i have no idea how you did it (even though you explained it) you helped me out a lot. Thanks!!