engineroom
asked on
Help with MS SQL statement using TOP and DISTINCT
Hey all, I have a sql statement that selects TOP 7 records from a table but they are not unique. I do not want to have duplicate people. Let me post the sql and then explain.
I have 2 tables, 1 called artists and 1 called images. I'm trying to get the TOP 7 random Artists to show up. Then i joined the IMAGES table to grab one of the artists' images (randomly) and the corresponding ITEMID for that image that was grabbed. With the code provided i get random records but I get duplicate Artists. How can i make it so the TOP 7 artists are unique and randomly chosen with each artists random piece of art and the art's corresponding ITEMID? thanks all!
er
I have 2 tables, 1 called artists and 1 called images. I'm trying to get the TOP 7 random Artists to show up. Then i joined the IMAGES table to grab one of the artists' images (randomly) and the corresponding ITEMID for that image that was grabbed. With the code provided i get random records but I get duplicate Artists. How can i make it so the TOP 7 artists are unique and randomly chosen with each artists random piece of art and the art's corresponding ITEMID? thanks all!
er
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM Artists
INNER JOIN Images
ON (Images.artistID=artists.artistID)
GROUP BY artists.artistID, artist, imageID, itemID
ORDER BY NEWID()
ASKER
hey ralmada.... i just ran the query and i'm getting even more duplicates now. You are correct, the imageID is unique in the images table. Don't know what's going on there.
er
er
Try this:
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = TmpArtist.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist
FROM (SELECT DISTINCT artistID, artist
FROM Artists) TmpArtist) TMP) TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();
Hmmm....
Havent tested this, but try give it a shot
Havent tested this, but try give it a shot
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM(
SELECT distinct artists.artistID, artist, imageID, itemID
FROM Artists
INNER JOIN Images
ON (Images.artistID=artists.artistID)
GROUP BY artists.artistID, artist, imageID, itemID
ORDER BY NEWID()
) As ArtistData
ASKER
@sameer
I got this error:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "TmpArtist.artistID" could not be bound.
@goodangel
I got this error:
Msg 1033, Level 15, State 1, Line 9
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
I got this error:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "TmpArtist.artistID" could not be bound.
@goodangel
I got this error:
Msg 1033, Level 15, State 1, Line 9
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
Please check the aliases. This query should definitively work. If not please post the exact query you're running.
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID
FROM Artists t1
INNER JOIN (
select * from Images a
where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
) t2 on t1.artistID=t2.artistID
Aside from the exact query you're running, please post some data that you're getting from the query indicating where the duplicates are.
In the mean time here's an example with AdventureWorks.
If you run
select * from Sales.SalesOrderHeader
select * from Sales.SalesOrderDetail
you will get 31465 rows in SalesOrderHeader and 121317 rows in the details one.
So if you just join them you will get 121317 rows (one to many relationship):
select * from Sales.SalesOrderHeader a
inner join Sales.SalesOrderDetail b on a.SalesOrderID = b.SalesOrderID
However if you do like this:
select * from Sales.SalesOrderHeader t1
inner join (select * from Sales.SalesOrderDetail a
where SalesOrderDetailID = (select top 1 SalesOrderDetailID from Sales.SalesOrderDetail where SalesOrderID = a.SalesOrderID order by newid())
) t2 on t1.SalesOrderID = t2.SalesOrderID
You will get 31465 rows, which is the number of rows in SalesOrderHeader. So basically you're forcing a one-to-one join by selecting distinct values in SalesOrderDetails for each SalesOrderID.
ASKER
I ran the exact query that you gave me and it worked but gave me duplicate results.. Attached is the sql that i pasted.
Here are the results:
231 HENNING JORGENSEN 23 3
232 MIKE RUBENDALL 335 28
232 MIKE RUBENDALL 49 18
233 TREVOR MCSTAY 51 34
233 TREVOR MCSTAY 55 34
233 TREVOR MCSTAY 56 34
234 TIM LEHI 64 35
Here are the results:
231 HENNING JORGENSEN 23 3
232 MIKE RUBENDALL 335 28
232 MIKE RUBENDALL 49 18
233 TREVOR MCSTAY 51 34
233 TREVOR MCSTAY 55 34
233 TREVOR MCSTAY 56 34
234 TIM LEHI 64 35
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID
FROM Artists t1
INNER JOIN (
select * from Images a
where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
) t2 on t1.artistID=t2.artistID
Ok, so you have duplicates in the Artists table. See attached:
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID
FROM (select distinct ArtistID, Artist from Artists) t1
INNER JOIN (
select * from Images a
where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
) t2 on t1.artistID=t2.artistID
ASKER
Still returns duplicate artists :(
er
er
Can you post your tables structure? And some sample data from both tables as well as the expected result. Please include samples where duplicates are found in both tables.
Also, please advise which are unique columns on both tables.
Maybe you need to join both tables with another column aside from Artist.
So far with the information I have I can only assume that you have duplicates Artist IDs with different names in the artist table, so I would try like this:
Also, please advise which are unique columns on both tables.
Maybe you need to join both tables with another column aside from Artist.
So far with the information I have I can only assume that you have duplicates Artist IDs with different names in the artist table, so I would try like this:
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID
FROM (select ArtistID, max(Artist) Artist from Artists group by ArtistID) t1
INNER JOIN (
select * from Images a
where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
) t2 on t1.artistID=t2.artistID
ASKER
ok.
Artists Table
-----------------------
ArtistID (unique)
Artist
Images Table
-----------------------
ImageID (unique)
artistID
itemID
Sample Data From Artists
-------------------------- ---------
231 HENNING JORGENSEN
232 MIKE RUBENDALL
233 TREVOR MCSTAY
234 TIM LEHI
Sample Data From Images
-------------------------- ---------
1 231 1
2 231 1
3 231 1
4 231 1
5 231 1
6 231 1
There can be multiple images for 1 Artist. For example, in the images table, imageID 1-43 is all from Artist 231 (HENNING JORGENSEN)
Artists Table
-----------------------
ArtistID (unique)
Artist
Images Table
-----------------------
ImageID (unique)
artistID
itemID
Sample Data From Artists
--------------------------
231 HENNING JORGENSEN
232 MIKE RUBENDALL
233 TREVOR MCSTAY
234 TIM LEHI
Sample Data From Images
--------------------------
1 231 1
2 231 1
3 231 1
4 231 1
5 231 1
6 231 1
There can be multiple images for 1 Artist. For example, in the images table, imageID 1-43 is all from Artist 231 (HENNING JORGENSEN)
Ok, it seems that there's a bug with newid() in combination with TOP in SQL server. Check this thread for details.
http://www.sqlmag.com/Article/ArticleID/93895/sql_server_93895.html
So I would change the query to this:
http://www.sqlmag.com/Article/ArticleID/93895/sql_server_93895.html
So I would change the query to this:
select * from artists a
inner join ( select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1
where rn = 1
) b on a.artistID = b.artistID
oops, missed one part:
select TOP 7 a.artistID, a.artist, b.imageID, b.itemID from artists a
inner join ( select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1
where rn = 1
) b on a.artistID = b.artistID
Try this, there was a typo.
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = Tmp.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist
FROM (SELECT DISTINCT artistID, artist
FROM Artists) as TmpArtist) as TMP) as TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();
ASKER
Now i'm getting the same guys over and over with the Same image. When i execute the query it just puts them in a different order. i.e.:
231 HENNING JORGENSEN 1 1
237 CHRIS TREVINO 179 48
234 TIM LEHI 58 35
235 PAUL JEFFRIES 87 38
233 TREVOR MCSTAY 50 34
232 MIKE RUBENDALL 44 18
236 BRETT SCHWINDT 139 44
-------------------------- ---------- ---------- -
232 MIKE RUBENDALL 44 18
235 PAUL JEFFRIES 87 38
236 BRETT SCHWINDT 139 44
233 TREVOR MCSTAY 50 34
231 HENNING JORGENSEN 1 1
237 CHRIS TREVINO 179 48
234 TIM LEHI 58 35
You see that the imageID is always the same now.
231 HENNING JORGENSEN 1 1
237 CHRIS TREVINO 179 48
234 TIM LEHI 58 35
235 PAUL JEFFRIES 87 38
233 TREVOR MCSTAY 50 34
232 MIKE RUBENDALL 44 18
236 BRETT SCHWINDT 139 44
--------------------------
232 MIKE RUBENDALL 44 18
235 PAUL JEFFRIES 87 38
236 BRETT SCHWINDT 139 44
233 TREVOR MCSTAY 50 34
231 HENNING JORGENSEN 1 1
237 CHRIS TREVINO 179 48
234 TIM LEHI 58 35
You see that the imageID is always the same now.
Have you tried my solution?
see comment http:a#26093273 . I have recreated it in my test environment and is working properly.
see comment http:a#26093273 . I have recreated it in my test environment and is working properly.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
@ralmada:
I didn't even see this post! Give me a sec. So far it seems to work perfectly. i'll get back to you in a couple of minutes on it. Can you please explain the code to me?
I didn't even see this post! Give me a sec. So far it seems to work perfectly. i'll get back to you in a couple of minutes on it. Can you please explain the code to me?
Sure basically I'm using the row_number() function which will create a temporary "random" id (... order by newid()) for each image. This temporary id will start over for ImagesID belonging to a different ArtistID (...partition by ArtistID ...). So basically we are generating a random number for each image withing a partition of ArtistID.
then by just filtering it (where rn = 1) we are making sure that only one ImageID per ArtistID is returned in the subquery.
With that you can join it with the Artists table.
More info on the row_number
http://msdn.microsoft.com/en-us/library/ms186734.aspx
Hope this helps,
ralmada
then by just filtering it (where rn = 1) we are making sure that only one ImageID per ArtistID is returned in the subquery.
With that you can join it with the Artists table.
More info on the row_number
http://msdn.microsoft.com/en-us/library/ms186734.aspx
Hope this helps,
ralmada
ASKER
Thanks ralmada. This does seem to work. If something goes wrong i'll let you know.
ASKER
Although i have no idea how you did it (even though you explained it) you helped me out a lot. Thanks!!
Please note that I'm assuming that ImageID is unique in Images table. If not change it for Item ID. I have attached both versions:
Open in new window