?
Solved

Help with MS SQL statement using TOP and DISTINCT

Posted on 2009-12-19
22
Medium Priority
?
535 Views
Last Modified: 2012-05-08
Hey all, I have a sql statement that selects TOP 7 records from a table but they are not unique. I do not want to have duplicate people. Let me post the sql and then explain.

I have 2 tables, 1 called artists and 1 called images. I'm trying to get the TOP 7 random Artists to show up. Then i joined the IMAGES table to grab one of the artists' images (randomly) and the corresponding ITEMID for that image that was grabbed. With the code provided i get random records but I get duplicate Artists. How can i make it so the TOP 7 artists are unique and randomly chosen with each artists random piece of art and the art's corresponding ITEMID? thanks all!

er
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM Artists
INNER JOIN Images
ON (Images.artistID=artists.artistID)
GROUP BY artists.artistID, artist, imageID, itemID
ORDER BY NEWID()

Open in new window

0
Comment
Question by:engineroom
  • 10
  • 9
  • 2
  • +1
22 Comments
 
LVL 41

Expert Comment

by:ralmada
ID: 26089244
Try like this:
Please note that I'm assuming that ImageID is unique in Images table. If not change it for Item ID. I have attached both versions:

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID 
FROM Artists t1
INNER JOIN (
			select * from Images a
			where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())
			) t2 on t1.artistID=t2.artistID


-- OR

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID 
FROM Artists t1
INNER JOIN (
			select * from Images a
			where ItemID = (select top 1 ItemID from Images where artistID = a.ArtistID order by newid())
			) t2 on t1.artistID=t2.artistID

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26089478
hey ralmada.... i just ran the query and i'm getting even more duplicates now. You are correct, the imageID is unique in the images table. Don't know what's going on there.

er
0
 
LVL 13

Expert Comment

by:sameer2010
ID: 26089667
Try this:
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = TmpArtist.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist 
FROM (SELECT DISTINCT artistID, artist
FROM Artists) TmpArtist) TMP) TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();

Open in new window

0
Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

 
LVL 11

Expert Comment

by:Goodangel Matope
ID: 26089716
Hmmm....

Havent tested this, but try give it a shot
SELECT TOP 7 artists.artistID, artist, imageID, itemID
FROM(
     SELECT distinct artists.artistID, artist, imageID, itemID
     FROM Artists
     INNER JOIN Images
     ON (Images.artistID=artists.artistID)
     GROUP BY artists.artistID, artist, imageID, itemID
     ORDER BY NEWID()
) As ArtistData

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26091091
@sameer
I got this error:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "TmpArtist.artistID" could not be bound.

@goodangel
I got this error:
Msg 1033, Level 15, State 1, Line 9
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

0
 
LVL 41

Expert Comment

by:ralmada
ID: 26091201
Please check the aliases. This query should definitively work. If not please post the exact query you're running.

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID  
FROM Artists t1 
INNER JOIN ( 
                        select * from Images a 
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid()) 
                        ) t2 on t1.artistID=t2.artistID

Open in new window

0
 
LVL 41

Expert Comment

by:ralmada
ID: 26091268

Aside from the exact query you're running, please post some data that you're getting from the query indicating where the duplicates are.
In the mean time here's an example with AdventureWorks.

If you run
select * from Sales.SalesOrderHeader
select * from Sales.SalesOrderDetail

you will get 31465 rows in SalesOrderHeader and 121317 rows in the details one.
So if you just join them you will get 121317 rows (one to many relationship):

select * from Sales.SalesOrderHeader a
inner join Sales.SalesOrderDetail b on a.SalesOrderID = b.SalesOrderID

However if you do like this:
select * from Sales.SalesOrderHeader t1
inner join (select * from Sales.SalesOrderDetail a
			where SalesOrderDetailID = (select top 1 SalesOrderDetailID from Sales.SalesOrderDetail where SalesOrderID = a.SalesOrderID order by newid())
) t2 on t1.SalesOrderID = t2.SalesOrderID

You will get 31465 rows, which is the number of rows in SalesOrderHeader. So basically you're forcing a one-to-one join by selecting distinct values in SalesOrderDetails for each SalesOrderID.

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26092164
I ran the exact query that you gave me and it worked but gave me duplicate results.. Attached is the sql that i pasted.

Here are the results:

231      HENNING JORGENSEN      23      3
232      MIKE RUBENDALL      335      28
232      MIKE RUBENDALL      49      18
233      TREVOR MCSTAY      51      34
233      TREVOR MCSTAY      55      34
233      TREVOR MCSTAY      56      34
234      TIM LEHI      64      35
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID  
FROM Artists t1 
INNER JOIN ( 
                        select * from Images a 
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid()) 
                        ) t2 on t1.artistID=t2.artistID

Open in new window

0
 
LVL 41

Expert Comment

by:ralmada
ID: 26092268
Ok, so you have duplicates in the Artists table. See attached:
SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID   
FROM (select distinct ArtistID, Artist from Artists) t1  
INNER JOIN (  
                        select * from Images a  
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())  
                        ) t2 on t1.artistID=t2.artistID

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26092732
Still returns duplicate artists :(

er
0
 
LVL 41

Expert Comment

by:ralmada
ID: 26092767
Can you post your tables structure? And some sample data from both tables as well as the expected result. Please include samples where duplicates are found in both tables.
Also, please advise which are unique columns on both tables.
Maybe you need to join both tables with another column aside from Artist.
So far with the information I have I can only assume that you have duplicates Artist IDs with different names in the artist table, so I would try like this:

SELECT TOP 7 t1.artistID, t1.artist, t2.imageID, t2.itemID    
FROM (select ArtistID, max(Artist) Artist from Artists group by ArtistID) t1   
INNER JOIN (   
                        select * from Images a   
                        where ImageID = (select top 1 ImageID from Images where artistID = a.ArtistID order by newid())   
                        ) t2 on t1.artistID=t2.artistID

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26093042
ok.

Artists Table
-----------------------
ArtistID (unique)
Artist

Images Table
-----------------------
ImageID (unique)
artistID
itemID

Sample Data From Artists
-----------------------------------
231      HENNING JORGENSEN
232      MIKE RUBENDALL
233      TREVOR MCSTAY
234      TIM LEHI

Sample Data From Images
-----------------------------------
1      231      1
2      231      1
3      231      1
4      231      1
5      231      1
6      231      1

There can be multiple images for 1 Artist. For example, in the images table, imageID 1-43 is all from Artist 231 (HENNING JORGENSEN)
0
 
LVL 41

Expert Comment

by:ralmada
ID: 26093271
Ok, it seems that there's a bug with newid() in combination with TOP in SQL server. Check this thread for details.
http://www.sqlmag.com/Article/ArticleID/93895/sql_server_93895.html 
So I would change the query to this:
 
 

select * from artists a
inner join (	select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1
			where rn = 1
			) b on a.artistID = b.artistID

Open in new window

0
 
LVL 41

Expert Comment

by:ralmada
ID: 26093273
oops, missed one part:
select TOP 7 a.artistID, a.artist, b.imageID, b.itemID      from artists a 
inner join (    select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1 
                        where rn = 1 
                        ) b on a.artistID = b.artistID

Open in new window

0
 
LVL 13

Expert Comment

by:sameer2010
ID: 26095803
Try this, there was a typo.
SELECT TMP1.artistID, TMP1.artist, TMP1.imageID, C.itemID
FROM(
SELECT TMP.artistID, TMP.artist, (SELECT TOP 1 imageID FROM Images B WHERE B.artistID = Tmp.artistID) as imageID
FROM (
SELECT TOP 7 TmpArtist.artistID, TmpArtist.artist 
FROM (SELECT DISTINCT artistID, artist
FROM Artists) as TmpArtist) as TMP) as TMP1, Images C
WHERE
TMP1.artistID = C.artistID
AND TMP1.imageID = c.imageID
ORDER BY NEWID();

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26096905
Now i'm getting the same guys over and over with the Same image. When i execute the query it just puts them in a different order. i.e.:

231      HENNING JORGENSEN      1      1
237      CHRIS TREVINO      179      48
234      TIM LEHI      58      35
235      PAUL JEFFRIES      87      38
233      TREVOR MCSTAY      50      34
232      MIKE RUBENDALL      44      18
236      BRETT SCHWINDT      139      44
-----------------------------------------------
232      MIKE RUBENDALL      44      18
235      PAUL JEFFRIES      87      38
236      BRETT SCHWINDT      139      44
233      TREVOR MCSTAY      50      34
231      HENNING JORGENSEN      1      1
237      CHRIS TREVINO      179      48
234      TIM LEHI      58      35

You see that the imageID is always the same now.
0
 
LVL 41

Expert Comment

by:ralmada
ID: 26096921
Have you tried my solution?
see comment http:a#26093273 . I have recreated it in my test environment and is working properly.
0
 
LVL 41

Accepted Solution

by:
ralmada earned 2000 total points
ID: 26096943
If you also want random artist add another order by newid()
select TOP 7 a.artistID, a.artist, b.imageID, b.itemID      from artists a  
inner join (    select * from (select *, row_number() over (partition by artistID order by newid()) rn from images) t1  
                        where rn = 1  
                        ) b on a.artistID = b.artistID
order by newID()

Open in new window

0
 
LVL 3

Author Comment

by:engineroom
ID: 26097099
@ralmada:

I didn't even see this post! Give me a sec. So far it seems to work perfectly. i'll get back to you in a couple of minutes on it. Can you please explain the code to me?
0
 
LVL 41

Expert Comment

by:ralmada
ID: 26097261
Sure basically I'm using the row_number() function which will create a temporary "random" id (... order by newid()) for each image. This temporary id will start over for ImagesID belonging to a different ArtistID (...partition by ArtistID ...). So basically we are generating a random number for each image withing a partition of ArtistID.
then by just filtering it (where rn = 1) we are making sure that only one ImageID per ArtistID is returned in the subquery.
With that you can join it with the Artists table.
More info on the row_number
http://msdn.microsoft.com/en-us/library/ms186734.aspx 
 Hope this helps,
ralmada
0
 
LVL 3

Author Comment

by:engineroom
ID: 26098185
Thanks ralmada. This does seem to work. If something goes wrong i'll let  you know.
0
 
LVL 3

Author Closing Comment

by:engineroom
ID: 31668181
Although i have no idea how you did it (even though you explained it) you helped me out a lot. Thanks!!
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
We’ve all felt that sense of false security before—locking down external access to a database or component and feeling like we’ve done all we need to do to secure company data. But that feeling is fleeting. Attacks these days can happen in many w…
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…

862 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question