Solved

How to do JOIN without returning duplicate rows

Posted on 2006-10-31
18
2,200 Views
Last Modified: 2008-01-09
I have two tables as follows:
Table1
Columns:  CandidateID,  CandidateName
                      1                Adams
                      2                Abrahams
                      3                Barton        etc etc
Table2
Columns:  EntryDate,  Comments,   SurnameOfPersonMakingComment,  AboutCandidateID
                 1/1/2006     rubbish           Smith                                                1
                 2/1/2006     very good       Jennings                                            3
                 3/1/2006     OK                 Smith                                                1

I want to JOIN these 2 tables, to show only the last comment made about each candidate, as follows:
Columns:  CandidateID,  CandidateName,   LastComment,  EntryDate
                      1                Adams                   OK               3/1/06
                      2                Abrahams             (null)             (null)
                      3                Barton                  very good       2/1/06

How can I do this?
0
Comment
Question by:Sigh_Man
  • 9
  • 5
  • 4
18 Comments
 
LVL 35

Accepted Solution

by:
Raynard7 earned 450 total points
ID: 17841047
Hi,

You would be best if you saved the entrydate as a date and time not just the date - and you had an id for each record in table2 - this way you know what is most recent and can find the id of it

If you Did have that I would do
Select
    t3.CandidateID,
    t3.CandidateName,
    t4.EntryDate,
    t4.Comments,
    t4.SurnameOfPersonMakingComment
From
(
Select
    t1.CandidateID,
    t1.CandidateName,
    (select top 1 t2.CommentID from table2 as t2 where t2.aboutCandidateId = t1.CandidateID order by t2.EntryDate desc) as CommentID
From
    table1 as t1
) as t3 inner join table2 t4 on t3.CommentID = t4.CommentID
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841060
What this query would be doing is (assuming you have a CommentID for your table2) saying for each person in table1 find the most recent comment id from table2 (finding the first in descending order) and then turning this into a sub query and joining back to the candidate table to retreive all the records from that table.

Otherwise I do not know how you identify your key from table2 to say what is unique so you avoid duplicates.

Otherwise if you had to have a key being all the fields in table2 then you would need to run many sub queries as per above - then join back through, much easier to have a key field
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841082
But the ID may be misleading: it is possible for a comment not to be typed into the system in the correct date order, so I will need to sort by date and not ID
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841126
The query I proposed was not sorting by date id anyway - if they are an identity field you can never really trust if they will be in the correct order, however if your EntryDate is just a date and not a date and time - if you have multiple comments on the same day then I would not know which to pick.

I just want to know how you would identify the row uniquely - so you do not get duplicates.

If you had a primary key (or have one) then this would be used - which is why I suggested an ID in the first place
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841241
OK.  I guess I could do...
ORDER BY EntryDate Desc, ID Desc

In my particular circumstances I am not concerned about 2 comments on the same day - I am picking the TOP 1 in any event.  I just need help in getting that latest entry.

I'm just trying some things out based on your suggected code... be back soon
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841254


select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A

left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate

0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841281
imran - the only issue with that approach - as I had considered, is that you will get duplicates if there are multiple for the same day
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841461
this is authors comment
" I just need help in getting that latest entry."

Thats why i gave that suggestion
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841794
Raynard7,
I tried a version of your suggested code.  This seems to still be retrieving duplicates (although, the duplicate rows all display the same info!)
Here is what I got so far...

SELECT  * FROM
(
SELECT
   dtClientDetail.ClientID As ClientID,
   dtClientDetail.ClientSurname As ClientSurname,
   dtClientDetail.ClientGivenNames As ClientGivenNames,
   (SELECT TOP 1 StartDate  FROM dtHistory AS t3 WHERE t3.ClientID = dtClientDetail.ClientID ORDER BY StartDate DESC) As StartDate
FROM
dtClientDetail LEFT OUTER JOIN dtHistory ON dtClientDetail.ClientID = dtHistory.ClientID
GROUP BY dtClientDetail.ClientID, ClientSurname, ClientGivenNames, StartDate
)
As t5 ORDER BY StartDate DESC


Instead of suppressing duplicates, this results in duplicates but with all duplicate rows containing the same results.  I still need to return just ONE row per client, irrespective of the number entries for that client in the joined table.  Any ideas?
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841830
I thought I could GROUP BY a column in my derivedTbl, t5, but of course GROUP BY will not work in derived table.
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841885
hi Sigh_Man,
Try my suggestion and see if you get the desired result.
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17842123
imran_fast,
I am struggling to see how I would get your suggested code into my SELECT statement.  Can you help me with this?
And when I JOIN tables in my SELECT statements, I need to retrieve several fields from each of those joined tables -- but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
Please help me with the syntax using your suggestion and my situation.  Thanks.
0
 
LVL 28

Assisted Solution

by:imran_fast
imran_fast earned 50 total points
ID: 17842156
>>but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
If you check my suggetion below you will find out that i am joining the same table twice one to  get max entry per client
then in the second join considering this max extry date to fetch all the other columns from the same table.

select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844536
OK, I'll give this a try...
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844756
This still gives duplicates -- one entry for each line appearing in the comments table.  I need just the latest comment (and it's not caused by there being more than one comment on the same date).
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844963
I've got it to show no duplicates, but unfortunately I need the uniqueIdentifier column, which I cannot get without adding to the GROUP BY clause.  But grouping by the UniqueIdentifier brings me back to square one because it then shows duplicates!   :-(
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17847911
Try this

select Distinct a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17848954
Guys, thank you for your efforts here.  The accepted answer was pretty much perfect - it just took me a good while to work it into my specific scenario.  I was getting ready to use the assisted answer because I was struggling to make the accepted answer work (my own fault, of course!).  imran_fast, yours was a good suggestion despite its carrying the risk of returning duplicates.  I had previously fiddled around with DISTINCT but still my code produced duplicates; maybe I was missing something - I dunno.

At the end of the day, the idea of my asking this question was to discover a way of guaranteeing no duplicates, which Raynard7's answer does.

Thanks again.  I hope the points split is OK.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Truncate vs Delete 63 88
MS SQL Backup 24 70
C# SQL BULK INSERT CLASS 5 35
SQL Server 2016 Developers Edition Install 4 57
Performance is the key factor for any successful data integration project, knowing the type of transformation that you’re using is the first step on optimizing the SSIS flow performance, by utilizing the correct transformation or the design alternat…
Introduction In my previous article (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/A_9150-Loading-XML-Using-SSIS.html) I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now