• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2224
  • Last Modified:

How to do JOIN without returning duplicate rows

I have two tables as follows:
Table1
Columns:  CandidateID,  CandidateName
                      1                Adams
                      2                Abrahams
                      3                Barton        etc etc
Table2
Columns:  EntryDate,  Comments,   SurnameOfPersonMakingComment,  AboutCandidateID
                 1/1/2006     rubbish           Smith                                                1
                 2/1/2006     very good       Jennings                                            3
                 3/1/2006     OK                 Smith                                                1

I want to JOIN these 2 tables, to show only the last comment made about each candidate, as follows:
Columns:  CandidateID,  CandidateName,   LastComment,  EntryDate
                      1                Adams                   OK               3/1/06
                      2                Abrahams             (null)             (null)
                      3                Barton                  very good       2/1/06

How can I do this?
0
Sigh_Man
Asked:
Sigh_Man
  • 9
  • 5
  • 4
2 Solutions
 
Raynard7Commented:
Hi,

You would be best if you saved the entrydate as a date and time not just the date - and you had an id for each record in table2 - this way you know what is most recent and can find the id of it

If you Did have that I would do
Select
    t3.CandidateID,
    t3.CandidateName,
    t4.EntryDate,
    t4.Comments,
    t4.SurnameOfPersonMakingComment
From
(
Select
    t1.CandidateID,
    t1.CandidateName,
    (select top 1 t2.CommentID from table2 as t2 where t2.aboutCandidateId = t1.CandidateID order by t2.EntryDate desc) as CommentID
From
    table1 as t1
) as t3 inner join table2 t4 on t3.CommentID = t4.CommentID
0
 
Raynard7Commented:
What this query would be doing is (assuming you have a CommentID for your table2) saying for each person in table1 find the most recent comment id from table2 (finding the first in descending order) and then turning this into a sub query and joining back to the candidate table to retreive all the records from that table.

Otherwise I do not know how you identify your key from table2 to say what is unique so you avoid duplicates.

Otherwise if you had to have a key being all the fields in table2 then you would need to run many sub queries as per above - then join back through, much easier to have a key field
0
 
Sigh_ManAuthor Commented:
But the ID may be misleading: it is possible for a comment not to be typed into the system in the correct date order, so I will need to sort by date and not ID
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
Raynard7Commented:
The query I proposed was not sorting by date id anyway - if they are an identity field you can never really trust if they will be in the correct order, however if your EntryDate is just a date and not a date and time - if you have multiple comments on the same day then I would not know which to pick.

I just want to know how you would identify the row uniquely - so you do not get duplicates.

If you had a primary key (or have one) then this would be used - which is why I suggested an ID in the first place
0
 
Sigh_ManAuthor Commented:
OK.  I guess I could do...
ORDER BY EntryDate Desc, ID Desc

In my particular circumstances I am not concerned about 2 comments on the same day - I am picking the TOP 1 in any event.  I just need help in getting that latest entry.

I'm just trying some things out based on your suggected code... be back soon
0
 
imran_fastCommented:


select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A

left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate

0
 
Raynard7Commented:
imran - the only issue with that approach - as I had considered, is that you will get duplicates if there are multiple for the same day
0
 
imran_fastCommented:
this is authors comment
" I just need help in getting that latest entry."

Thats why i gave that suggestion
0
 
Sigh_ManAuthor Commented:
Raynard7,
I tried a version of your suggested code.  This seems to still be retrieving duplicates (although, the duplicate rows all display the same info!)
Here is what I got so far...

SELECT  * FROM
(
SELECT
   dtClientDetail.ClientID As ClientID,
   dtClientDetail.ClientSurname As ClientSurname,
   dtClientDetail.ClientGivenNames As ClientGivenNames,
   (SELECT TOP 1 StartDate  FROM dtHistory AS t3 WHERE t3.ClientID = dtClientDetail.ClientID ORDER BY StartDate DESC) As StartDate
FROM
dtClientDetail LEFT OUTER JOIN dtHistory ON dtClientDetail.ClientID = dtHistory.ClientID
GROUP BY dtClientDetail.ClientID, ClientSurname, ClientGivenNames, StartDate
)
As t5 ORDER BY StartDate DESC


Instead of suppressing duplicates, this results in duplicates but with all duplicate rows containing the same results.  I still need to return just ONE row per client, irrespective of the number entries for that client in the joined table.  Any ideas?
0
 
Sigh_ManAuthor Commented:
I thought I could GROUP BY a column in my derivedTbl, t5, but of course GROUP BY will not work in derived table.
0
 
imran_fastCommented:
hi Sigh_Man,
Try my suggestion and see if you get the desired result.
0
 
Sigh_ManAuthor Commented:
imran_fast,
I am struggling to see how I would get your suggested code into my SELECT statement.  Can you help me with this?
And when I JOIN tables in my SELECT statements, I need to retrieve several fields from each of those joined tables -- but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
Please help me with the syntax using your suggestion and my situation.  Thanks.
0
 
imran_fastCommented:
>>but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
If you check my suggetion below you will find out that i am joining the same table twice one to  get max entry per client
then in the second join considering this max extry date to fetch all the other columns from the same table.

select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
Sigh_ManAuthor Commented:
OK, I'll give this a try...
0
 
Sigh_ManAuthor Commented:
This still gives duplicates -- one entry for each line appearing in the comments table.  I need just the latest comment (and it's not caused by there being more than one comment on the same date).
0
 
Sigh_ManAuthor Commented:
I've got it to show no duplicates, but unfortunately I need the uniqueIdentifier column, which I cannot get without adding to the GROUP BY clause.  But grouping by the UniqueIdentifier brings me back to square one because it then shows duplicates!   :-(
0
 
imran_fastCommented:
Try this

select Distinct a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
Sigh_ManAuthor Commented:
Guys, thank you for your efforts here.  The accepted answer was pretty much perfect - it just took me a good while to work it into my specific scenario.  I was getting ready to use the assisted answer because I was struggling to make the accepted answer work (my own fault, of course!).  imran_fast, yours was a good suggestion despite its carrying the risk of returning duplicates.  I had previously fiddled around with DISTINCT but still my code produced duplicates; maybe I was missing something - I dunno.

At the end of the day, the idea of my asking this question was to discover a way of guaranteeing no duplicates, which Raynard7's answer does.

Thanks again.  I hope the points split is OK.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 9
  • 5
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now