Solved

How to do JOIN without returning duplicate rows

Posted on 2006-10-31
18
2,201 Views
Last Modified: 2008-01-09
I have two tables as follows:
Table1
Columns:  CandidateID,  CandidateName
                      1                Adams
                      2                Abrahams
                      3                Barton        etc etc
Table2
Columns:  EntryDate,  Comments,   SurnameOfPersonMakingComment,  AboutCandidateID
                 1/1/2006     rubbish           Smith                                                1
                 2/1/2006     very good       Jennings                                            3
                 3/1/2006     OK                 Smith                                                1

I want to JOIN these 2 tables, to show only the last comment made about each candidate, as follows:
Columns:  CandidateID,  CandidateName,   LastComment,  EntryDate
                      1                Adams                   OK               3/1/06
                      2                Abrahams             (null)             (null)
                      3                Barton                  very good       2/1/06

How can I do this?
0
Comment
Question by:Sigh_Man
  • 9
  • 5
  • 4
18 Comments
 
LVL 35

Accepted Solution

by:
Raynard7 earned 450 total points
ID: 17841047
Hi,

You would be best if you saved the entrydate as a date and time not just the date - and you had an id for each record in table2 - this way you know what is most recent and can find the id of it

If you Did have that I would do
Select
    t3.CandidateID,
    t3.CandidateName,
    t4.EntryDate,
    t4.Comments,
    t4.SurnameOfPersonMakingComment
From
(
Select
    t1.CandidateID,
    t1.CandidateName,
    (select top 1 t2.CommentID from table2 as t2 where t2.aboutCandidateId = t1.CandidateID order by t2.EntryDate desc) as CommentID
From
    table1 as t1
) as t3 inner join table2 t4 on t3.CommentID = t4.CommentID
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841060
What this query would be doing is (assuming you have a CommentID for your table2) saying for each person in table1 find the most recent comment id from table2 (finding the first in descending order) and then turning this into a sub query and joining back to the candidate table to retreive all the records from that table.

Otherwise I do not know how you identify your key from table2 to say what is unique so you avoid duplicates.

Otherwise if you had to have a key being all the fields in table2 then you would need to run many sub queries as per above - then join back through, much easier to have a key field
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841082
But the ID may be misleading: it is possible for a comment not to be typed into the system in the correct date order, so I will need to sort by date and not ID
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841126
The query I proposed was not sorting by date id anyway - if they are an identity field you can never really trust if they will be in the correct order, however if your EntryDate is just a date and not a date and time - if you have multiple comments on the same day then I would not know which to pick.

I just want to know how you would identify the row uniquely - so you do not get duplicates.

If you had a primary key (or have one) then this would be used - which is why I suggested an ID in the first place
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841241
OK.  I guess I could do...
ORDER BY EntryDate Desc, ID Desc

In my particular circumstances I am not concerned about 2 comments on the same day - I am picking the TOP 1 in any event.  I just need help in getting that latest entry.

I'm just trying some things out based on your suggected code... be back soon
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841254


select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A

left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate

0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17841281
imran - the only issue with that approach - as I had considered, is that you will get duplicates if there are multiple for the same day
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841461
this is authors comment
" I just need help in getting that latest entry."

Thats why i gave that suggestion
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841794
Raynard7,
I tried a version of your suggested code.  This seems to still be retrieving duplicates (although, the duplicate rows all display the same info!)
Here is what I got so far...

SELECT  * FROM
(
SELECT
   dtClientDetail.ClientID As ClientID,
   dtClientDetail.ClientSurname As ClientSurname,
   dtClientDetail.ClientGivenNames As ClientGivenNames,
   (SELECT TOP 1 StartDate  FROM dtHistory AS t3 WHERE t3.ClientID = dtClientDetail.ClientID ORDER BY StartDate DESC) As StartDate
FROM
dtClientDetail LEFT OUTER JOIN dtHistory ON dtClientDetail.ClientID = dtHistory.ClientID
GROUP BY dtClientDetail.ClientID, ClientSurname, ClientGivenNames, StartDate
)
As t5 ORDER BY StartDate DESC


Instead of suppressing duplicates, this results in duplicates but with all duplicate rows containing the same results.  I still need to return just ONE row per client, irrespective of the number entries for that client in the joined table.  Any ideas?
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 1

Author Comment

by:Sigh_Man
ID: 17841830
I thought I could GROUP BY a column in my derivedTbl, t5, but of course GROUP BY will not work in derived table.
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17841885
hi Sigh_Man,
Try my suggestion and see if you get the desired result.
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17842123
imran_fast,
I am struggling to see how I would get your suggested code into my SELECT statement.  Can you help me with this?
And when I JOIN tables in my SELECT statements, I need to retrieve several fields from each of those joined tables -- but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
Please help me with the syntax using your suggestion and my situation.  Thanks.
0
 
LVL 28

Assisted Solution

by:imran_fast
imran_fast earned 50 total points
ID: 17842156
>>but I thought that using max() meant I could only retrieve data from that one (max) column.  Is this right?
If you check my suggetion below you will find out that i am joining the same table twice one to  get max entry per client
then in the second join considering this max extry date to fetch all the other columns from the same table.

select a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844536
OK, I'll give this a try...
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844756
This still gives duplicates -- one entry for each line appearing in the comments table.  I need just the latest comment (and it's not caused by there being more than one comment on the same date).
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17844963
I've got it to show no duplicates, but unfortunately I need the uniqueIdentifier column, which I cannot get without adding to the GROUP BY clause.  But grouping by the UniqueIdentifier brings me back to square one because it then shows duplicates!   :-(
0
 
LVL 28

Expert Comment

by:imran_fast
ID: 17847911
Try this

select Distinct a.CandidateID, a.CandidateName, b.comments as lastcomment, b.entrydate
from Table1 A
left outer join (select max(engrydate)engrydate, AboutCandidateID from table2 group by AboutCandidateID) C
on A.CandidateID = c.AboutCandidateID
left outer join  Table2 B on
A.CandidateID = B.AboutCandidateID
and B.engrydate = c.engrydate
0
 
LVL 1

Author Comment

by:Sigh_Man
ID: 17848954
Guys, thank you for your efforts here.  The accepted answer was pretty much perfect - it just took me a good while to work it into my specific scenario.  I was getting ready to use the assisted answer because I was struggling to make the accepted answer work (my own fault, of course!).  imran_fast, yours was a good suggestion despite its carrying the risk of returning duplicates.  I had previously fiddled around with DISTINCT but still my code produced duplicates; maybe I was missing something - I dunno.

At the end of the day, the idea of my asking this question was to discover a way of guaranteeing no duplicates, which Raynard7's answer does.

Thanks again.  I hope the points split is OK.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Select Statement 2 21
My Query is not giving correct result. Please help 5 30
Need a starter for ETL protocol? 4 42
SQL Field Length for Email Address 3 17
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now