?
Solved

SQL Join returning duplicates and too many rows

Posted on 2011-10-28
17
Medium Priority
?
1,633 Views
Last Modified: 2012-05-12
Hi All,

I'm trying to combine data using a Join on two tables (Contributions and PA_Contributions). Both use keys (contactID). I'm selecting everything from the Contributions table but only need to use 2 columns from the other table, PA_Contribution. (tributename and tributedescription).

The problem is that the results are pulling every row from the PA_Contributions table and I only need the data (if any) from those two columns (tributename and tributedescription) based on the contactID pulled to the same row as everything else from the Contribution table.

I believe the problem is the one-to-many issue where there are multiple rows of data for a particular contactID in the 2nd table, and I only need to pull those two columns if there is data in them.

SELECT Contribution.Firstname, Contributions.Lastname, Contributions.Address1, Contributions.Address2, Contribution.City, Contribution.name, Contribution.Amount PA_Contribution.tributename, PA_Contribution.tributedescription
FROM         Contribution RIGHT OUTER JOIN
                      PA_Contribution ON Contributions.ContactId = PA_Contribution.pa_contactid
WHERE     (Contribution.createdon >= @MinDate) AND (Contribution.createdon <= @MaxDate)
ORDER BY Contributions.LastName

I have tried everything from using DISTINCT, Left and Right Outer joins.

As you can see from the screen shot, I'm getting duplicate rows because it's finding mulitple entries in the PA_Contribution table. I only need one tributedescription cloumn for each one. Keep in mind that not all will have data in the tributedescription column, but I need those that do.

Hope that made sense.
Results.jpg
0
Comment
Question by:bkana
17 Comments
 
LVL 27

Expert Comment

by:Shaun Kline
ID: 37045755
Based on your query and the screen capture, it appears that the "duplicate" data is coming from the Contributions table (one row has a name of Happy Hour the other has Annual Fund). A distinct would not work because the data is distinct. Joins would not affect the "duplicates". You will need to modify your query to limit your results from the one table, either by filtering out the unwanted "names" or by grabbing the record with the max/min created on date.
0
 
LVL 15

Expert Comment

by:tim_cs
ID: 37045780
I'm not really sure what you're after.  Before and after data examples would help.  But, maybe this is what you are looking for?


SELECT Contribution.Firstname, Contributions.Lastname, Contributions.Address1, Contributions.Address2, Contribution.City, Contribution.name, Contribution.Amount PA_Contribution.tributename, PA_Contribution.tributedescription
FROM         Contribution RIGHT OUTER JOIN
                      PA_Contribution ON Contributions.ContactId = PA_Contribution.pa_contactid
WHERE     (Contribution.createdon >= @MinDate) AND (Contribution.createdon <= @MaxDate)
AND PA_TributeDescription IS NOT NULL
ORDER BY Contributions.LastName

Open in new window

0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37045781
the possibly like this....

you need to specify how you want to select only a particular tribute's details for tthe contributions...

the example below just takes the first one in alphabetical order...
SELECT Firstname, Lastname, Address1, Address2, City, name, Amount
  , tributename, tributedescription
from (
SELECT C.Firstname, C.Lastname, C.Address1, C.Address2, C.City, C.name, C.Amount
  , PA.tributename, PA.tributedescription
  ,row_number() over (partition by c.contactid order by pa.tributename) as rn
FROM         Contribution as C  
left outer join (select * from PA_Contribution 
                  where tributename is not null or pa.tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  Contribution.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

0
Free recovery tool for Microsoft Active Directory

Veeam Explorer for Microsoft Active Directory provides fast and reliable object-level recovery for Active Directory from a single-pass, agentless backup or storage snapshot — without the need to restore an entire virtual machine or use third-party tools.

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37045789
sorry
SELECT Firstname, Lastname, Address1, Address2, City, name, Amount
  , tributename, tributedescription
from (
SELECT C.Firstname, C.Lastname, C.Address1, C.Address2, C.City, C.name, C.Amount
  , PA.tributename, PA.tributedescription
  ,row_number() over (partition by c.contactid order by pa.tributename) as rn
FROM         Contribution as C  
left outer join (select * from PA_Contribution 
                  where tributename is not null or tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

0
 

Author Comment

by:bkana
ID: 37046106
Thanks to the three of you (so far) for the suggestions. I will try your suggestions and post back.

However, this screen shot shows what I'm after (sort of). It's just pulling the data from the one view thus far, but it has everyone's contributions, invoice number, etc. And, all the data in this view is unique, no dups. It's only when I introduce the other table/view (PA_Contributions), which has the other columns that I need, that it adds the extra rows (duplicates).

 It's adding an extra row for everyone it finds in the PA_Contribtuions table (based on contactID), creating another row and adding the tributename and tributedescription columns. I'm probably making this sound too complicated then it really is.
Good-result.jpg
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37046256
you needs to ask you self these sorts of questions in these scenarios.

ok what is the primary key of the pa_contributions table?

what is it recording?

why are there multiple rows on it per contribution?
0
 

Author Comment

by:bkana
ID: 37046435
The pa_contributionid is the primary key for the pa_contributions view, but I don't see how that helps me- I created the aaaf_contributions view to gather all the contribution information. It just pulls from an invoice table and a contacts table. All of that information is correct and I can pull it using a date parameter. I was just trying to join my view (aaaf_contributions) with the pa_contributions view to pull in some extra information that is nowhere else in the database.

tim cs: I can not use the "IS NOT NULL" condition because it only returns rows if there is data in the tributedescription column. I need everything from the aaaf_contributions view (based on date) plus if a record is found in the pa_contributions view based on contactid, pull the tributename and tributedescription columns if they contain data.

 Everyone pulled from the first view (aaaf_contributions) isn't necessarily going to have data in the tributename and tributedescription columns from the other view (pa_contributions) , but some will, and I need to populate those columns in the results if they do.
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37046521
have you tried my solution?37045789


you need to ask you self these sorts of questions in these scenarios.

what is the primary/natural key of the pa_contributions table?
what is it recording?
why are there multiple rows on it per contribution

which rows data will it be most appropriate to display
?

0
 

Author Comment

by:bkana
ID: 37046686
Lowfatspread: Yes, I tried your suggestion but ran into errors. Below is what I used. I had to tweak it a little because of the view names. I take it you were just using a "C" for the first view and "PA" for the second. Let me clarify something. The views are names AAAF_Contributions and FilteredPA_Contributions. I think I was confusing everyone trying to shorten everything to be simplistic.
SELECT     AAAF_Contributions.priceperunit, AAAF_Contributions.invoicenumber, AAAF_Contributions.createdon, AAAF_Contributions.pa_invoicedate, 
                      AAAF_Contributions.customeridname, AAAF_Contributions.Address1_Name, AAAF_Contributions.FirstName, AAAF_Contributions.LastName, 
                      AAAF_Contributions.Account_address, AAAF_Contributions.description, AAAF_Contributions.name, AAAF_Contributions.productidname, 
                      AAAF_Contributions.CFIMemberId, AAAF_Contributions.invoiceid, AAAF_Contributions.ContactId, AAAF_Contributions.billto_city, AAAF_Contributions.AccountNumber, 
                      AAAF_Contributions.CustomerType, AAAF_Contributions.PA_contactnumber, AAAF_Contributions.CFSDesignation, AAAF_Contributions.EMailAddress1, 
                      AAAF_Contributions.billto_telephone, AAAF_Contributions.billto_postalcode, AAAF_Contributions.billto_stateorprovince, AAAF_Contributions.billto_line2, 
                      AAAF_Contributions.billto_line1, AAAF_Contributions.billto_name, FilteredPA_Contribution.pa_tributename, FilteredPA_Contribution.pa_tributedescription
FROM (SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate, 
                      C.customeridname, C.Address1_Name, C.FirstName, C.LastName, 
                      C.Account_address, C.description, C.name, C.productidname, 
                      C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber, 
                      C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1, 
                      C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2, 
                      C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription
,row_number() over (partition by c.contactid order by pa.pa_tributename) as rn
FROM        AAAF_Contribution as C  
left outer join (select * from FilteredPA_Contribution 
                  where tributename is not null or tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

Q-error.jpg
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37046869
yes

but you main error was in not posting the errors you received and the code you tried earlier


like this
SELECT     priceperunit, invoicenumber, createdon, pa_invoicedate, 
                      customeridname, Address1_Name, FirstName, LastName, 
                     Account_address, description, name, productidname, 
                      CFIMemberId, invoiceid, ContactId, billto_city, AccountNumber, 
                      CustomerType, PA_contactnumber, CFSDesignation, EMailAddress1, 
                      billto_telephone, billto_postalcode, billto_stateorprovince, billto_line2, 
                      billto_line1, billto_name, pa_tributename, pa_tributedescription
FROM (SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate, 
              C.customeridname, C.Address1_Name, C.FirstName, C.LastName, 
             C.Account_address, C.description, C.name, C.productidname, 
             C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber, 
            C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1, 
            C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2, 
            C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription
,row_number() over (partition by c.contactid order by pa.pa_tributename) as rn
FROM        AAAF_Contribution as C  
left outer join (select * from FilteredPA_Contribution 
                  where pa_tributename is not null or pa_tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

0
 

Author Comment

by:bkana
ID: 37046968
Receiving an error when trying to execute the date parameter:
Q2-error.jpg
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37047029
so ... run it in a command window and declare/define the date ranges...

declare @mindate datetime,@maxdate datetime
select @mindate= '20110601' ,maxdate='20111031 23:59:59.997'

the select statement here...
0
 

Author Comment

by:bkana
ID: 37047191
Yes, I figured out the date parameter issue right after I posted last. However, your query proved to be helpful but it's still not pulling the other invoices and related information for the specified dates. Some people have more then one donation invoice and I need to pull those as well. Your query did remove all the dups and the results do reflect the tribute description (for those that had data). I'm gathering a couple of screen shots to show you. Thank you for your time with this, it has ben very helpful and informative.  
0
 

Author Comment

by:bkana
ID: 37047239
Look at the 2 screen shots (My_Query and Your_Query) and look at the record for Abel, Debra. She correctly pulls in yours and mine. In mine it pulls all of her invocices for 3/1/11 - 10/24/11. In yours it pulled only one. Yours has all the information I need, including any info in the tributedecsription and tributename fields, as well as no dups but I also need the other invoices from the AAAF_Contributions view for that date parameter
My-Query.jpg
Your-Query.jpg
0
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 2000 total points
ID: 37047363
try changing

,row_number() over (partition by c.contactid order by pa.pa_tributename) as rn

to

,row_number() over (partition by c.contactid,invoicenumber order by pa.pa_tributename) as rn



0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 37047410
or like this is the relationship between contactid and invoicenumbers is more complex...


SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate,
              C.customeridname, C.Address1_Name, C.FirstName, C.LastName,
             C.Account_address, C.[description], C.name, C.productidname,
             C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber,
            C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1,
            C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2,
            C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription

FROM        AAAF_Contribution as C  
left outer join (SELECT * FROM (
                   (select pa_contactid,pa_tributename,pa_tributedescription
                       ,ROW_NUMBER() over (partition by pa_contactid order by pa_tributenAME) AS RN
                     from FilteredPA_Contribution
                  where pa_tributename is not null
                     or pa_tributedescription is not null) AS X
                  where rn=1
                 ) as PA
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate

ORDER BY LastName,firstname
0
 

Author Comment

by:bkana
ID: 37071310
Lowfatspread,

My apologies for not getting back to you sooner. Your suggestion, with a little tweaking, provided me with what I needed. Thanks for all your help.
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…
Suggested Courses

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question