Link to home
Start Free TrialLog in
Avatar of bkana
bkanaFlag for United States of America

asked on

SQL Join returning duplicates and too many rows

Hi All,

I'm trying to combine data using a Join on two tables (Contributions and PA_Contributions). Both use keys (contactID). I'm selecting everything from the Contributions table but only need to use 2 columns from the other table, PA_Contribution. (tributename and tributedescription).

The problem is that the results are pulling every row from the PA_Contributions table and I only need the data (if any) from those two columns (tributename and tributedescription) based on the contactID pulled to the same row as everything else from the Contribution table.

I believe the problem is the one-to-many issue where there are multiple rows of data for a particular contactID in the 2nd table, and I only need to pull those two columns if there is data in them.

SELECT Contribution.Firstname, Contributions.Lastname, Contributions.Address1, Contributions.Address2, Contribution.City, Contribution.name, Contribution.Amount PA_Contribution.tributename, PA_Contribution.tributedescription
FROM         Contribution RIGHT OUTER JOIN
                      PA_Contribution ON Contributions.ContactId = PA_Contribution.pa_contactid
WHERE     (Contribution.createdon >= @MinDate) AND (Contribution.createdon <= @MaxDate)
ORDER BY Contributions.LastName

I have tried everything from using DISTINCT, Left and Right Outer joins.

As you can see from the screen shot, I'm getting duplicate rows because it's finding mulitple entries in the PA_Contribution table. I only need one tributedescription cloumn for each one. Keep in mind that not all will have data in the tributedescription column, but I need those that do.

Hope that made sense.
Results.jpg
Avatar of Shaun Kline
Shaun Kline
Flag of United States of America image

Based on your query and the screen capture, it appears that the "duplicate" data is coming from the Contributions table (one row has a name of Happy Hour the other has Annual Fund). A distinct would not work because the data is distinct. Joins would not affect the "duplicates". You will need to modify your query to limit your results from the one table, either by filtering out the unwanted "names" or by grabbing the record with the max/min created on date.
I'm not really sure what you're after.  Before and after data examples would help.  But, maybe this is what you are looking for?


SELECT Contribution.Firstname, Contributions.Lastname, Contributions.Address1, Contributions.Address2, Contribution.City, Contribution.name, Contribution.Amount PA_Contribution.tributename, PA_Contribution.tributedescription
FROM         Contribution RIGHT OUTER JOIN
                      PA_Contribution ON Contributions.ContactId = PA_Contribution.pa_contactid
WHERE     (Contribution.createdon >= @MinDate) AND (Contribution.createdon <= @MaxDate)
AND PA_TributeDescription IS NOT NULL
ORDER BY Contributions.LastName

Open in new window

the possibly like this....

you need to specify how you want to select only a particular tribute's details for tthe contributions...

the example below just takes the first one in alphabetical order...
SELECT Firstname, Lastname, Address1, Address2, City, name, Amount
  , tributename, tributedescription
from (
SELECT C.Firstname, C.Lastname, C.Address1, C.Address2, C.City, C.name, C.Amount
  , PA.tributename, PA.tributedescription
  ,row_number() over (partition by c.contactid order by pa.tributename) as rn
FROM         Contribution as C  
left outer join (select * from PA_Contribution 
                  where tributename is not null or pa.tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  Contribution.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

sorry
SELECT Firstname, Lastname, Address1, Address2, City, name, Amount
  , tributename, tributedescription
from (
SELECT C.Firstname, C.Lastname, C.Address1, C.Address2, C.City, C.name, C.Amount
  , PA.tributename, PA.tributedescription
  ,row_number() over (partition by c.contactid order by pa.tributename) as rn
FROM         Contribution as C  
left outer join (select * from PA_Contribution 
                  where tributename is not null or tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

Avatar of bkana

ASKER

Thanks to the three of you (so far) for the suggestions. I will try your suggestions and post back.

However, this screen shot shows what I'm after (sort of). It's just pulling the data from the one view thus far, but it has everyone's contributions, invoice number, etc. And, all the data in this view is unique, no dups. It's only when I introduce the other table/view (PA_Contributions), which has the other columns that I need, that it adds the extra rows (duplicates).

 It's adding an extra row for everyone it finds in the PA_Contribtuions table (based on contactID), creating another row and adding the tributename and tributedescription columns. I'm probably making this sound too complicated then it really is.
Good-result.jpg
you needs to ask you self these sorts of questions in these scenarios.

ok what is the primary key of the pa_contributions table?

what is it recording?

why are there multiple rows on it per contribution?
Avatar of bkana

ASKER

The pa_contributionid is the primary key for the pa_contributions view, but I don't see how that helps me- I created the aaaf_contributions view to gather all the contribution information. It just pulls from an invoice table and a contacts table. All of that information is correct and I can pull it using a date parameter. I was just trying to join my view (aaaf_contributions) with the pa_contributions view to pull in some extra information that is nowhere else in the database.

tim cs: I can not use the "IS NOT NULL" condition because it only returns rows if there is data in the tributedescription column. I need everything from the aaaf_contributions view (based on date) plus if a record is found in the pa_contributions view based on contactid, pull the tributename and tributedescription columns if they contain data.

 Everyone pulled from the first view (aaaf_contributions) isn't necessarily going to have data in the tributename and tributedescription columns from the other view (pa_contributions) , but some will, and I need to populate those columns in the results if they do.
have you tried my solution?37045789


you need to ask you self these sorts of questions in these scenarios.

what is the primary/natural key of the pa_contributions table?
what is it recording?
why are there multiple rows on it per contribution

which rows data will it be most appropriate to display
?

Avatar of bkana

ASKER

Lowfatspread: Yes, I tried your suggestion but ran into errors. Below is what I used. I had to tweak it a little because of the view names. I take it you were just using a "C" for the first view and "PA" for the second. Let me clarify something. The views are names AAAF_Contributions and FilteredPA_Contributions. I think I was confusing everyone trying to shorten everything to be simplistic.
SELECT     AAAF_Contributions.priceperunit, AAAF_Contributions.invoicenumber, AAAF_Contributions.createdon, AAAF_Contributions.pa_invoicedate, 
                      AAAF_Contributions.customeridname, AAAF_Contributions.Address1_Name, AAAF_Contributions.FirstName, AAAF_Contributions.LastName, 
                      AAAF_Contributions.Account_address, AAAF_Contributions.description, AAAF_Contributions.name, AAAF_Contributions.productidname, 
                      AAAF_Contributions.CFIMemberId, AAAF_Contributions.invoiceid, AAAF_Contributions.ContactId, AAAF_Contributions.billto_city, AAAF_Contributions.AccountNumber, 
                      AAAF_Contributions.CustomerType, AAAF_Contributions.PA_contactnumber, AAAF_Contributions.CFSDesignation, AAAF_Contributions.EMailAddress1, 
                      AAAF_Contributions.billto_telephone, AAAF_Contributions.billto_postalcode, AAAF_Contributions.billto_stateorprovince, AAAF_Contributions.billto_line2, 
                      AAAF_Contributions.billto_line1, AAAF_Contributions.billto_name, FilteredPA_Contribution.pa_tributename, FilteredPA_Contribution.pa_tributedescription
FROM (SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate, 
                      C.customeridname, C.Address1_Name, C.FirstName, C.LastName, 
                      C.Account_address, C.description, C.name, C.productidname, 
                      C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber, 
                      C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1, 
                      C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2, 
                      C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription
,row_number() over (partition by c.contactid order by pa.pa_tributename) as rn
FROM        AAAF_Contribution as C  
left outer join (select * from FilteredPA_Contribution 
                  where tributename is not null or tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

Q-error.jpg
yes

but you main error was in not posting the errors you received and the code you tried earlier


like this
SELECT     priceperunit, invoicenumber, createdon, pa_invoicedate, 
                      customeridname, Address1_Name, FirstName, LastName, 
                     Account_address, description, name, productidname, 
                      CFIMemberId, invoiceid, ContactId, billto_city, AccountNumber, 
                      CustomerType, PA_contactnumber, CFSDesignation, EMailAddress1, 
                      billto_telephone, billto_postalcode, billto_stateorprovince, billto_line2, 
                      billto_line1, billto_name, pa_tributename, pa_tributedescription
FROM (SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate, 
              C.customeridname, C.Address1_Name, C.FirstName, C.LastName, 
             C.Account_address, C.description, C.name, C.productidname, 
             C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber, 
            C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1, 
            C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2, 
            C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription
,row_number() over (partition by c.contactid order by pa.pa_tributename) as rn
FROM        AAAF_Contribution as C  
left outer join (select * from FilteredPA_Contribution 
                  where pa_tributename is not null or pa_tributedescription is not null) as PA 
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate
) as x
where rn=1
ORDER BY LastName,firstname

Open in new window

Avatar of bkana

ASKER

Receiving an error when trying to execute the date parameter:
Q2-error.jpg
so ... run it in a command window and declare/define the date ranges...

declare @mindate datetime,@maxdate datetime
select @mindate= '20110601' ,maxdate='20111031 23:59:59.997'

the select statement here...
Avatar of bkana

ASKER

Yes, I figured out the date parameter issue right after I posted last. However, your query proved to be helpful but it's still not pulling the other invoices and related information for the specified dates. Some people have more then one donation invoice and I need to pull those as well. Your query did remove all the dups and the results do reflect the tribute description (for those that had data). I'm gathering a couple of screen shots to show you. Thank you for your time with this, it has ben very helpful and informative.  
Avatar of bkana

ASKER

Look at the 2 screen shots (My_Query and Your_Query) and look at the record for Abel, Debra. She correctly pulls in yours and mine. In mine it pulls all of her invocices for 3/1/11 - 10/24/11. In yours it pulled only one. Yours has all the information I need, including any info in the tributedecsription and tributename fields, as well as no dups but I also need the other invoices from the AAAF_Contributions view for that date parameter
My-Query.jpg
Your-Query.jpg
ASKER CERTIFIED SOLUTION
Avatar of Lowfatspread
Lowfatspread
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
or like this is the relationship between contactid and invoicenumbers is more complex...


SELECT C.priceperunit, C.invoicenumber, C.createdon, C.pa_invoicedate,
              C.customeridname, C.Address1_Name, C.FirstName, C.LastName,
             C.Account_address, C.[description], C.name, C.productidname,
             C.CFIMemberId, C.invoiceid, C.ContactId, C.billto_city, C.AccountNumber,
            C.CustomerType, C.PA_contactnumber, C.CFSDesignation, C.EMailAddress1,
            C.billto_telephone, C.billto_postalcode, C.billto_stateorprovince, C.billto_line2,
            C.billto_line1, C.billto_name, PA.pa_tributename, PA.pa_tributedescription

FROM        AAAF_Contribution as C  
left outer join (SELECT * FROM (
                   (select pa_contactid,pa_tributename,pa_tributedescription
                       ,ROW_NUMBER() over (partition by pa_contactid order by pa_tributenAME) AS RN
                     from FilteredPA_Contribution
                  where pa_tributename is not null
                     or pa_tributedescription is not null) AS X
                  where rn=1
                 ) as PA
ON C.ContactId = PA.pa_contactid
WHERE  C.createdon between  @MinDate AND  @MaxDate

ORDER BY LastName,firstname
Avatar of bkana

ASKER

Lowfatspread,

My apologies for not getting back to you sooner. Your suggestion, with a little tweaking, provided me with what I needed. Thanks for all your help.