[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1489
  • Last Modified:

Eliminating duplicates from a full outer join in MS SQL.

I have three tables with the same column names the only difference each table contains data from a different time frame, for example let's say I am creating three tables from with data pulled from a revenue table. In the first table I created a table with all the bills generated in the current month from the revenue table, then the second table will include all the bills that are ytd from the revenue table and the third table would include lytd. Now I have three separate tables and I need all the data from both of the tables and nulls to appear if the data is not matching. For starters I created a full outer join, on table one and two. The problem is i am getting duplicates. How do I eliminate the duplicates rows highlighted in blue in the attached example? And get the desired results
example10.xlsx
0
DAN2011
Asked:
DAN2011
  • 4
  • 4
1 Solution
 
lwadwellCommented:
They are not 'duplicates' per se ... they do differ in date.  They must exist because they matched on a join criteria ... to remove depends on the SQL.  Possibly an inline view to filter the multiple rows down to the one row you actually want to join with.  Hard to be any more precise without the SQL to refer to.
0
 
DAN2011Author Commented:
I have attached an example of the code
example-code.docx
0
 
lwadwellCommented:
I am guessing that YTD_temp is table1 in your description.  How many rows exist in that table per Account_number?  Is it only 1 as the different dates in your excel example would make me think otherwise?  If it is more than 1 ... which one do you want to keep in the output?  If only one ... which one in table2 (Current_month_temp) do you want it to join to?
I have had a guess at what you might want below.
SELECT Current_month_temp.Account_Number, Current_month_temp.master_acct, Current_month_temp.ORIGIN, Current_month_temp.PIECES, 
       Current_month_temp.WEIGHT, Current_month_temp.KILOS, Current_month_temp.REVENUE, Current_month_temp.FUEL_SURCHRG, 
       Current_month_temp.SS, Current_month_temp.HNDLG_SURCHRG, Current_month_temp.MISC_REV, Current_month_temp.Month, 
       Current_month_temp.Year, Current_month_temp.Entity, Current_month_temp.[Recap Region], YTD_temp.Account_Number AS Expr1, 
       YTD_temp.master_acct AS Expr2, YTD_temp.ORIGIN AS Expr3, YTD_temp.PIECES AS Expr4, YTD_temp.WEIGHT AS Expr5, 
       YTD_temp.KILOS AS Expr6, YTD_temp.REVENUE AS Expr7, YTD_temp.FUEL_SURCHRG AS Expr8, YTD_temp.SS AS Expr9, 
       YTD_temp.HNDLG_SURCHRG AS Expr10, YTD_temp.MISC_REV AS Expr11, YTD_temp.Month AS Expr12, YTD_temp.Year AS Expr13, 
       YTD_temp.Entity AS Expr14, YTD_temp.[Recap Region] AS Expr15, YTD_temp.Country
into Val_comb_Current_YTD
FROM  (SELECT *, row_number()over(partition by Account_Number order by year, month) rn from dbo.Current_month_temp ) Current_month_temp
FULL OUTER JOIN (select *, row_number()over(partition by Account_Number order by year, month) rn) from dbo.YTD_temp ) YTD_temp
ON Current_month_temp.Account_Number = YTD_temp.Account_Number AND Current_month_temp.rn = YTD_temp.rn

Open in new window

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
DAN2011Author Commented:
Current month is table one and ytd is the second table. How many rows for each account number varies because it's based on the actual transactions in a given period. But if you take a look at the spreadsheet the the cells highlighted in blue I thought would show null instead certain columns are appearing multiple times. I thought a full outer join would give me everything from one table and everything from the second joining on the accounts numbers and if a account number that are equal, and showing null if one is not in the other. Which worked In the case where an account number is found in one and not the other it gave me null. However in the case where both tables has the same matching account number but with different number of rows this issue occurs.
0
 
lwadwellCommented:
That is how any join works ... if you specify to join on a column in tableA to a column in tableB (like your account number) ... where there is a match, the rows from both tables are returned.  This is the same for an INNER, LEFT, RIGHT or FULL OUTER join.  With a LEFT, RIGHT and FULL OUTER ... extra logic is applied when a match is not found.  This site explains the differences nicely http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

>> I thought a full outer join would give me everything from one table and everything from the second joining on the accounts numbers and if a account number that are equal, and showing null if one is not in the other.
You are right ... in this case, the account numbers are equal.

From what I reading between the lines here is
  - you want all rows from tableA that are not in tableB ... even if there are multiples in tableA with the same account number
  - you want all rows from tableB that are not in tableA ... even if there are multiples in tableB with the same account number

What is unclear ... and you need to decide on ... what to do when 'x' rows in tableA have the same account number as 'y' rows in tableB.
0
 
DAN2011Author Commented:
Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.
And Thats what I thought would happen to the data thats in a9:d9 all they way down to a12:D12  would contain null instead it just duplicted a8:d8 all the way down matching how many rows are on the other side.
And just so I know, what does the row_number()over(partition by Account_Number do?
0
 
DAN2011Author Commented:
so in the case from the attached webpage lets add to it

id name       id  name
-- ----       --  ----
1  Pirate      1   Rutabaga
2  Monkey     2   Pirate
3  Ninja      3   Darth Vader
4  Spaghetti  4   Ninja
5 fox     5 fox
6.doll        6.fox

id    name       id    name
--    ----       --    ----
1     Pirate     2     Pirate
2     Monkey     null  null
3     Ninja      4     Ninja
4     Spaghetti  null  null
null  null       1     Rutabaga      
null  null       3     Darth Vader
5 foxa             5  foxa
6 doll            null null
null null      5 foxab

but what I am getting is

id    name       id    name
--    ----       --    ----
1     Pirate     2     Pirate
2     Monkey     null  null
3     Ninja      4     Ninja
4     Spaghetti  null  null
null  null       1     Rutabaga      
null  null       3     Darth Vader
5 foxa             5  foxa
5 foxa             5 foxb
6 doll            null null
0
 
lwadwellCommented:
>> And Thats what I thought would happen to the data thats in a9:d9 all they way down to a12:D12
Why did a8:d8 match to e8:h8?  Then, given that e9:e12 have the same account number, why wouldn't a8:d8 match to them too?

>> what does the row_number()over(partition by Account_Number do?
It a function in SQL that returns a sequential number (i.e. row number) per row starting at 1 for each set of Account_Number (i.e. the partition).
refer http://msdn.microsoft.com/en-us/library/ms186734(v=sql.105)

A simple example of the different joins:
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL
	DROP TABLE #tableA;
CREATE TABLE #tableA (
    id    int,
    ukey  int,
    val   varchar(30)
);
insert into #tableA values (1,1,'Table A - uKey 1'),(2,3,'Table A - uKey 3');
IF OBJECT_ID('tempdb..#tableB') IS NOT NULL
	DROP TABLE #tableB;
CREATE TABLE #tableB (
    id    int,
    ukey  int,
    val   varchar(30)
);
insert into #tableB values (1,1,'Table B - uKey 1 Id 1'),(2,1,'Table B - uKey 1 Id 2'),(3,4,'Table B - uKey 4');

-- Example Joins
SELECT *
FROM #tableA ta INNER JOIN #tableB tb ON tb.ukey = ta.ukey;

SELECT *
FROM #tableA ta LEFT JOIN #tableB tb ON tb.ukey = ta.ukey;

SELECT *
FROM #tableA ta RIGHT JOIN #tableB tb ON tb.ukey = ta.ukey;

SELECT *
FROM #tableA ta FULL JOIN #tableB tb ON tb.ukey = ta.ukey;

SELECT *
FROM #tableA ta CROSS JOIN #tableB tb;

Open in new window

0

Featured Post

NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

  • 4
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now