[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Joining 2 tables which contains duplicated rows

Posted on 2006-05-11
18
Medium Priority
?
787 Views
Last Modified: 2012-08-13
Hello,

i have 2 tables:

LEDGERTRANS:
Currency  Dimension Amount
LTL          Dim1        100
LVL          Dim4        250
EUR          Dim5        220
EUR          Dim5        220
EUR          Dim5        500
EUR          Dim2        300

VENDTRANS:
Vendor     Dimension Amount
Vend01     Dim1        100
Vend02     Dim4        250
Vend03     Dim5        220
Vend04     Dim5        220

i'm going to join these tables on dimension and amount columns.
notice, there are 2 identical rows in both tables (dimension: dim5, amount: 220)

i want my join look like this:

Currency  Dimension Amount  Vendor
LTL          Dim1        100        Vend01
LVL          Dim4        250        Vend02
EUR          Dim5        220       Vend03  (could be Vend04 here)
EUR          Dim5        220       Vend04  (could be Vend03 here)
EUR          Dim5        500       NULL
EUR          Dim2        300       NULL

this join could be very simple, but these duplicated rows complicates everything - if i do simple outer join on amount and dimension tables, i get 8 rows as a result (that's logical - these 2 duplicated rows after join becomes 4 rows)

i want avoid getting extra rows, since it's not important how duplicated rows are joined - since they're the same.

any ideas?

i was thinking of creating extra column and adding different id's for duplicated rows for both tables, then joining on these id's too, but maybe there is an easyer way?
0
Comment
Question by:tigrine_smiltpele
  • 11
  • 7
18 Comments
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16662329
SELECT LEDGERTRANS.Currency, LEDGERTRANS.Dimension, LEDGERTRANS.Amount, VENDTRANS.Vendor
FROM (SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS) AS LEDGERTRANS
INNER JOIN VENDTRANS
   ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
   AND LEDGERTRANS.Amount = VENDTRANS.Amount
0
 

Author Comment

by:tigrine_smiltpele
ID: 16662797
Thank you for you query. it works for example i've given.

but, my situation is more difficult and it looks like i created too simple example. i want to keep LEDGERTRANS as main table, since in real situation, it has recid column (containing unique id for each row), so i can't do "SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS" - i'll loose that recid.
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16662823
How about...

SELECT DISTINCT LEDGERTRANS.*, VENDTRANS.Vendor
FROM LEDGERTRANS
INNER JOIN VENDTRANS
   ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
   AND LEDGERTRANS.Amount = VENDTRANS.Amount
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16662826
that won't work...let me think on it
0
 

Author Comment

by:tigrine_smiltpele
ID: 16662840
i was thinking for a while and came to a conclusion - since ledgertrans can't be grouped, vend03 and vend04 should be randomly assigned to rows 3 and 4.

i'm afraid SQL logic does not allow 'random' joins like this
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16662844
any other solution is going to require either a table schema change or a cursor.
0
 

Author Comment

by:tigrine_smiltpele
ID: 16662895
i'm fine with that. maybe adding extra column 'DupesID' in both tables?
then, ledgertrans would look like this:

LEDGERTRANS:
Currency  Dimension Amount  DupesID
LTL          Dim1        100        1
LVL          Dim4        250        1
EUR          Dim5        220       1
EUR          Dim5        220       2
EUR          Dim5        500       1
EUR          Dim2        300       1

VENDTRANS:
Vendor     Dimension Amount  DupesID
Vend01     Dim1        100       1
Vend02     Dim4        250       1
Vend03     Dim5        220       1
Vend04     Dim5        220       2

DupesID column value should be based on dimension and amount columns. but how to implement that i have no idea :(

maybe solution using a cursor would be simpler?
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16662972
You would have to use a cursor to assign the DupesID value as well.  My original cursor idea was something like this

create a table variable (@LEDGERTRANS) with the same schema as the LEDGERTRANS table
create table variable (@VENDTRANS) with the same schema as the VENDTRANS table with an additional field bit field ("Assigned")

intiate a cursor taking each LEDGERTRANS record in order by recid
   find the TOP VENDTRANS record that matches the LEDGERTRANS record Dimension/Amount where Assigned = 0
   insert the current LEDGERTRANS record and the Vendor found in the previous step into @LEDGERTRANS
   UPDATE @VENDTRANS record Assigned field to 1
close cursor

SELECT * FROM @LEDGERTRANS
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16663091
Obviously this is a very inefficient solution since cursors are EVIL but i don't see any other solution right now.
0
 

Author Comment

by:tigrine_smiltpele
ID: 16667072
could you be more specific how to implement this using cursor? i have no experience in using cursors.
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16667857
Haven't forgotten about you...i'll see if i can get some code to you today
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16669116
It would help if you could provide the schema for the two tables (columnnames and datatypes)
0
 
LVL 34

Accepted Solution

by:
Brian Crowe earned 500 total points
ID: 16669309
Tested this on your sample data and it seems to work as specified

DECLARE @Output TABLE (
      RecID integer,
      Currency char(3),
      Dimension char(4),
      Amount decimal(9,2),
      Vendor char(10)
)

DECLARE @VendTrans TABLE (
      Vendor char(10),
      Dimension char(4),
      Amount decimal(9,2),
      Assigned bit
)

INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS

DECLARE @RecID integer,
      @Currency char(3),
      @Dimension char(4),
      @Amount decimal(9,2),
      @Vendor char(10)

DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID

OPEN crsLedgerTrans

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
      SELECT TOP 1 @Vendor = Vendor
      FROM @VendTrans
      WHERE Assigned = 0
            AND Dimension = @Dimension
            AND Amount = @Amount
      ORDER BY Vendor

      INSERT INTO @Output (RecID, Currency, Dimension, Amount, Vendor)
      VALUES (@RecID, @Currency, @Dimension, @Amount, @Vendor)

      UPDATE @VendTrans
      SET Assigned = 1
      WHERE Vendor = @Vendor
            AND Dimension = @Dimension
            AND Amount = @Amount

      FETCH NEXT FROM crsLedgerTrans
      INTO @RecID, @Currency, @Dimension, @Amount
      SET @Vendor = NULL
END

CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans

SELECT * FROM @Output
0
 

Author Comment

by:tigrine_smiltpele
ID: 16674507
Hello,

it works OK (cursors are evil indeed - it takes 6 mins to execute), but i want to do an update of LEDGERTRANS using data from @output table.
ledgertrans.VendNum = new column to store Vendor name.

UPDATE LedgerTrans SET VendNum = Vendor FROM LedgerTrans,@Output WHERE LedgerTrans.RecID = @Output.RecID

sadly, this update does not work - i get error message 'Must declare the variable '@Output'

could you help with this last bit please?
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16674946
put it inside the cursor

UPDATE LEDGERTRANS
SET VendNum = @Vendor
WHERE RecID = @RecID
0
 

Author Comment

by:tigrine_smiltpele
ID: 16675052
it works fine now, but it takes 16(!) minutes to execute. without UPDATE LEDGERTRANS, it takes less than a minute. is there a way to speed things up? since i have to run this query twice (once for vendors and once for customers), it will take up to 30 mins every night!

maybe it's possible to flush @Output to static table? then, updating would be definitely faster
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 16675098
Let's try updating LedgerTrans outside the cursor and see if that helps


...
DEALLOCATE crsLedgerTrans

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
   ON LedgerTrans.RecID = LocalLedgerTrans.RecID


SELECT * FROM @Output

Actually since you're storing the vendor in LedgerTrans you should be able to do away with the @Output table and just insert the value directly in LedgerTrans within the cursor.

DECLARE @VendTrans TABLE (
     Vendor char(10),
     Dimension char(4),
     Amount decimal(9,2),
     Assigned bit
)

INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS

DECLARE @RecID integer,
     @Currency char(3),
     @Dimension char(4),
     @Amount decimal(9,2),
     @Vendor char(10)

DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID

OPEN crsLedgerTrans

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
     SELECT TOP 1 @Vendor = Vendor
     FROM @VendTrans
     WHERE Assigned = 0
          AND Dimension = @Dimension
          AND Amount = @Amount
     ORDER BY Vendor
 
     UPDATE LedgerTrans
     SET VendNum = @Vendor
     WHERE RecID = @RecID

     UPDATE @VendTrans
     SET Assigned = 1
     WHERE Vendor = @Vendor
          AND Dimension = @Dimension
          AND Amount = @Amount

     FETCH NEXT FROM crsLedgerTrans
     INTO @RecID, @Currency, @Dimension, @Amount
     SET @Vendor = NULL
END

CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans

SELECT * FROM LedgerTrans
0
 

Author Comment

by:tigrine_smiltpele
ID: 16675230
this piece of code...

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
   ON LedgerTrans.RecID = LocalLedgerTrans.RecID

...worked perfect! execution time is down to 1 minute! thank you a lot!
0

Featured Post

Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Suggested Courses

834 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question