Link to home
Start Free TrialLog in
Avatar of tigrine_smiltpele
tigrine_smiltpele

asked on

Joining 2 tables which contains duplicated rows

Hello,

i have 2 tables:

LEDGERTRANS:
Currency  Dimension Amount
LTL          Dim1        100
LVL          Dim4        250
EUR          Dim5        220
EUR          Dim5        220
EUR          Dim5        500
EUR          Dim2        300

VENDTRANS:
Vendor     Dimension Amount
Vend01     Dim1        100
Vend02     Dim4        250
Vend03     Dim5        220
Vend04     Dim5        220

i'm going to join these tables on dimension and amount columns.
notice, there are 2 identical rows in both tables (dimension: dim5, amount: 220)

i want my join look like this:

Currency  Dimension Amount  Vendor
LTL          Dim1        100        Vend01
LVL          Dim4        250        Vend02
EUR          Dim5        220       Vend03  (could be Vend04 here)
EUR          Dim5        220       Vend04  (could be Vend03 here)
EUR          Dim5        500       NULL
EUR          Dim2        300       NULL

this join could be very simple, but these duplicated rows complicates everything - if i do simple outer join on amount and dimension tables, i get 8 rows as a result (that's logical - these 2 duplicated rows after join becomes 4 rows)

i want avoid getting extra rows, since it's not important how duplicated rows are joined - since they're the same.

any ideas?

i was thinking of creating extra column and adding different id's for duplicated rows for both tables, then joining on these id's too, but maybe there is an easyer way?
Avatar of Brian Crowe
Brian Crowe
Flag of United States of America image

SELECT LEDGERTRANS.Currency, LEDGERTRANS.Dimension, LEDGERTRANS.Amount, VENDTRANS.Vendor
FROM (SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS) AS LEDGERTRANS
INNER JOIN VENDTRANS
   ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
   AND LEDGERTRANS.Amount = VENDTRANS.Amount
Avatar of tigrine_smiltpele
tigrine_smiltpele

ASKER

Thank you for you query. it works for example i've given.

but, my situation is more difficult and it looks like i created too simple example. i want to keep LEDGERTRANS as main table, since in real situation, it has recid column (containing unique id for each row), so i can't do "SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS" - i'll loose that recid.
How about...

SELECT DISTINCT LEDGERTRANS.*, VENDTRANS.Vendor
FROM LEDGERTRANS
INNER JOIN VENDTRANS
   ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
   AND LEDGERTRANS.Amount = VENDTRANS.Amount
that won't work...let me think on it
i was thinking for a while and came to a conclusion - since ledgertrans can't be grouped, vend03 and vend04 should be randomly assigned to rows 3 and 4.

i'm afraid SQL logic does not allow 'random' joins like this
any other solution is going to require either a table schema change or a cursor.
i'm fine with that. maybe adding extra column 'DupesID' in both tables?
then, ledgertrans would look like this:

LEDGERTRANS:
Currency  Dimension Amount  DupesID
LTL          Dim1        100        1
LVL          Dim4        250        1
EUR          Dim5        220       1
EUR          Dim5        220       2
EUR          Dim5        500       1
EUR          Dim2        300       1

VENDTRANS:
Vendor     Dimension Amount  DupesID
Vend01     Dim1        100       1
Vend02     Dim4        250       1
Vend03     Dim5        220       1
Vend04     Dim5        220       2

DupesID column value should be based on dimension and amount columns. but how to implement that i have no idea :(

maybe solution using a cursor would be simpler?
You would have to use a cursor to assign the DupesID value as well.  My original cursor idea was something like this

create a table variable (@LEDGERTRANS) with the same schema as the LEDGERTRANS table
create table variable (@VENDTRANS) with the same schema as the VENDTRANS table with an additional field bit field ("Assigned")

intiate a cursor taking each LEDGERTRANS record in order by recid
   find the TOP VENDTRANS record that matches the LEDGERTRANS record Dimension/Amount where Assigned = 0
   insert the current LEDGERTRANS record and the Vendor found in the previous step into @LEDGERTRANS
   UPDATE @VENDTRANS record Assigned field to 1
close cursor

SELECT * FROM @LEDGERTRANS
Obviously this is a very inefficient solution since cursors are EVIL but i don't see any other solution right now.
could you be more specific how to implement this using cursor? i have no experience in using cursors.
Haven't forgotten about you...i'll see if i can get some code to you today
It would help if you could provide the schema for the two tables (columnnames and datatypes)
ASKER CERTIFIED SOLUTION
Avatar of Brian Crowe
Brian Crowe
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hello,

it works OK (cursors are evil indeed - it takes 6 mins to execute), but i want to do an update of LEDGERTRANS using data from @output table.
ledgertrans.VendNum = new column to store Vendor name.

UPDATE LedgerTrans SET VendNum = Vendor FROM LedgerTrans,@Output WHERE LedgerTrans.RecID = @Output.RecID

sadly, this update does not work - i get error message 'Must declare the variable '@Output'

could you help with this last bit please?
put it inside the cursor

UPDATE LEDGERTRANS
SET VendNum = @Vendor
WHERE RecID = @RecID
it works fine now, but it takes 16(!) minutes to execute. without UPDATE LEDGERTRANS, it takes less than a minute. is there a way to speed things up? since i have to run this query twice (once for vendors and once for customers), it will take up to 30 mins every night!

maybe it's possible to flush @Output to static table? then, updating would be definitely faster
Let's try updating LedgerTrans outside the cursor and see if that helps


...
DEALLOCATE crsLedgerTrans

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
   ON LedgerTrans.RecID = LocalLedgerTrans.RecID


SELECT * FROM @Output

Actually since you're storing the vendor in LedgerTrans you should be able to do away with the @Output table and just insert the value directly in LedgerTrans within the cursor.

DECLARE @VendTrans TABLE (
     Vendor char(10),
     Dimension char(4),
     Amount decimal(9,2),
     Assigned bit
)

INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS

DECLARE @RecID integer,
     @Currency char(3),
     @Dimension char(4),
     @Amount decimal(9,2),
     @Vendor char(10)

DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID

OPEN crsLedgerTrans

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
     SELECT TOP 1 @Vendor = Vendor
     FROM @VendTrans
     WHERE Assigned = 0
          AND Dimension = @Dimension
          AND Amount = @Amount
     ORDER BY Vendor
 
     UPDATE LedgerTrans
     SET VendNum = @Vendor
     WHERE RecID = @RecID

     UPDATE @VendTrans
     SET Assigned = 1
     WHERE Vendor = @Vendor
          AND Dimension = @Dimension
          AND Amount = @Amount

     FETCH NEXT FROM crsLedgerTrans
     INTO @RecID, @Currency, @Dimension, @Amount
     SET @Vendor = NULL
END

CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans

SELECT * FROM LedgerTrans
this piece of code...

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
   ON LedgerTrans.RecID = LocalLedgerTrans.RecID

...worked perfect! execution time is down to 1 minute! thank you a lot!