tigrine_smiltpele
asked on
Joining 2 tables which contains duplicated rows
Hello,
i have 2 tables:
LEDGERTRANS:
Currency Dimension Amount
LTL Dim1 100
LVL Dim4 250
EUR Dim5 220
EUR Dim5 220
EUR Dim5 500
EUR Dim2 300
VENDTRANS:
Vendor Dimension Amount
Vend01 Dim1 100
Vend02 Dim4 250
Vend03 Dim5 220
Vend04 Dim5 220
i'm going to join these tables on dimension and amount columns.
notice, there are 2 identical rows in both tables (dimension: dim5, amount: 220)
i want my join look like this:
Currency Dimension Amount Vendor
LTL Dim1 100 Vend01
LVL Dim4 250 Vend02
EUR Dim5 220 Vend03 (could be Vend04 here)
EUR Dim5 220 Vend04 (could be Vend03 here)
EUR Dim5 500 NULL
EUR Dim2 300 NULL
this join could be very simple, but these duplicated rows complicates everything - if i do simple outer join on amount and dimension tables, i get 8 rows as a result (that's logical - these 2 duplicated rows after join becomes 4 rows)
i want avoid getting extra rows, since it's not important how duplicated rows are joined - since they're the same.
any ideas?
i was thinking of creating extra column and adding different id's for duplicated rows for both tables, then joining on these id's too, but maybe there is an easyer way?
i have 2 tables:
LEDGERTRANS:
Currency Dimension Amount
LTL Dim1 100
LVL Dim4 250
EUR Dim5 220
EUR Dim5 220
EUR Dim5 500
EUR Dim2 300
VENDTRANS:
Vendor Dimension Amount
Vend01 Dim1 100
Vend02 Dim4 250
Vend03 Dim5 220
Vend04 Dim5 220
i'm going to join these tables on dimension and amount columns.
notice, there are 2 identical rows in both tables (dimension: dim5, amount: 220)
i want my join look like this:
Currency Dimension Amount Vendor
LTL Dim1 100 Vend01
LVL Dim4 250 Vend02
EUR Dim5 220 Vend03 (could be Vend04 here)
EUR Dim5 220 Vend04 (could be Vend03 here)
EUR Dim5 500 NULL
EUR Dim2 300 NULL
this join could be very simple, but these duplicated rows complicates everything - if i do simple outer join on amount and dimension tables, i get 8 rows as a result (that's logical - these 2 duplicated rows after join becomes 4 rows)
i want avoid getting extra rows, since it's not important how duplicated rows are joined - since they're the same.
any ideas?
i was thinking of creating extra column and adding different id's for duplicated rows for both tables, then joining on these id's too, but maybe there is an easyer way?
ASKER
Thank you for you query. it works for example i've given.
but, my situation is more difficult and it looks like i created too simple example. i want to keep LEDGERTRANS as main table, since in real situation, it has recid column (containing unique id for each row), so i can't do "SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS" - i'll loose that recid.
but, my situation is more difficult and it looks like i created too simple example. i want to keep LEDGERTRANS as main table, since in real situation, it has recid column (containing unique id for each row), so i can't do "SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS" - i'll loose that recid.
How about...
SELECT DISTINCT LEDGERTRANS.*, VENDTRANS.Vendor
FROM LEDGERTRANS
INNER JOIN VENDTRANS
ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
AND LEDGERTRANS.Amount = VENDTRANS.Amount
SELECT DISTINCT LEDGERTRANS.*, VENDTRANS.Vendor
FROM LEDGERTRANS
INNER JOIN VENDTRANS
ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
AND LEDGERTRANS.Amount = VENDTRANS.Amount
that won't work...let me think on it
ASKER
i was thinking for a while and came to a conclusion - since ledgertrans can't be grouped, vend03 and vend04 should be randomly assigned to rows 3 and 4.
i'm afraid SQL logic does not allow 'random' joins like this
i'm afraid SQL logic does not allow 'random' joins like this
any other solution is going to require either a table schema change or a cursor.
ASKER
i'm fine with that. maybe adding extra column 'DupesID' in both tables?
then, ledgertrans would look like this:
LEDGERTRANS:
Currency Dimension Amount DupesID
LTL Dim1 100 1
LVL Dim4 250 1
EUR Dim5 220 1
EUR Dim5 220 2
EUR Dim5 500 1
EUR Dim2 300 1
VENDTRANS:
Vendor Dimension Amount DupesID
Vend01 Dim1 100 1
Vend02 Dim4 250 1
Vend03 Dim5 220 1
Vend04 Dim5 220 2
DupesID column value should be based on dimension and amount columns. but how to implement that i have no idea :(
maybe solution using a cursor would be simpler?
then, ledgertrans would look like this:
LEDGERTRANS:
Currency Dimension Amount DupesID
LTL Dim1 100 1
LVL Dim4 250 1
EUR Dim5 220 1
EUR Dim5 220 2
EUR Dim5 500 1
EUR Dim2 300 1
VENDTRANS:
Vendor Dimension Amount DupesID
Vend01 Dim1 100 1
Vend02 Dim4 250 1
Vend03 Dim5 220 1
Vend04 Dim5 220 2
DupesID column value should be based on dimension and amount columns. but how to implement that i have no idea :(
maybe solution using a cursor would be simpler?
You would have to use a cursor to assign the DupesID value as well. My original cursor idea was something like this
create a table variable (@LEDGERTRANS) with the same schema as the LEDGERTRANS table
create table variable (@VENDTRANS) with the same schema as the VENDTRANS table with an additional field bit field ("Assigned")
intiate a cursor taking each LEDGERTRANS record in order by recid
find the TOP VENDTRANS record that matches the LEDGERTRANS record Dimension/Amount where Assigned = 0
insert the current LEDGERTRANS record and the Vendor found in the previous step into @LEDGERTRANS
UPDATE @VENDTRANS record Assigned field to 1
close cursor
SELECT * FROM @LEDGERTRANS
create a table variable (@LEDGERTRANS) with the same schema as the LEDGERTRANS table
create table variable (@VENDTRANS) with the same schema as the VENDTRANS table with an additional field bit field ("Assigned")
intiate a cursor taking each LEDGERTRANS record in order by recid
find the TOP VENDTRANS record that matches the LEDGERTRANS record Dimension/Amount where Assigned = 0
insert the current LEDGERTRANS record and the Vendor found in the previous step into @LEDGERTRANS
UPDATE @VENDTRANS record Assigned field to 1
close cursor
SELECT * FROM @LEDGERTRANS
Obviously this is a very inefficient solution since cursors are EVIL but i don't see any other solution right now.
ASKER
could you be more specific how to implement this using cursor? i have no experience in using cursors.
Haven't forgotten about you...i'll see if i can get some code to you today
It would help if you could provide the schema for the two tables (columnnames and datatypes)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hello,
it works OK (cursors are evil indeed - it takes 6 mins to execute), but i want to do an update of LEDGERTRANS using data from @output table.
ledgertrans.VendNum = new column to store Vendor name.
UPDATE LedgerTrans SET VendNum = Vendor FROM LedgerTrans,@Output WHERE LedgerTrans.RecID = @Output.RecID
sadly, this update does not work - i get error message 'Must declare the variable '@Output'
could you help with this last bit please?
it works OK (cursors are evil indeed - it takes 6 mins to execute), but i want to do an update of LEDGERTRANS using data from @output table.
ledgertrans.VendNum = new column to store Vendor name.
UPDATE LedgerTrans SET VendNum = Vendor FROM LedgerTrans,@Output WHERE LedgerTrans.RecID = @Output.RecID
sadly, this update does not work - i get error message 'Must declare the variable '@Output'
could you help with this last bit please?
put it inside the cursor
UPDATE LEDGERTRANS
SET VendNum = @Vendor
WHERE RecID = @RecID
UPDATE LEDGERTRANS
SET VendNum = @Vendor
WHERE RecID = @RecID
ASKER
it works fine now, but it takes 16(!) minutes to execute. without UPDATE LEDGERTRANS, it takes less than a minute. is there a way to speed things up? since i have to run this query twice (once for vendors and once for customers), it will take up to 30 mins every night!
maybe it's possible to flush @Output to static table? then, updating would be definitely faster
maybe it's possible to flush @Output to static table? then, updating would be definitely faster
Let's try updating LedgerTrans outside the cursor and see if that helps
...
DEALLOCATE crsLedgerTrans
UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID
SELECT * FROM @Output
Actually since you're storing the vendor in LedgerTrans you should be able to do away with the @Output table and just insert the value directly in LedgerTrans within the cursor.
DECLARE @VendTrans TABLE (
Vendor char(10),
Dimension char(4),
Amount decimal(9,2),
Assigned bit
)
INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS
DECLARE @RecID integer,
@Currency char(3),
@Dimension char(4),
@Amount decimal(9,2),
@Vendor char(10)
DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID
OPEN crsLedgerTrans
FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT TOP 1 @Vendor = Vendor
FROM @VendTrans
WHERE Assigned = 0
AND Dimension = @Dimension
AND Amount = @Amount
ORDER BY Vendor
UPDATE LedgerTrans
SET VendNum = @Vendor
WHERE RecID = @RecID
UPDATE @VendTrans
SET Assigned = 1
WHERE Vendor = @Vendor
AND Dimension = @Dimension
AND Amount = @Amount
FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
SET @Vendor = NULL
END
CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans
SELECT * FROM LedgerTrans
...
DEALLOCATE crsLedgerTrans
UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID
SELECT * FROM @Output
Actually since you're storing the vendor in LedgerTrans you should be able to do away with the @Output table and just insert the value directly in LedgerTrans within the cursor.
DECLARE @VendTrans TABLE (
Vendor char(10),
Dimension char(4),
Amount decimal(9,2),
Assigned bit
)
INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS
DECLARE @RecID integer,
@Currency char(3),
@Dimension char(4),
@Amount decimal(9,2),
@Vendor char(10)
DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID
OPEN crsLedgerTrans
FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT TOP 1 @Vendor = Vendor
FROM @VendTrans
WHERE Assigned = 0
AND Dimension = @Dimension
AND Amount = @Amount
ORDER BY Vendor
UPDATE LedgerTrans
SET VendNum = @Vendor
WHERE RecID = @RecID
UPDATE @VendTrans
SET Assigned = 1
WHERE Vendor = @Vendor
AND Dimension = @Dimension
AND Amount = @Amount
FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
SET @Vendor = NULL
END
CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans
SELECT * FROM LedgerTrans
ASKER
this piece of code...
UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID
...worked perfect! execution time is down to 1 minute! thank you a lot!
UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID
...worked perfect! execution time is down to 1 minute! thank you a lot!
FROM (SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS) AS LEDGERTRANS
INNER JOIN VENDTRANS
ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
AND LEDGERTRANS.Amount = VENDTRANS.Amount