Solved

# Joining 2 tables which contains duplicated rows

Posted on 2006-05-11
775 Views
Hello,

i have 2 tables:

LEDGERTRANS:
Currency  Dimension Amount
LTL          Dim1        100
LVL          Dim4        250
EUR          Dim5        220
EUR          Dim5        220
EUR          Dim5        500
EUR          Dim2        300

VENDTRANS:
Vendor     Dimension Amount
Vend01     Dim1        100
Vend02     Dim4        250
Vend03     Dim5        220
Vend04     Dim5        220

i'm going to join these tables on dimension and amount columns.
notice, there are 2 identical rows in both tables (dimension: dim5, amount: 220)

i want my join look like this:

Currency  Dimension Amount  Vendor
LTL          Dim1        100        Vend01
LVL          Dim4        250        Vend02
EUR          Dim5        220       Vend03  (could be Vend04 here)
EUR          Dim5        220       Vend04  (could be Vend03 here)
EUR          Dim5        500       NULL
EUR          Dim2        300       NULL

this join could be very simple, but these duplicated rows complicates everything - if i do simple outer join on amount and dimension tables, i get 8 rows as a result (that's logical - these 2 duplicated rows after join becomes 4 rows)

i want avoid getting extra rows, since it's not important how duplicated rows are joined - since they're the same.

any ideas?

i was thinking of creating extra column and adding different id's for duplicated rows for both tables, then joining on these id's too, but maybe there is an easyer way?
0
Question by:tigrine_smiltpele

LVL 34

Expert Comment

SELECT LEDGERTRANS.Currency, LEDGERTRANS.Dimension, LEDGERTRANS.Amount, VENDTRANS.Vendor
FROM (SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS) AS LEDGERTRANS
INNER JOIN VENDTRANS
ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
AND LEDGERTRANS.Amount = VENDTRANS.Amount
0

Author Comment

Thank you for you query. it works for example i've given.

but, my situation is more difficult and it looks like i created too simple example. i want to keep LEDGERTRANS as main table, since in real situation, it has recid column (containing unique id for each row), so i can't do "SELECT DISTINCT Currency, Dimension Amount FROM LEDGERTRANS" - i'll loose that recid.
0

LVL 34

Expert Comment

SELECT DISTINCT LEDGERTRANS.*, VENDTRANS.Vendor
FROM LEDGERTRANS
INNER JOIN VENDTRANS
ON LEDGERTRANS.Dimension = VENDTRANS.Dimension
AND LEDGERTRANS.Amount = VENDTRANS.Amount
0

LVL 34

Expert Comment

that won't work...let me think on it
0

Author Comment

i was thinking for a while and came to a conclusion - since ledgertrans can't be grouped, vend03 and vend04 should be randomly assigned to rows 3 and 4.

i'm afraid SQL logic does not allow 'random' joins like this
0

LVL 34

Expert Comment

any other solution is going to require either a table schema change or a cursor.
0

Author Comment

i'm fine with that. maybe adding extra column 'DupesID' in both tables?
then, ledgertrans would look like this:

LEDGERTRANS:
Currency  Dimension Amount  DupesID
LTL          Dim1        100        1
LVL          Dim4        250        1
EUR          Dim5        220       1
EUR          Dim5        220       2
EUR          Dim5        500       1
EUR          Dim2        300       1

VENDTRANS:
Vendor     Dimension Amount  DupesID
Vend01     Dim1        100       1
Vend02     Dim4        250       1
Vend03     Dim5        220       1
Vend04     Dim5        220       2

DupesID column value should be based on dimension and amount columns. but how to implement that i have no idea :(

maybe solution using a cursor would be simpler?
0

LVL 34

Expert Comment

You would have to use a cursor to assign the DupesID value as well.  My original cursor idea was something like this

create a table variable (@LEDGERTRANS) with the same schema as the LEDGERTRANS table
create table variable (@VENDTRANS) with the same schema as the VENDTRANS table with an additional field bit field ("Assigned")

intiate a cursor taking each LEDGERTRANS record in order by recid
find the TOP VENDTRANS record that matches the LEDGERTRANS record Dimension/Amount where Assigned = 0
insert the current LEDGERTRANS record and the Vendor found in the previous step into @LEDGERTRANS
UPDATE @VENDTRANS record Assigned field to 1
close cursor

SELECT * FROM @LEDGERTRANS
0

LVL 34

Expert Comment

Obviously this is a very inefficient solution since cursors are EVIL but i don't see any other solution right now.
0

Author Comment

could you be more specific how to implement this using cursor? i have no experience in using cursors.
0

LVL 34

Expert Comment

Haven't forgotten about you...i'll see if i can get some code to you today
0

LVL 34

Expert Comment

It would help if you could provide the schema for the two tables (columnnames and datatypes)
0

LVL 34

Accepted Solution

Tested this on your sample data and it seems to work as specified

DECLARE @Output TABLE (
RecID integer,
Currency char(3),
Dimension char(4),
Amount decimal(9,2),
Vendor char(10)
)

DECLARE @VendTrans TABLE (
Vendor char(10),
Dimension char(4),
Amount decimal(9,2),
Assigned bit
)

INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS

DECLARE @RecID integer,
@Currency char(3),
@Dimension char(4),
@Amount decimal(9,2),
@Vendor char(10)

DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID

OPEN crsLedgerTrans

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
SELECT TOP 1 @Vendor = Vendor
FROM @VendTrans
WHERE Assigned = 0
AND Dimension = @Dimension
AND Amount = @Amount
ORDER BY Vendor

INSERT INTO @Output (RecID, Currency, Dimension, Amount, Vendor)
VALUES (@RecID, @Currency, @Dimension, @Amount, @Vendor)

UPDATE @VendTrans
SET Assigned = 1
WHERE Vendor = @Vendor
AND Dimension = @Dimension
AND Amount = @Amount

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
SET @Vendor = NULL
END

CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans

SELECT * FROM @Output
0

Author Comment

Hello,

it works OK (cursors are evil indeed - it takes 6 mins to execute), but i want to do an update of LEDGERTRANS using data from @output table.
ledgertrans.VendNum = new column to store Vendor name.

UPDATE LedgerTrans SET VendNum = Vendor FROM LedgerTrans,@Output WHERE LedgerTrans.RecID = @Output.RecID

sadly, this update does not work - i get error message 'Must declare the variable '@Output'

could you help with this last bit please?
0

LVL 34

Expert Comment

put it inside the cursor

UPDATE LEDGERTRANS
SET VendNum = @Vendor
WHERE RecID = @RecID
0

Author Comment

it works fine now, but it takes 16(!) minutes to execute. without UPDATE LEDGERTRANS, it takes less than a minute. is there a way to speed things up? since i have to run this query twice (once for vendors and once for customers), it will take up to 30 mins every night!

maybe it's possible to flush @Output to static table? then, updating would be definitely faster
0

LVL 34

Expert Comment

Let's try updating LedgerTrans outside the cursor and see if that helps

...
DEALLOCATE crsLedgerTrans

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID

SELECT * FROM @Output

Actually since you're storing the vendor in LedgerTrans you should be able to do away with the @Output table and just insert the value directly in LedgerTrans within the cursor.

DECLARE @VendTrans TABLE (
Vendor char(10),
Dimension char(4),
Amount decimal(9,2),
Assigned bit
)

INSERT INTO @VendTrans (Vendor, Dimension, Amount, Assigned)
SELECT Vendor, Dimension, Amount, 0 FROM VENDTRANS

DECLARE @RecID integer,
@Currency char(3),
@Dimension char(4),
@Amount decimal(9,2),
@Vendor char(10)

DECLARE crsLedgerTrans CURSOR FOR
SELECT RecID, Currency, Dimension, Amount
FROM LEDGERTRANS
ORDER BY RecID

OPEN crsLedgerTrans

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
SELECT TOP 1 @Vendor = Vendor
FROM @VendTrans
WHERE Assigned = 0
AND Dimension = @Dimension
AND Amount = @Amount
ORDER BY Vendor

UPDATE LedgerTrans
SET VendNum = @Vendor
WHERE RecID = @RecID

UPDATE @VendTrans
SET Assigned = 1
WHERE Vendor = @Vendor
AND Dimension = @Dimension
AND Amount = @Amount

FETCH NEXT FROM crsLedgerTrans
INTO @RecID, @Currency, @Dimension, @Amount
SET @Vendor = NULL
END

CLOSE crsLedgerTrans
DEALLOCATE crsLedgerTrans

SELECT * FROM LedgerTrans
0

Author Comment

this piece of code...

UPDATE LedgerTrans
SET VendNum = Vendor
FROM LedgerTrans
INNER JOIN @Output AS LocalLedgerTrans
ON LedgerTrans.RecID = LocalLedgerTrans.RecID

...worked perfect! execution time is down to 1 minute! thank you a lot!
0

## Featured Post

I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that â€¦
Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in beâ€¦
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Acâ€¦
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.