Castlewood
How to pick any one of duplicate records in a table
Hi Experts,
I have a part master table, Tab1, with a unique partno as its key, and another table, Tab2, which holds loosely organized reference info and most likely contains duplicate partno records. The two tables look like this:
Tab1 (Note that the partno is the key)
Partno Field1
111
222
333
444
Tab2 (Note that partno values are most likely duplicated)
Partno Refno
111 2000
222 4000
111 9000
My goal is to relate the two tables on partno and pick the Refno from Tab2. If Tab2 has duplicate records for a partno, just pick any one of them, but only one.
The result table will look like:
Partno Refno
111 2000 (or 9000, doesn't matter)
222 4000
333 NULL
444 NULL
With the old VFP commands, I could use:
USE tab1
SET RELATION TO partno INTO tab2
REPLACE ALL tab1.field1 WITH tab2.refno
How can I accomplish this in SQL?
ASKER
Thanks, guys. Now I know that MAX() or MIN() can help here. But I need to SELECT more fields than just Partno and Refno. For instance, I want to include one more column, Field1, as below:
SELECT Tab1.PartNo, MIN(Tab2.RefNo) AS RefNo, Tab1.Field1
FROM Tab1 LEFT JOIN Tab2 ON Tab1.PartNo = Tab2.PartNo
GROUP BY Tab1.PartNo
I will get an error saying "Column Tab1.Field1 is invalid because it is not contained in either an aggregate function or the GROUP BY clause."
What can I do?
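One common way around that error is to list every non-aggregated column in the GROUP BY as well. The sketch below is only an illustration: it uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real server, and the TEXT type for Field1 is an assumption, since the thread does not give the column types.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tab1 (PartNo INTEGER PRIMARY KEY, Field1 TEXT);
    CREATE TABLE Tab2 (PartNo INTEGER, RefNo INTEGER);
    INSERT INTO Tab1 (PartNo) VALUES (111), (222), (333), (444);
    INSERT INTO Tab2 VALUES (111, 2000), (222, 4000), (111, 9000);
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
# Because PartNo is unique in Tab1, adding Field1 does not split any group.
rows = conn.execute("""
    SELECT Tab1.PartNo, MIN(Tab2.RefNo) AS RefNo, Tab1.Field1
    FROM Tab1
    LEFT JOIN Tab2 ON Tab1.PartNo = Tab2.PartNo
    GROUP BY Tab1.PartNo, Tab1.Field1
    ORDER BY Tab1.PartNo
""").fetchall()

for row in rows:
    print(row)
```

Parts 333 and 444 come back with a NULL RefNo because the LEFT JOIN keeps them even though Tab2 has no matching rows.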
ASKER
I found that the following page explains that error best:
http://weblogs.sqlteam.com/jeffs/archive/2007/07/20/but-why-must-that-column-be-contained-in-an-aggregate.aspx
ASKER
In order to get Tab1's columns, a join is needed.
In order to keep Tab1's records intact, a LEFT JOIN is needed.
Thank you all.
SELECT Tab1.PartNo, Tab2.RefNo, Tab1.Field1
FROM Tab1
LEFT JOIN Tab2
ON Tab1.PartNo = Tab2.PartNo
WHERE Tab2.RefNo IN (SELECT MAX(RefNo) FROM Tab2 GROUP BY PartNo)
ASKER
knightEknight:
Your answer is exactly what I was looking for. I should have given you all the points.
ASKER
knightEknight:
I overlooked one critical point: the WHERE condition needs to be part of the ON condition. Otherwise the join behaves just like an INNER JOIN, and partno 333 and 444 are left out, which is not what I want. I want all Tab1 records to be kept. So here it is:
SELECT Tab1.PartNo, Tab2.RefNo, Tab1.Field1
FROM Tab1
LEFT JOIN Tab2
ON Tab1.PartNo = Tab2.PartNo AND Tab2.RefNo IN (SELECT MAX(RefNo) FROM Tab2 GROUP BY PartNo)
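The difference between filtering in WHERE and filtering in ON can be checked with the sample data from the question. This is a minimal sketch that assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for the real server; the column types are assumptions too.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tab1 (PartNo INTEGER PRIMARY KEY, Field1 TEXT);
    CREATE TABLE Tab2 (PartNo INTEGER, RefNo INTEGER);
    INSERT INTO Tab1 (PartNo) VALUES (111), (222), (333), (444);
    INSERT INTO Tab2 VALUES (111, 2000), (222, 4000), (111, 9000);
""")

# Filter in WHERE: unmatched Tab1 rows carry a NULL RefNo, NULL never
# satisfies IN (...), so those rows are dropped after the join.
where_rows = conn.execute("""
    SELECT Tab1.PartNo, Tab2.RefNo
    FROM Tab1
    LEFT JOIN Tab2 ON Tab1.PartNo = Tab2.PartNo
    WHERE Tab2.RefNo IN (SELECT MAX(RefNo) FROM Tab2 GROUP BY PartNo)
    ORDER BY Tab1.PartNo
""").fetchall()

# Filter in ON: the condition only decides which Tab2 rows match,
# so every Tab1 row survives.
on_rows = conn.execute("""
    SELECT Tab1.PartNo, Tab2.RefNo
    FROM Tab1
    LEFT JOIN Tab2
      ON Tab1.PartNo = Tab2.PartNo
     AND Tab2.RefNo IN (SELECT MAX(RefNo) FROM Tab2 GROUP BY PartNo)
    ORDER BY Tab1.PartNo
""").fetchall()

print("WHERE:", where_rows)
print("ON:   ", on_rows)
```

With the filter in WHERE only parts 111 and 222 come back; with the filter in ON all four parts are returned, and 333 and 444 have a NULL RefNo.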
ASKER
Or, using a derived table:
SELECT Tab1.PartNo, Tab2.RefNo, Tab1.Field1
FROM Tab1
LEFT JOIN (SELECT PartNo, MAX(RefNo) AS RefNo FROM Tab2 GROUP BY PartNo) AS Tab2
ON Tab1.PartNo = Tab2.PartNo
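A runnable sketch of the derived-table form follows, again assuming in-memory SQLite through Python's sqlite3 as a stand-in for the real server. The alias MaxRef and the extra Tab2 row (333, 2000) are hypothetical and not from the thread. The extra row is there to show one way the two forms can differ: the IN (SELECT MAX(...) ... GROUP BY PartNo) test compares a RefNo against the maxima of all parts, so a non-maximum RefNo of one part can match another part's maximum, whereas the derived table yields at most one row per part.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tab1 (PartNo INTEGER PRIMARY KEY, Field1 TEXT);
    CREATE TABLE Tab2 (PartNo INTEGER, RefNo INTEGER);
    INSERT INTO Tab1 (PartNo) VALUES (111), (222), (333), (444);
    INSERT INTO Tab2 VALUES (111, 2000), (222, 4000), (111, 9000);
""")

# MaxRef is a hypothetical alias for the one-row-per-part derived table.
DERIVED = """
    SELECT Tab1.PartNo, MaxRef.RefNo
    FROM Tab1
    LEFT JOIN (SELECT PartNo, MAX(RefNo) AS RefNo
               FROM Tab2
               GROUP BY PartNo) AS MaxRef
      ON Tab1.PartNo = MaxRef.PartNo
    ORDER BY Tab1.PartNo
"""

# The IN-based form from the thread, with the filter in the ON condition.
IN_FORM = """
    SELECT Tab1.PartNo, Tab2.RefNo
    FROM Tab1
    LEFT JOIN Tab2
      ON Tab1.PartNo = Tab2.PartNo
     AND Tab2.RefNo IN (SELECT MAX(RefNo) FROM Tab2 GROUP BY PartNo)
    ORDER BY Tab1.PartNo, Tab2.RefNo
"""

derived_before = conn.execute(DERIVED).fetchall()

# Hypothetical extra row: 2000 is now the maximum for part 333,
# and it is also a non-maximum RefNo of part 111.
conn.execute("INSERT INTO Tab2 VALUES (333, 2000)")

derived_after = conn.execute(DERIVED).fetchall()
in_after = conn.execute(IN_FORM).fetchall()

print(derived_before)
print(derived_after)
print(in_after)  # part 111 appears twice here
```

On the question's own three Tab2 rows both forms return the same result; they only diverge once a value like the hypothetical (333, 2000) exists.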
ASKER
Also note the order in which the clauses are applied:
. the required tables are joined
. the composite dataset is filtered through the WHERE clause
. the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
. they are then filtered again, through the HAVING clause
. finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
(WHERE is applied before GROUP BY while HAVING is applied after GROUP BY)
ASKER
(WHERE is applied before GROUP BY while HAVING is applied after GROUP BY)
SELECT partno, SUM(refno) AS ref
FROM tab2
GROUP BY partno HAVING SUM(refno) > 10000
returns:
111 11000
SELECT partno, SUM(refno) AS ref
FROM tab2
WHERE refno > 2000
GROUP BY partno HAVING SUM(refno) > 10000
returns: no records
(Note: HAVING can contain aggregate functions, while WHERE cannot.)
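The two results above can be reproduced with the Tab2 sample rows from the question. This is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for the real server and INTEGER columns.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab2 (partno INTEGER, refno INTEGER);
    INSERT INTO tab2 VALUES (111, 2000), (222, 4000), (111, 9000);
""")

# HAVING only: groups are built from all rows, then filtered on the sum.
having_only = conn.execute("""
    SELECT partno, SUM(refno) AS ref
    FROM tab2
    GROUP BY partno
    HAVING SUM(refno) > 10000
""").fetchall()

# WHERE + HAVING: the row (111, 2000) is removed before grouping,
# so the sum for part 111 is only 9000 and no group passes HAVING.
where_then_having = conn.execute("""
    SELECT partno, SUM(refno) AS ref
    FROM tab2
    WHERE refno > 2000
    GROUP BY partno
    HAVING SUM(refno) > 10000
""").fetchall()

print(having_only)
print(where_then_having)
```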
SELECT Partno, MAX(Refno) AS Refno
INTO NewTable
FROM tab2
GROUP BY Partno
SELECT * FROM NewTable
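Going back to the VFP REPLACE in the original question: if the aim is to write the reference number into Tab1.Field1 itself rather than into a new table, an UPDATE with a correlated subquery is one possible sketch. It assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for the real server, and an INTEGER type for Field1, which the thread does not state.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tab1 (PartNo INTEGER PRIMARY KEY, Field1 INTEGER);
    CREATE TABLE Tab2 (PartNo INTEGER, RefNo INTEGER);
    INSERT INTO Tab1 (PartNo) VALUES (111), (222), (333), (444);
    INSERT INTO Tab2 VALUES (111, 2000), (222, 4000), (111, 9000);
""")

# For each Tab1 row, pick one RefNo (here the largest) from the matching
# Tab2 rows. MAX() over zero rows yields NULL, so parts without a match
# end up with a NULL Field1.
conn.execute("""
    UPDATE Tab1
    SET Field1 = (SELECT MAX(Tab2.RefNo)
                  FROM Tab2
                  WHERE Tab2.PartNo = Tab1.PartNo)
""")

updated = conn.execute(
    "SELECT PartNo, Field1 FROM Tab1 ORDER BY PartNo"
).fetchall()
print(updated)
```

MIN() would work just as well, since any one of the duplicates is acceptable.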