
SELECT DISTINCT on two columns, but get all the data.

I have 5 columns, A,B,C,D, and E.

There are possible duplicates only in columns A and B. I want to create a new table that has only unique values based on A & B, but has all the columns of data (i.e., on a given row x, the combination of values in A and B is not the same as in any other row).

I tried SELECT DISTINCT A,B FROM list; but that is not an option, as it only gives me columns A and B.

How do I get all the columns where columns A & B are DISTINCT?

SOLUTION

ksaul


DrDamnit

ASKER

OK, let me rephrase: I want to remove all rows where columns A and B are the same as in any other row, so that I am left with one row for each combination of A and B.

ksaul

That is the only way I know how to do it. If there are two rows where A and B are the same, C, D, and E may not be the same, so you have to decide which values you want. If C, D, and E happen to be the same, it won't matter. The above query will return one row for every distinct combination of A and B, with the maximum value of C, D, and E for each. You can use the aggregate query to select the rows into a new table, then delete all rows from the original, and then insert them back from the new table.

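-- Copy one row per (A, B) pair into a temporary table
SELECT A, B, MAX(C) AS C, MAX(D) AS D, MAX(E) AS E
INTO list_temp
FROM list
GROUP BY A, B;

-- Empty the original table
DELETE FROM list;

-- Reload the de-duplicated rows
INSERT INTO list
SELECT *
FROM list_temp;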

Back up your data first, of course.
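To see what the aggregate query returns before touching real data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the sample rows are assumptions made up for illustration; the thread itself appears to target SQL Server (SELECT ... INTO). It shows that MAX() is evaluated within each (A, B) group rather than over the whole column, and that each column's maximum is picked independently.

```python
import sqlite3

# In-memory database with invented sample rows (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (A TEXT, B TEXT, C INTEGER, D INTEGER, E INTEGER)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?, ?)",
    [
        ("x", "y", 1, 9, 5),  # duplicate pair (x, y)
        ("x", "y", 2, 3, 5),  # duplicate pair (x, y)
        ("p", "q", 7, 7, 7),  # unique pair; holds the overall maximum of C
    ],
)

# Same aggregate as in the post above, with ORDER BY added for stable output.
grouped = conn.execute(
    "SELECT A, B, MAX(C) AS C, MAX(D) AS D, MAX(E) AS E "
    "FROM list GROUP BY A, B ORDER BY A, B"
).fetchall()

for row in grouped:
    print(row)
```

With this data the (x, y) group comes back with C = 2 (not 7, the maximum of the whole column) and D = 9. Those two values come from different source rows, which is why you have to decide whether mixed values are acceptable.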

dbeneit

ksaul,
using your SELECT:

DELETE L
FROM list AS L
LEFT JOIN (SELECT A, B, MAX(C) AS C, MAX(D) AS D, MAX(E) AS E
           FROM list
           GROUP BY A, B) AS remaind
  ON L.A = remaind.A AND L.B = remaind.B AND L.C = remaind.C
 AND L.D = remaind.D AND L.E = remaind.E
WHERE remaind.A IS NULL

This deletes the rows that do not match any of the rows found by the subquery.

dbeneit

You can test the SQL first by running it as a SELECT instead of a DELETE:
-------------vvvvvv---------------
/*DELETE L*/ SELECT L.*
FROM list AS L
LEFT JOIN (SELECT A, B, MAX(C) AS C, MAX(D) AS D, MAX(E) AS E
           FROM list
           GROUP BY A, B) AS remaind
  ON L.A = remaind.A AND L.B = remaind.B AND L.C = remaind.C
 AND L.D = remaind.D AND L.E = remaind.E
WHERE remaind.A IS NULL
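A dry run like this is easy to try on a handful of rows. The sketch below uses Python's sqlite3 module with invented data (both are assumptions, not part of the thread) and lists the rows the join would delete. It also brings out a caveat worth checking on your own data: when no single row in a group holds the maximum of C, D, and E at the same time, every row of that group fails the join and would be deleted.

```python
import sqlite3

# In-memory database with invented sample rows (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (A TEXT, B TEXT, C INTEGER, D INTEGER, E INTEGER)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?, ?)",
    [
        ("x", "y", 1, 1, 1),  # loses to the next row on every column
        ("x", "y", 2, 2, 2),  # holds all three maxima, so it is kept
        ("m", "n", 1, 9, 5),  # has MAX(D) but not MAX(C)
        ("m", "n", 2, 3, 5),  # has MAX(C) but not MAX(D)
        ("p", "q", 7, 7, 7),  # unique pair, kept
    ],
)

# Rows with no match in the per-group maxima, i.e. the rows the DELETE would remove.
to_delete = conn.execute(
    """
    SELECT L.A, L.B, L.C, L.D, L.E
    FROM list AS L
    LEFT JOIN (SELECT A, B, MAX(C) AS C, MAX(D) AS D, MAX(E) AS E
               FROM list
               GROUP BY A, B) AS remaind
      ON L.A = remaind.A AND L.B = remaind.B AND L.C = remaind.C
     AND L.D = remaind.D AND L.E = remaind.E
    WHERE remaind.A IS NULL
    ORDER BY L.A, L.C
    """
).fetchall()

for row in to_delete:
    print(row)
```

Here both (m, n) rows show up in the delete list, because the per-group maxima (2, 9, 5) do not exist as an actual row. NULLs in C, D, or E would have a similar effect, since NULL never compares equal in the join.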

DrDamnit

ASKER

OK, riddle me this: will these SELECT statements give me the max value out of the entire column C, or just the max value within each set of "duplicates" that the distinct statement would discard?

ASKER CERTIFIED SOLUTION

dbeneit


DrDamnit

ASKER

What, then, does this do?

SELECT * FROM table GROUP BY A,B

harfang

Hello DrDamnit,

As ksaul explained, you need to decide which values you want from each group where A and B are the same. You can use Max(), First(), etc. for single values or Sum(), Avg(), etc. to consolidate numerical values within each group, on the columns C, D, and E.

However, if you want C, D, and E to come from the same record, you need to select the first record of each group using any sort order, but you will need an ID column for that.

SELECT * FROM table
WHERE ID = (
Select Top 1 T.ID
From table T
Where T.A = table.A And T.B = table.B
Order By T.ID
)

This will show you the record with the lowest ID within each group. To actually get rid of the duplicates, you can change that query into a make-table query.
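As a rough illustration of the make-table idea, here is a sketch using Python's sqlite3 module. The sample rows, the ID values, and the list_dedup table name are assumptions for the example, and SQLite's LIMIT 1 stands in for Top 1. Each kept row is an actual source row, so C, D, and E always belong together.

```python
import sqlite3

# In-memory database with invented sample rows and explicit IDs (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (ID INTEGER PRIMARY KEY, A TEXT, B TEXT, "
    "C INTEGER, D INTEGER, E INTEGER)"
)
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "x", "y", 1, 9, 5),  # first row of the (x, y) group, kept
        (2, "x", "y", 2, 3, 5),  # later duplicate of (x, y), dropped
        (3, "p", "q", 7, 7, 7),  # unique pair, kept
    ],
)

# Make-table variant: keep only the lowest-ID row of each (A, B) group.
# list_dedup is a hypothetical name for the new table.
conn.execute(
    """
    CREATE TABLE list_dedup AS
    SELECT O.*
    FROM list AS O
    WHERE O.ID = (SELECT T.ID
                  FROM list AS T
                  WHERE T.A = O.A AND T.B = O.B
                  ORDER BY T.ID
                  LIMIT 1)
    """
)

deduped = conn.execute("SELECT * FROM list_dedup ORDER BY ID").fetchall()
for row in deduped:
    print(row)
```

Changing the ORDER BY inside the subquery changes which row of each group survives, for example ORDER BY T.ID DESC to keep the most recent one.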

About your last question, you cannot use the * in a GROUP BY query. If you think about it, this would have no meaning.

Cheers!
(°v°)