andieje

asked on

# controlling output from a group by clause

Hi

If i have a table with columns A,B,C,D and run this query

select (*) from table group by A,B

I believe mysql gives random values for the columns C and D from any member of the group. Is there anyway I can control that. For example if i have

select max(C) , D from table group by A,B

can you get the value for column D anyhow from the row (or any of the rows) in the group that have the maximum value for column C?

thanks

If i have a table with columns A,B,C,D and run this query

select (*) from table group by A,B

I believe mysql gives random values for the columns C and D from any member of the group. Is there anyway I can control that. For example if i have

select max(C) , D from table group by A,B

can you get the value for column D anyhow from the row (or any of the rows) in the group that have the maximum value for column C?

thanks

ASKER

if i concatenated one of the fields in the subquery, say column E, how would i get that back in the query results?

ASKER

also this could get me multiple rows per group if multiple rows in the group had the max value. If i only wanted one value per group would i have to group the outer query as well?

Can you post some sample data on what are you trying to accomplish?

ASKER

this is made up data as i can't post real data but I'm sure you'll get the point

table

A B C D E

1 1 10 d1 e1

1 1 20 d2 e2

2 2 20 d3 e3

2 2 30 d4 e4

2 2 30 d5 e5

If we group by col A and B we get 2 groups containing rows (1,2) and rows(3,4,5)

In the first group the max value for C is 20 and only one row has this value

In the second group the max value for C is 30 and two rows have this value

For group 1 i want to retrieve [in the format column(value) ]

A(1) B(1) C(20) D(d2) E(e2) <===for cols D and E i want the values for the row with max(C) which is row 2

For the second group i only want to retrive one row and it doesnt matter which row out of row 4 or 5 the values for D or E come from as both rows have the max value for C

A(2) B(2) C(30) D(d4 or d5) E(e4 or e5)

table

A B C D E

1 1 10 d1 e1

1 1 20 d2 e2

2 2 20 d3 e3

2 2 30 d4 e4

2 2 30 d5 e5

If we group by col A and B we get 2 groups containing rows (1,2) and rows(3,4,5)

In the first group the max value for C is 20 and only one row has this value

In the second group the max value for C is 30 and two rows have this value

For group 1 i want to retrieve [in the format column(value) ]

A(1) B(1) C(20) D(d2) E(e2) <===for cols D and E i want the values for the row with max(C) which is row 2

For the second group i only want to retrive one row and it doesnt matter which row out of row 4 or 5 the values for D or E come from as both rows have the max value for C

A(2) B(2) C(30) D(d4 or d5) E(e4 or e5)

You can try like this.

`select A,B,max(C) as C,D,E from table1 group by A,B;`

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

ASKER

that doesn't give me the right answer. I get less groups than i should. Isn't that because in the outer query the value for C is selected randomly from the rows in the group, so you might fail the equality test

ASKER

there are 90,000 groups when i group by A and B but that query only returns 60,000

ASKER

this is very strange:

select count(*) from table t1 where C = (select max(C) from table t2 where t1.A = t2.A and t1.B = t2.B)

The results is 70,000. How can there be less rows than the number of groups? At least one item in the group has to have the minimum value for it to be calculated?

select count(*) from table t1 where C = (select max(C) from table t2 where t1.A = t2.A and t1.B = t2.B)

The results is 70,000. How can there be less rows than the number of groups? At least one item in the group has to have the minimum value for it to be calculated?

ASKER

Please ignore the last comment as I must have done something wrong. The query in the last comment does return more rows than the number of groups but this query returns less rows than the number of groups

This query also gives less than the number of groups

Both queries return the same figure

```
select *
from table1 as t1
where C = (select max(C) from table1 as t2 where t1.A = t2.A and t1.B = t2.B)
group by A,B;
```

This query also gives less than the number of groups

```
select * from table t1 inner join
(select A, B, max(C) as C from table 1
group by A,B) as t2 on
t1.A = t2.A and
t1.B = t2.B and
t1.C = t2.C
```

Both queries return the same figure

Can you run these two queries and post the row count.

```
select count(*) as cnt
from table1 as t1
where C = (select max(C) from table1 as t2 where t1.A = t2.A and t1.B = t2.B)
group by A,B;
```

```
select count(*) as cnt
from table1 as t1
group by A,B;
```

ASKER

The first query returns one row per group (with varying counts per group) but the total number of rows is 62744. Both queries in my comment ID: 34906368 return this figure

The second query returns one row per group (with varying counts per group) but the total number of rows returned is 90428

Sorry for the delay in replying.

The second query returns one row per group (with varying counts per group) but the total number of rows returned is 90428

Sorry for the delay in replying.

If these two queries are giving you the same count, then my proposed query would be the correct one.

Query1:

Query1:

```
SELECT COUNT(* )
FROM (SELECT DISTINCT A,B
FROM table1) AS t1;
```

Query2:```
SELECT COUNT(* )
FROM ( SELECT DISTINCT A,B
FROM table1 AS t1
WHERE C = (SELECT MAX(C)
FROM table1 AS t2
WHERE t1.A = t2.A
AND t1.B = t2.B)
GROUP BY A,B) AS t1;
```

ASKER

the first query returns 90428 records as you would expect

the second query returns 62744 still

the second query returns 62744 still

ASKER

i think i have found the problem - sometimes B is null

I only want 62744 groups (when B is not null) so you were right from the beginning!

I only want 62744 groups (when B is not null) so you were right from the beginning!

ASKER

thanks

Didn't realise that you have NULLs as I do not have control on your data. Finally you got it. Glad I could help.

Open in new window