Dan Craciun
asked on
Can this be done without a subquery?
I need to select the latest result from a table.
table person:
id----------name
1-----------a
2-----------b
table result:
id---------person_id------ -----resul t--------d ate
1----------1-------------- ---------N ULL------- --2014-01- 01
2----------1-------------- ---------1 ---------- -----2014- 01-02
3----------2-------------- ---------2 ---------- -----2014- 01-03
Desired outcome:
name-------------result--- ---------d ate
a-------------------1----- ---------- ----2014-0 1-02
b-------------------2----- ---------- ----2014-0 1-03
Doing a simple GROUP BY does not work, obviously:
SELECT person.id, name, result, MAX(date)
FROM person LEFT JOIN result ON person.id = result.person_id
GROUP BY person.id
will result in the following wrong outcome
name-------------result--- ---------d ate
a-------------------NULL-- ---------- -2014-01-0 2
b-------------------2----- ---------- ----2014-0 1-03
I can get the correct result using a subquery where I sort by date before using MAX:
SELECT id, name, result, MAX(date) FROM (
SELECT person.id, name, result, date
FROM person LEFT JOIN result ON person.id = result.person_id
ORDER BY person.id, date DESC
) AS z
GROUP BY id
How can I do this without using a subquery?
Thank you.
table person:
id----------name
1-----------a
2-----------b
table result:
id---------person_id------
1----------1--------------
2----------1--------------
3----------2--------------
Desired outcome:
name-------------result---
a-------------------1-----
b-------------------2-----
Doing a simple GROUP BY does not work, obviously:
SELECT person.id, name, result, MAX(date)
FROM person LEFT JOIN result ON person.id = result.person_id
GROUP BY person.id
will result in the following wrong outcome
name-------------result---
a-------------------NULL--
b-------------------2-----
I can get the correct result using a subquery where I sort by date before using MAX:
SELECT id, name, result, MAX(date) FROM (
SELECT person.id, name, result, date
FROM person LEFT JOIN result ON person.id = result.person_id
ORDER BY person.id, date DESC
) AS z
GROUP BY id
How can I do this without using a subquery?
Thank you.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
When will result be null? Do you just want to omit all records with a result of null?
ASKER
I don't have 2 results from the same day for any person. The events are once or twice a year for a person.
Yes, I only need non-NULL results, but I can filter those out in the JOIN clause.
Yes, I only need non-NULL results, but I can filter those out in the JOIN clause.
Can you give a larger amount of test data, and also let us know why you don't want to use a sub query??
ASKER
I try to avoid subqueries on large tables, because they tend to bog down MySQL. On the production db this query takes about 4 seconds.
I guess in this case it's OK, since I only do a GROUP BY on the outside, but I thought there might be a more straightforward way to return the values from the row with the max date.
Let's see if I can get a sample that better approximates the live data:
table result:
id---------person_id------ -----resul t--------d ate
1----------1-------------- ---------N ULL------- --2014-01- 01
2----------1-------------- ---------1 ---------- -----2014- 01-02
3----------1-------------- ---------2 ---------- -----2014- 01-03
4----------1-------------- ---------N ULL------- --2014-01- 04
I would need the query to return this:
id------------name-------- -----resul t--------- ---date
1-------------a----------- --------2- ---------- --------20 14-01-03
I guess in this case it's OK, since I only do a GROUP BY on the outside, but I thought there might be a more straightforward way to return the values from the row with the max date.
Let's see if I can get a sample that better approximates the live data:
table result:
id---------person_id------
1----------1--------------
2----------1--------------
3----------1--------------
4----------1--------------
I would need the query to return this:
id------------name--------
1-------------a-----------
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you Paul. I mainly use MySQL and I got used to specify only one column in the GROUP BY.
It's easier to read (by me): oh, this query will return a single row for each id.
It hasn't got me into trouble yet, but I agree it's best practice to specify all the non-aggregating fields in the GROUP BY clause.
Returning to the query, I don't need the max result and the max date. I need the latest result.
So if I have this:
table result:
id---------person_id------ -----resul t--------d ate
1----------1-------------- ---------N ULL------- --2014-01- 01
2----------1-------------- ---------3 ---------- -----2014- 01-02
3----------1-------------- ---------2 ---------- -----2014- 01-03
4----------1-------------- ---------N ULL------- --2014-01- 04
Your query will return
id------------name-------- -----resul t--------- ---date
1-------------a----------- --------3- ---------- --------20 14-01-03
It's easier to read (by me): oh, this query will return a single row for each id.
It hasn't got me into trouble yet, but I agree it's best practice to specify all the non-aggregating fields in the GROUP BY clause.
Returning to the query, I don't need the max result and the max date. I need the latest result.
So if I have this:
table result:
id---------person_id------
1----------1--------------
2----------1--------------
3----------1--------------
4----------1--------------
Your query will return
id------------name--------
1-------------a-----------
For "latest record", in MySQL, you simply cannot avoid subquery(ies) as far as I know. And you either need a costly self-join, or an ordered subquery using variable calculations. That's because MySQL lacks some really useful windowing functions like row_number().
or, you make do with the results of max() or similar.
which is it to be?
--------
do please specify all the fields of a group by
what happens if someone turns off that "extension"?
(lots of wrong queries...ooops)
it is indeed best practice and one you really should follow
or, you make do with the results of max() or similar.
| ID | NAME | MAX(R.RESULT) | MAX(R.DATE) |
|----|------|---------------|--------------------------------|
| 1 | a | 2 | January, 03 2014 00:00:00+0000 |
which is it to be?
--------
do please specify all the fields of a group by
what happens if someone turns off that "extension"?
(lots of wrong queries...ooops)
it is indeed best practice and one you really should follow
sorry, you have changed the sample data yet again, and that sample/expected result no longer lines-up with the concept of "latest"
could I clarify please.
Most recent sample/expected result
table result:
id---------person_id------ -----resul t--------d ate
1----------1-------------- ---------N ULL------- --2014-01- 01
2----------1-------------- ---------3---------------2014-01-02 <<-- this is a new row or at least new result value
3----------1-------------- ---------2 ---------- -----2014-01-03 <<lastest record, so result = 2???
4----------1-------------- ---------N ULL------- --2014-01- 04
Your query will return
id------------name-------- -----resul t--------- ---date
1-------------a----------- --------3------------------2014-01-03 <<-- maximum result = 3 date = 2014-01-03
I'm now very confused.
Could we work with just one set of data please?
could I clarify please.
Most recent sample/expected result
table result:
id---------person_id------
1----------1--------------
2----------1--------------
3----------1--------------
4----------1--------------
Your query will return
id------------name--------
1-------------a-----------
I'm now very confused.
Could we work with just one set of data please?
ASKER
:) sorry if I confused you.
I was only pointing out that
- if I use your second query and
- the latest result was not the biggest (so on 2014-01-03 the result -2- was smaller than an older result) then
- the query will no longer return the latest result
PS: yeah, I know I chose a poor name for that column; in the original data set it was lab_cd
I was only pointing out that
- if I use your second query and
- the latest result was not the biggest (so on 2014-01-03 the result -2- was smaller than an older result) then
- the query will no longer return the latest result
PS: yeah, I know I chose a poor name for that column; in the original data set it was lab_cd
ASKER
In short, the answer is : NO.
Thank you all.
Thank you all.
OK, got it. You do want the "latest" - apologies; I was like a rabbit in headlights for a moment
@Dan
humble humble apologies.... try this
So it does (I believe) meet your criteria of no subquery, but that self join may be "expensive" (but perhaps not as bad as a subquery)
{+edit}
this also works:
humble humble apologies.... try this
SELECT
p.id
, p.name
, r.result
, r.date
FROM person p
INNER JOIN result_1 r ON p.id = r.person_id
LEFT JOIN result_1 r2 ON r.person_id = r2.person_id
AND r2.id > r.id
WHERE r.result IS NOT NULL
and r2.id is null
It produces this outcome:
| ID | NAME | RESULT | DATE |
|----|------|--------|--------------------------------|
| 1 | a | 1 | January, 02 2014 00:00:00+0000 |
| 2 | b | 2 | January, 03 2014 00:00:00+0000 |
| 3 | ccc | 2 | April, 04 2014 00:00:00+0000 |
from this test data
CREATE TABLE Person
(`id` int, `name` varchar(20))
;
INSERT INTO Person
(`id`, `name`)
VALUES
(1, 'a'),
(2, 'b'),
(3, 'ccc')
;
CREATE TABLE Result_1
(`id` int, `person_id` int, `result` varchar(4), `date` datetime)
;
INSERT INTO Result_1
(`id`, `person_id`, `result`, `date`)
VALUES
(1, 1, NULL, '2014-01-01 00:00:00'),
(2, 1, '1', '2014-01-02 00:00:00'),
(3, 2, '2', '2014-01-03 00:00:00'),
(31, 3, NULL, '2011-01-01 00:00:00'),
(32, 3, '1', '2012-01-01 00:00:00'),
(33, 3, '3', '2013-03-03 00:00:00'),
(34, 3, '2', '2014-04-04 00:00:00')
;
So it does (I believe) meet your criteria of no subquery, but that self join may be "expensive" (but perhaps not as bad as a subquery)
{+edit}
this also works:
SELECT
p.id
, p.name
, r.result
, r.date
FROM person p
INNER JOIN result_1 r ON p.id = r.person_id
LEFT JOIN result_1 r2 ON r.person_id = r2.person_id
AND r2.`date` > r.`date`
WHERE r.result IS NOT NULL
and r2.id is null
;