Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Can this be done without a subquery?

Posted on 2014-09-22
14
Medium Priority
?
206 Views
Last Modified: 2014-09-22
I need to select the latest result from a table.

table person:
id----------name
1-----------a
2-----------b

table result:
id---------person_id-----------result--------date
1----------1-----------------------NULL---------2014-01-01
2----------1-----------------------1---------------2014-01-02
3----------2-----------------------2---------------2014-01-03

Desired outcome:
name-------------result------------date
a-------------------1-------------------2014-01-02
b-------------------2-------------------2014-01-03

Doing a simple GROUP BY does not work, obviously:

SELECT person.id, name, result, MAX(date)
FROM person LEFT JOIN result ON person.id = result.person_id
GROUP BY person.id

will result in the following wrong outcome
name-------------result------------date
a-------------------NULL-------------2014-01-02
b-------------------2-------------------2014-01-03

I can get the correct result using a subquery where I sort by date before using MAX:

SELECT id, name, result, MAX(date) FROM (
SELECT person.id, name, result, date
FROM person LEFT JOIN result ON person.id = result.person_id
ORDER BY person.id, date DESC
) AS z
GROUP BY id

How can I do this without using a subquery?

Thank you.
0
Comment
Question by:Dan Craciun
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 2
  • +1
14 Comments
 
LVL 24

Assisted Solution

by:Phillip Burton
Phillip Burton earned 668 total points
ID: 40336523
I'm assuming the end of your last query should read:

GROUP BY id, result

There are other alternatives. You could do a CTE, but that uses a subquery, or a function (which would use a subquery), or a window function such as RANK or ROW_NUMBER, but that would technically need a subquery (albeit not a big one).

I don't know why you wouldn't want to use some kind of subquery though - it is more auditable.

Your current query may have another problem, based on how the table is set up. If two "name"s  had the same "date" but different (or the same) "result"s, then you would return two rows for the same "name". How would you want to have the tie-breaker in that case?
0
 
LVL 21

Expert Comment

by:Randy Poole
ID: 40336525
When will result be null?  Do you just want to omit all records with a result of null?
0
 
LVL 35

Author Comment

by:Dan Craciun
ID: 40336545
I don't have 2 results from the same day for any person. The events are once or twice a year for a person.

Yes, I only need non-NULL results, but I can filter those out in the JOIN clause.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 21

Expert Comment

by:Randy Poole
ID: 40336575
Can you give a larger amount of test data, and also let us know why you don't want to use a sub query??
0
 
LVL 35

Author Comment

by:Dan Craciun
ID: 40336594
I try to avoid subqueries on large tables, because they tend to bog down MySQL. On the production db this query takes about 4 seconds.
I guess in this case it's OK, since I only do a GROUP BY on the outside, but I thought there might be a more straightforward way to return the values from the row with the max date.

Let's see if I can get a sample that better approximates the live data:

table result:
id---------person_id-----------result--------date
1----------1-----------------------NULL---------2014-01-01
2----------1-----------------------1---------------2014-01-02
3----------1-----------------------2---------------2014-01-03
4----------1-----------------------NULL---------2014-01-04

I would need the query to return this:
id------------name-------------result------------date
1-------------a-------------------2-------------------2014-01-03
0
 
LVL 49

Accepted Solution

by:
PortletPaul earned 1332 total points
ID: 40338150
SELECT
       p.id
     , p.name
     , MAX(r.result)
     , MAX(r.date) 
FROM person p
INNER JOIN result r ON p.id = r.person_id
WHERE r.result IS NOT NULL
GROUP BY
       p.id
     , p.name
;

Open in new window


Details
**MySQL 5.6.6 m9 Schema Setup**:

    
    
    CREATE TABLE Person
    	(`id` int, `name` varchar(1))
    ;
    	
    INSERT INTO Person
    	(`id`, `name`)
    VALUES
    	(1, 'a'),
    	(2, 'b')
    ;
    
    CREATE TABLE Result_1
    	(`id` int, `person_id` int, `result` varchar(4), `date` datetime)
    ;
    	
    INSERT INTO Result_1
    	(`id`, `person_id`, `result`, `date`)
    VALUES
    	(1, 1, NULL, '2014-01-01 00:00:00'),
    	(2, 1, '1', '2014-01-02 00:00:00'),
    	(3, 2, '2', '2014-01-03 00:00:00')
    ;
    
    CREATE TABLE Result_2
    	(`id` int, `person_id` int, `result` varchar(4), `date` datetime)
    ;
    	
    INSERT INTO Result_2
    	(`id`, `person_id`, `result`, `date`)
    VALUES
    	(1, 1, NULL, '2014-01-01 00:00:00'),
    	(2, 1, '1', '2014-01-02 00:00:00'),
    	(3, 1, '2', '2014-01-03 00:00:00'),
    	(4, 1, NULL, '2014-01-04 00:00:00')
    ;
    
    
    

**Query 1**:

    SELECT
           p.id
         , p.name
         , MAX(r.result)
         , MAX(r.date) 
    FROM person p
    INNER JOIN result_1 r ON p.id = r.person_id
    WHERE r.result IS NOT NULL
    GROUP BY
           p.id
         , p.name
    

**[Results][2]**:
    
    | ID | NAME | MAX(R.RESULT) |                    MAX(R.DATE) |
    |----|------|---------------|--------------------------------|
    |  1 |    a |             1 | January, 02 2014 00:00:00+0000 |
    |  2 |    b |             2 | January, 03 2014 00:00:00+0000 |


**Query 2**:

    SELECT
           p.id
         , p.name
         , MAX(r.result)
         , MAX(r.date) 
    FROM person p
    INNER JOIN result_2 r ON p.id = r.person_id
    WHERE r.result IS NOT NULL
    GROUP BY
           p.id
         , p.name
    

**[Results][3]**:
    
    | ID | NAME | MAX(R.RESULT) |                    MAX(R.DATE) |
    |----|------|---------------|--------------------------------|
    |  1 |    a |             2 | January, 03 2014 00:00:00+0000 |



  [1]: http://sqlfiddle.com/#!9/b766a/3

Open in new window

0
 
LVL 49

Assisted Solution

by:PortletPaul
PortletPaul earned 1332 total points
ID: 40338155
Just a word of caution about GROUP BY in MySQL

MySQL (depending on option settings) can permit this syntax:
SELECT person.id, name, result, MAX(date)
FROM person LEFT JOIN result ON person.id = result.person_id
GROUP BY person.id
Where only some non-aggregating fields are included in the group by clause (i.e. name, result) are omitted.

This is not conventional for SQL in general, and it is permitted in MySQL as an "extension"
see: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Basically this means the MySQL group by extension could subtitute any field value it likes into the non-aggregating fields that are not contained in the group by clause.

I strongly caution against allowing this extension, and even if allowed, to specify ALL non-aggregating columns in the group by clause.

see the reference above for turning off this extension.
0
 
LVL 35

Author Comment

by:Dan Craciun
ID: 40338344
Thank you Paul. I mainly use MySQL and I got used to specify only one column in the GROUP BY.
It's easier to read (by me): oh, this query will return a single row for each id.
It hasn't got me into trouble yet, but I agree it's best practice to specify all the non-aggregating fields in the GROUP BY clause.

Returning to the query, I don't need the max result and the max date. I need the latest result.

So if I have this:
table result:
id---------person_id-----------result--------date
1----------1-----------------------NULL---------2014-01-01
2----------1-----------------------3---------------2014-01-02
3----------1-----------------------2---------------2014-01-03
4----------1-----------------------NULL---------2014-01-04

Your query will return
id------------name-------------result------------date
1-------------a-------------------3-------------------2014-01-03
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40338381
For "latest record", in MySQL, you simply cannot avoid subquery(ies) as far as I know. And you either need a costly self-join, or an ordered subquery using variable calculations. That's because MySQL lacks some really useful windowing functions like row_number().

or, you make do with the results of max() or similar.
    | ID | NAME | MAX(R.RESULT) |                    MAX(R.DATE) |
    |----|------|---------------|--------------------------------|
    |  1 |    a |             2 | January, 03 2014 00:00:00+0000 |

Open in new window


which is it to be?

--------
do please specify all the fields of a group by
what happens if someone turns off that "extension"?
(lots of wrong queries...ooops)
it is indeed best practice and one you really should follow
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40338393
sorry, you have changed the sample data yet again, and that sample/expected result no longer lines-up with the concept of "latest"

could I clarify please.

Most recent sample/expected result
table result:
id---------person_id-----------result--------date
1----------1-----------------------NULL---------2014-01-01
2----------1-----------------------3---------------2014-01-02 <<-- this is a new row or at least new result value
3----------1-----------------------2---------------2014-01-03 <<lastest record, so result = 2???
4----------1-----------------------NULL---------2014-01-04

Your query will return
id------------name-------------result------------date
1-------------a-------------------3------------------2014-01-03 <<-- maximum result = 3 date = 2014-01-03

I'm now very confused.

Could we work with just one set of data please?
0
 
LVL 35

Author Comment

by:Dan Craciun
ID: 40338416
:) sorry if I confused you.

I was only pointing out that
- if I use your second query and
- the latest result was not the biggest (so on 2014-01-03 the result -2- was smaller than an older result) then
- the query will no longer return the latest result

PS: yeah, I know I chose a poor name for that column; in the original data set it was lab_cd
0
 
LVL 35

Author Closing Comment

by:Dan Craciun
ID: 40338417
In short, the answer is : NO.

Thank you all.
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40338423
OK, got it.  You do want the "latest" - apologies; I was like a rabbit in headlights for a moment
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 40338428
@Dan

humble humble apologies.... try this
SELECT
       p.id
     , p.name
     , r.result
     , r.date
FROM person p
INNER JOIN result_1 r ON p.id = r.person_id
LEFT JOIN result_1 r2 ON r.person_id = r2.person_id
                      AND r2.id > r.id
WHERE r.result IS NOT NULL
and r2.id is null

Open in new window

It produces this outcome:
| ID | NAME | RESULT |                           DATE |
|----|------|--------|--------------------------------|
|  1 |    a |      1 | January, 02 2014 00:00:00+0000 |
|  2 |    b |      2 | January, 03 2014 00:00:00+0000 |
|  3 |  ccc |      2 |   April, 04 2014 00:00:00+0000 |
		

Open in new window

from this test data
CREATE TABLE Person
	(`id` int, `name` varchar(20))
;
	
INSERT INTO Person
	(`id`, `name`)
VALUES
	(1, 'a'),
	(2, 'b'),
	(3, 'ccc')
;

CREATE TABLE Result_1
	(`id` int, `person_id` int, `result` varchar(4), `date` datetime)
;
	
INSERT INTO Result_1
	(`id`, `person_id`, `result`, `date`)
VALUES
	(1, 1, NULL, '2014-01-01 00:00:00'),
	(2, 1, '1', '2014-01-02 00:00:00'),
	(3, 2, '2', '2014-01-03 00:00:00'),

    (31, 3, NULL, '2011-01-01 00:00:00'),
    (32, 3, '1', '2012-01-01 00:00:00'),
    (33, 3, '3', '2013-03-03 00:00:00'),
    (34, 3, '2', '2014-04-04 00:00:00')
;

Open in new window


So it does (I believe) meet your criteria of no subquery, but that self join may be "expensive" (but perhaps not as bad as a subquery)

{+edit}
this also works:
SELECT
       p.id
     , p.name
     , r.result
     , r.date
FROM person p
INNER JOIN result_1 r ON p.id = r.person_id
LEFT JOIN result_1 r2 ON r.person_id = r2.person_id
                      AND r2.`date` > r.`date`
WHERE r.result IS NOT NULL
and r2.id is null
;

Open in new window

0

Featured Post

Quiz: What Do These Organizations Have In Common?

Hint: Their teams ended up taking quizzes, too.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

664 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question