# How to select max/min/first/last by group. Aka "Can my program become a short query"?

I have a program that "selects * from table" and the processes this data, like this:

1) Group each entry by their datetime column, ignoring second precision, order the group by an index value (column i)
2) Find the largest, smallest, first, and last value of column x, in each group.
3) Sum all the values of column y in each group
4) Loop over all data, the write to disc.

I have already written the program, so the task is done, but I think the whole think looks like it could be done easily directly in sql. If this can be done in a "short enough" query, then I'd have fewer bugs to worry about in my own code.

Mainly as an exercise to myself in sql, I'm wondering just how easily this can be done in sql.

Selecting the max(x) and min(x) values of each group is probably quite simple, and I'm hoping the first/last problem could be solved by using max(i)/min(i), but I'm not very familiar with sql, and just randomly guessing seems like a big waste of time.

Again, my question is really "how short" this can be, since if the query requires a lot lines, I'm actually worse of, since I would then need to debug in an unfamiliar language.
LVL 6
###### Who is Participating?

Commented:
ok try this

SELECT THEDATE,MINX,MAXX,TOTY,Z.X AS FIRSTX,Z1.X AS LASTX
FROM (

select thedate,min(x) as minx,max(x) as maxx,sum(y) as toty,min(I) AS MINI,MAX(I) AS MAXI
FROM (
select date_format(datecol,'%Y%b%d %H%i') as thedate, x, y,i
from yourtable
) AS X
GROUP BY THEDATE
) AS y

iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z

ON y.THEDATE=Z.THEDATE
AND MINI=Z.I

iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z1
ON y.THEDATE=Z1.THEDATE
AND MAXI=Z1.I

ORDER BY 1

;-)
0

Software ArchitectCommented:
Well, if you are unfamiliar with SQL then it might be a good idea to leave it as is. However, if you are familiar with MySQL stored procedures, that would be the way to go. You could create a stored procedure that obtains the result set and saves it to a disk file.
0

Author Commented:
>if you are unfamiliar with SQL then it might be a good idea to leave it as is
Good point. But regardless of whether I bring such a query into a production environment or not, it would still have been interesting (and hopefully effective) to learn SQL by having "mapped" a query to my code.
0

Commented:
the main reason for doing this in the sql is to reduce the network traffic/overhead of your calling application
having to retrieve each row from the database, which could be considerable

SOMETHING LIKE THIS

SELECT THEDATE,MINX,MAXX,TOTY,Z.X AS FIRSTX,Z1.X AS LASTX
FROM (
select thedate,min(x) as minx,max(x) as maxx,sum(y) as toty,min(I) AS MINI,MAX(I) AS MAXI
FROM (
select date_format(datecol,'%Y%b%d %H%i') as thedate, x, y,i
from yourtable
) AS X
GROUP BY THEDATE
) AS y
) AS xx
iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z
ON xx.THEDATE=Z.THEDATE
AND MINI=Z.I
iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z1
ON xx.THEDATE=Z1.THEDATE
AND MAXI=Z1.I
ORDER BY 1
0

Author Commented:
Thanks a lot :)

I'm gonna go ahead and see if I can make some sense of that, but I notice one thing right away. The left parenthesis on the line
) AS xx
doesn't seem to have a corresponding right parenthesis.
0

Author Commented:
The query validates, and runs.

Unfortunately I get:
ERROR 126 (HY000): Incorrect key file for table '/tmp/#sql_3fc_2.MYI'; try to repair it

I googled a bit and found:
"was the result of a temporary table getting so large that it filled TMPDIR, generating the above error."

I'm gonna look into it.
0

Author Commented:
Hmm, yep, "watch df -h" during the query verifies I run out of disc space. (I have 3G free on that computer)
0

Author Commented:
Ok, so I worked that the disc space issue out.
The query runs for a minute, and then quits with

ERROR 1052 (23000) at line 1: Column 'THEDATE' in field list is ambiguous

I'm wildly guessing here that the very first THEDATE should be y.THEDATE?

Anyway, is there anything in particular that should be indexed for this query to run faster? I've been waiting for >2 hours now
0

Commented:
well a covering  index on the three columns

I,Datecol,x

in that order may help...

and an index on the DateCol itself may also help
0

Author Commented:
Neither did anything to help unfortunately.
I'm just gonna very that the output is what I expect it to be, then I'll close this question.

New question specific to performance optimization here:
http://www.experts-exchange.com/Database/MySQL/Q_25784510.html?fromWizard=true
0

Author Commented:
Works well :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.