How to select max/min/first/last by group. Aka "Can my program become a short query"?

I have a program that "selects * from table" and the processes this data, like this:

1) Group each entry by their datetime column, ignoring second precision, order the group by an index value (column i)
2) Find the largest, smallest, first, and last value of column x, in each group.
3) Sum all the values of column y in each group
4) Loop over all data, the write to disc.

I have already written the program, so the task is done, but I think the whole think looks like it could be done easily directly in sql. If this can be done in a "short enough" query, then I'd have fewer bugs to worry about in my own code.

Mainly as an exercise to myself in sql, I'm wondering just how easily this can be done in sql.

Selecting the max(x) and min(x) values of each group is probably quite simple, and I'm hoping the first/last problem could be solved by using max(i)/min(i), but I'm not very familiar with sql, and just randomly guessing seems like a big waste of time.

Again, my question is really "how short" this can be, since if the query requires a lot lines, I'm actually worse of, since I would then need to debug in an unfamiliar language.
LVL 6
letharionAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Goodangel MatopeSoftware ArchitectCommented:
Well, if you are unfamiliar with SQL then it might be a good idea to leave it as is. However, if you are familiar with MySQL stored procedures, that would be the way to go. You could create a stored procedure that obtains the result set and saves it to a disk file.
0
letharionAuthor Commented:
>if you are unfamiliar with SQL then it might be a good idea to leave it as is
Good point. But regardless of whether I bring such a query into a production environment or not, it would still have been interesting (and hopefully effective) to learn SQL by having "mapped" a query to my code.
0
LowfatspreadCommented:
the main reason for doing this in the sql is to reduce the network traffic/overhead of your calling application
having to retrieve each row from the database, which could be considerable

SOMETHING LIKE THIS


SELECT THEDATE,MINX,MAXX,TOTY,Z.X AS FIRSTX,Z1.X AS LASTX
FROM (
select thedate,min(x) as minx,max(x) as maxx,sum(y) as toty,min(I) AS MINI,MAX(I) AS MAXI
FROM (
select date_format(datecol,'%Y%b%d %H%i') as thedate, x, y,i
from yourtable
) AS X
GROUP BY THEDATE
) AS y
) AS xx
iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z
ON xx.THEDATE=Z.THEDATE
AND MINI=Z.I
iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z1
ON xx.THEDATE=Z1.THEDATE
AND MAXI=Z1.I
ORDER BY 1
0
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

letharionAuthor Commented:
Thanks a lot :)

I'm gonna go ahead and see if I can make some sense of that, but I notice one thing right away. The left parenthesis on the line
) AS xx
doesn't seem to have a corresponding right parenthesis.
0
LowfatspreadCommented:
ok try this

SELECT THEDATE,MINX,MAXX,TOTY,Z.X AS FIRSTX,Z1.X AS LASTX
FROM (

select thedate,min(x) as minx,max(x) as maxx,sum(y) as toty,min(I) AS MINI,MAX(I) AS MAXI
FROM (
select date_format(datecol,'%Y%b%d %H%i') as thedate, x, y,i
from yourtable
) AS X
GROUP BY THEDATE
) AS y



iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z

ON y.THEDATE=Z.THEDATE
AND MINI=Z.I

iNNER jOIN
(
select date_format(datecol,'%Y%b%d %H%i') as thedate, x,i
from yourtable
) AS Z1
ON y.THEDATE=Z1.THEDATE
AND MAXI=Z1.I

ORDER BY 1

;-)
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
letharionAuthor Commented:
The query validates, and runs.

Unfortunately I get:
ERROR 126 (HY000): Incorrect key file for table '/tmp/#sql_3fc_2.MYI'; try to repair it

I googled a bit and found:
"was the result of a temporary table getting so large that it filled TMPDIR, generating the above error."

I'm gonna look into it.
0
letharionAuthor Commented:
Hmm, yep, "watch df -h" during the query verifies I run out of disc space. (I have 3G free on that computer)
0
letharionAuthor Commented:
Ok, so I worked that the disc space issue out.
The query runs for a minute, and then quits with

ERROR 1052 (23000) at line 1: Column 'THEDATE' in field list is ambiguous

I'm wildly guessing here that the very first THEDATE should be y.THEDATE?

Anyway, is there anything in particular that should be indexed for this query to run faster? I've been waiting for >2 hours now
0
LowfatspreadCommented:
well a covering  index on the three columns

 I,Datecol,x

in that order may help...

and an index on the DateCol itself may also help
0
letharionAuthor Commented:
Neither did anything to help unfortunately.
I'm just gonna very that the output is what I expect it to be, then I'll close this question.

New question specific to performance optimization here:
http://www.experts-exchange.com/Database/MySQL/Q_25784510.html?fromWizard=true
0
letharionAuthor Commented:
Works well :)
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.