Link to home
Start Free TrialLog in
Avatar of Victor Nares
Victor NaresFlag for United States of America

asked on

When to use an Aggregate Function.

One of many of the great solutions I received here had to do with the use of a group by. I've been reviewing the solution every sense and wanted to ask a follow up question. Please see original question below:

Thank you all in advance. I suspect this will be an easy one.

we have:

invNO  rowID  DFlagcount
100       1                1
100       2             null
100       3                1
101       4            Null
101       5            Null
101       6                 1
101       7            null

We have 2 invoices.  We want to pull back in our result set only the row id that corresponds to the invoice that has a Dflagcount =1.

In this case, invoice 101 RowID 6.


The solution provided  was elegant in its simplicity but did require some reflection to understand. See below:

SELECT invNO, MAX(CASE WHEN Dflagcount = 1 THEN rowID END) AS rowID
FROM table_name
GROUP BY invNO
HAVING SUM(CASE WHEN Dflagcount = 1 THEN 1 ELSE 0 END) = 1



My follow up question is:
the logic in the HAVING SUM clause insures only the rows that equal 1 will be returned. Why do we need a MAX in the SELECT? I've run it using a MIN and still get the correct result.

Best I can tell, using the aggregate function allows us to group the data properly and the actual aggregate function used is secondary since  the HAVING will return only the rows in the set that equal 1.

Im probably over thinking this but my intention is to understand the "when to use" part of aggregate functions.

Thanks in advance. Again!
Avatar of YZlat
YZlat
Flag of United States of America image

if I understood correctly, you could have gone with something simpler:

SELECT t.invNo, t.rowID FROM table_name as t
INNER JOIN (SELECT MAX(rowID) FROM table_name WHERE DFlagCount=1) as a
ON t. rowID=a.rowID 
WHERE t.DFlagCount=1

Open in new window

SOLUTION
Avatar of PatHartman
PatHartman
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
With the single row response in mind, I agree with PatHartman response.  I only jumped on as your data shows multiple invoices which would not work in the TOP 1 solution but the ROW_NUMBER() technique accomplishes the same thing.  For a single row over all the data, TOP 1 would make sense and be easier to read later as it is self explanatory.
I only jumped on as your data shows multiple invoices which would not work in the TOP 1 solution
But it does work as long as the invoice numbers are assigned sequentially and you sort in descending order.  With the given example, 101 is the "newest" invoice and that will sort to the top.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
That was my point Pat.  The TOP 1 approach only works if the intent is to get latest invoice.  If the intent is to get the latest row for each invoice, the ROW_NUMBER works the same way.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Scott, If you wrote the original query then there must be something different in the original question than there is here.  Victor said he wanted only one row -
In this case, invoice 101 RowID 6
 which is different from what you indicate the query actually returns.
Not quite following you.  Did you run my code against the sample data?

Victor said the only valid result from the sample data he presented was InvNo 101.  That's true, because InvNo 100 has 2 rows where the DFlagcount = 1.  

He only posted those 2 sample rows, but it's enough to get the point across if you take the entire query into account.  You can't just ignore the HAVING clause: it's critical to the way the code functions.
Agreed, Scott.  I read this question as Victor asking for explanation of why MAX in the solution.
I do not read anything in his question that states it is not working.  As I said to others, I saw their solution and sample data and initially made a false assumption myself that the new question was about expanding the solution to other invoices but when you read through more carefully it is clear the only question is about the aggregate function as the title of the question suggests.
You are correct Scott.  The problem statement was incomplete and I missed the multiple instances in invoice 100 so it looked like he wanted the "newest" invoice.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>>Why do we need a MAX in the SELECT? I've run it using a MIN and still get the correct result.<<
What I gather from your original question is that it appears returning invNO 101 and rowID 6  was the result you were looking for. However, if you removed the HAVING SUM(CASE WHEN Dflagcount = 1 THEN 1 ELSE 0 END) = 1 clause, it would have also returned invNO 100 and rowID 3, which is apparently not what you wanted. Changing MAX to MIN would still produce invNO 101 and rowID 6 because there is only one DFlagcount with a 1 value, but again without the HAVING clause, invNO 100 and rowID 1 would have been returned. The MAX (or MIN) is needed because you need to have an aggregate in order to use the HAVING clause and you need the HAVING clause to insure that only invoices with one DFlagcount value equal to 1 is returned.
Avatar of Victor Nares

ASKER

Great feed back all the way around.
My goal here is to get better at writing queries.

Scott's approach, for me, demonstrated the order in which a query is read by a human is not necessarily the best way to understand how it works.

The GROUP BY comes three lines in. However, when I think about how to meet the requirement, grouping the data by invoice is my first consideration...after the FROM of course. My intent in asking this question was to begin to better understand and develop an order of operations for writing queries. What to think of first... In this case, group the data by invoice first then isolate the desired row by embedding some logic into the SELECT.

I'm going to take some time to digest some of the other solutions...Many ways to skin the cat and each has its merit.

Thank you all for contributing!
Thank you for all your contributions.
I never saw ScottPletcher's elegantly stated solution before I posted. I need to learn to type faster :-)