group by all the columns in where clause?

Hi,

Is this a valid syntax? using group by.. do we need to group by on all the columns we use after "WHERE" ?

If this is not a valid, please help me remake.

>>Where operatorid, dtstart and etc<<

table called inout <--------------------------time records of login/out of employee
columns:
operatorid
timein
timeout

table called #tempshift <---------------------------daily shift of employee
columns
scheduleid
operatorid
dtstart
DTfirstbreakout


SELECT scheduleid, sum(round(datediff(n,r.timein,r.timeout)/60.00,2)) AS exceptiontimeout
INTO #tempexception  
 FROM inout AS r, #tempshift AS s  
 WHERE r.operatorid in (select distinct operatorid from  #tempshift)
And r.timein Between dateadd(n,-10,s.dtstart) And dateadd(n,30,s.dtend)
And r.timeout Between dateadd(n,-30,s.dtstart) And dateadd(n,15,s.dtend)
And s.dtstart Between @shiftdate and (@shiftdate + cast('23:59:59' as datetime)) and excuse <> ' '
 GROUP BY scheduleid
LVL 1
lynntonAsked:
Who is Participating?
 
solution46Commented:
With that part of the script, you'd be better off trying a join...

SELECT  
    scheduleid,
    Min(r.timein) AS firstbreakin
INTO #tempfirstbin  
FROM inout r,
    INNER JOIN #tempshift s
        ON r.operatorid = s.operatorid
WHERE r.timein Between
dateadd(n, @minbreakinstart, s.DTfirstbreakout)
And dateadd(n, @minbreakinstop), DTfirstbreakout)
And s.DTstart Between @shiftdate and (@shiftdate + cast('23:59:59' as datetime))  

GROUP BY scheduleid

s46
0
 
solution46Commented:
lynnton,

just from preference and mainly to make things easier to read, I would suggest breaking it apart a bit. Assuming you are doing this in a sproc...

DECLARE @minbreakinstart int, @minbreakinstop int

SELECT @minbreakstart = parametervalue from systemparameter where parametername='minbreakinstart'
SELECT @minbreakstop = parametervalue from systemparameter where parametername='minbreakinstop'

SELECT  
    scheduleid,
    Min(r.timein) AS firstbreakin
INTO #tempfirstbin  
FROM inout AS r, #tempshift AS s  
WHERE r.operatorid=s.operatorid
And r.timein Between
dateadd(n, @minbreakinstart, s.DTfirstbreakout)
And dateadd(n, @minbreakinstop), DTfirstbreakout)
And s.DTstart Between @shiftdate and (@shiftdate + cast('23:59:59' as datetime))  

GROUP BY scheduleid


This should work; can you try it and post the result please...

s46.
0
 
lynntonAuthor Commented:
solution46,

I've change it abit before you posted,

WHERE r.operatorid in (select distinct operatorid from  #tempshift)

will this speed up the process?

Thanks.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
lynntonAuthor Commented:
solution46,

Is this a valid group by? i really just wanted to group by  s.scheduleid. unfortunately inorder to query the other columns sql server tells me to add them to group by...is this the proper way?



Thanks.

GROUP BY s.scheduleid,s.operatorid , s.teamleader , s.operatorname , s.designation,s.dtstart,s.dtend  



SELECT s.operatorid AS operatorid,
s.teamleader as teamleader,
s.operatorname as name,
s.designation as designation,
s.scheduleid, min(r.timein) AS DTstart,
max(r.timeout) AS DTstop,
round(datediff(n,s.dtstart,min(r.timein))/60.00,2) as TotalLate ,
round(datediff(n,max(r.timeout),s.dtend)/60.00,2) as TotalLeaveEarly,
s.dtstart as ShiftStart,s.dtend as ShiftEnd  

INTO #tempinout  
FROM inout AS r
INNER JOIN #tempshift s ON r.operatorid = s.operatorid
WHERE r.timein Between dateadd(n,-9,s.dtstart) And s.dtend
And r.timeout Between dateadd(n,-9,s.dtstart) And dateadd(n,15,s.dtend)
And s.dtstart Between @shiftdate and @shiftdatestop  
GROUP BY s.scheduleid,s.operatorid , s.teamleader , s.operatorname , s.designation,s.dtstart,s.dtend  
0
 
solution46Commented:
lynnton,

yes, this is correct - I didn't realise about the other columns in the SELECT clause.

The rules of GROUPing are fairly straightforward. Any column that is referenced in the SELECT or ORDER BY clauses must either be aggregated or GROUPed, like this...

SELECT
    Field1,
    Max(Field2)
From MyTable
WHERE Field3 = Value
GROUP BY Field1, Field4
ORDER BY Field4

You can use another clause, HAVING, to filter on the aggregate expressions (it can filter on any expression but it is more efficient to use WHERE to filter non aggregat exressions). e.g. you could add the following line...

HAVING Max(Field2) > 10

What this does is...

SELECT
    Field1,
this bit selects all the values of Field1
   
    Max(Field2)
this selects the maximum value of Field2 for each group (to be defined). At this point it is obvious that we will need to GROUP BY something because otherwise this Max() function has no meaning.

From MyTable
specifies the source data. You can include as many joins as you like, referencing as many fields as you like. None of these need to be included in the GROUP BY clause because this is building up the data that GROUP BY will aggregate.

WHERE Field3 = Value
applies a filter to the source data, BEFORE it is aggregated. Consequently, fields ONLY appearing in the WHERE clause do not need to be GROUPed.

GROUP BY Field1, Field4
at this point, we need to see what fields we want to GROUP BY. First off, look at the SELECT and ORDER BY clauses. Any fields listed there that have not been aggregated need to be included in the GROUP BY clause. The GROUP BY clause will take the base data defined in the SELECT ... FROM ... WHERE part and group by the listed fields.

ORDER BY Field4
The ORDER BY is a modifier applied to the aggregated data in order to sort it. Consequently, and field added into the ORDER BY clause is implicitly SELECTed; SQL Server sees that field as though it is in the SELECT clause.


Hope this helps,

s46.
0
 
solution46Commented:
er, forgot the HAVING bit...

this is a modifer appliad AFTER the aggregation. In this case, it looks at the maximum values of Field2 and filters out any rows having a vlaue <= 10. This filter could easily be applied at the WHERE clause level; HAVING really becomes useful for doing things like this...

HAVING Max(FieldA) + Max(FieldB) > 10. The WHERE clause could not filter this out as the two maxima may well be on separate rows.


s46.
0
 
lynntonAuthor Commented:
solution46,

Would the below query affect results?

Cause sql server is telling me to add those other columns to group by, if it will affect, maybe we can do something? can we jion table again then add those columns that we need?
 
Thanks.

SELECT s.operatorid AS operatorid,  <-----------------------------ugly group by, query1
s.teamleader as teamleader,
s.operatorname as name,
s.designation as designation,
s.scheduleid, min(r.timein) AS DTstart,
max(r.timeout) AS DTstop,
round(datediff(n,s.dtstart,min(r.timein))/60.00,2) as TotalLate ,
round(datediff(n,max(r.timeout),s.dtend)/60.00,2) as TotalLeaveEarly,
s.dtstart as ShiftStart,s.dtend as ShiftEnd  

INTO #tempinout  
FROM inout AS r
INNER JOIN #tempshift s ON r.operatorid = s.operatorid
WHERE r.timein Between dateadd(n,-9,s.dtstart) And s.dtend
And r.timeout Between dateadd(n,-9,s.dtstart) And dateadd(n,15,s.dtend)
And s.dtstart Between @shiftdate and @shiftdatestop  
GROUP BY s.scheduleid,s.operatorid , s.teamleader , s.operatorname , s.designation,s.dtstart,s.dtend  


VS


SELECT  s.scheduleid, <-----------------------------this is the only group by we wanted, query 2
min(r.timein) AS DTstart,
max(r.timeout) AS DTstop,
round(datediff(n,s.dtstart,min(r.timein))/60.00,2) as TotalLate ,
round(datediff(n,max(r.timeout),s.dtend)/60.00,2) as TotalLeaveEarly,
s.dtstart as ShiftStart,s.dtend as ShiftEnd  

INTO #tempinout  
FROM inout AS r
INNER JOIN #tempshift s ON r.operatorid = s.operatorid
WHERE r.timein Between dateadd(n,-9,s.dtstart) And s.dtend
And r.timeout Between dateadd(n,-9,s.dtstart) And dateadd(n,15,s.dtend)
And s.dtstart Between @shiftdate and @shiftdatestop  
GROUP BY s.scheduleid  
0
 
solution46Commented:
lynnton,

the first query is pefectly acceptable. All the aggregation is doing is showing every unique combination of s.scheduleid,s.operatorid , s.teamleader , s.operatorname , s.designation,s.dtstart,s.dtend and showing the maximum value of r.timeout for these rows. This is a pretty normal way of doing things.

the second query is probably going to fail as you'll still need to GROUP BY dtstart and dtend as well as s.scheduleid.

Sometimes long-winded GROUP BY clauses are unavoidable, particularly if you are including lots of fields from the same table.

s46
0
 
lynntonAuthor Commented:
solution46,

Thanks so muc hfor your time and patience
0
 
solution46Commented:
no probs, good luck with the DB.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.