Link to home
Create AccountLog in
Avatar of Howard Bash
Howard BashFlag for United States of America

asked on

t-sql aggregate problem

I have a table with a date and a unique user id.  I would like to generate a resultset that gives me the count of user table inserts per hour.  Here is desired output:
DayOfWeek  |  Date  |  HourInterval  |  UserCount
Monday | 03/05/2009 |  1  |  12
Monday | 03/05/2009 |  2  |  32
Monday | 03/05/2009 |  3  |  19
Monday | 03/05/2009 |  4  |  72
Monday | 03/05/2009 |  5  |  4
Monday | 03/05/2009 |  6  |  11
...



select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  T.MY_DATETIME,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval
  count(T.USER_IDENTIFIER) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
  T.MY_DATETIME,
  datename(dw,T.MY_DATETIME), 
  datepart(hh,T.ACTLOG_DATETIME)

Open in new window

Avatar of ralmada
ralmada
Flag of Canada image

If you're in SQL 2008 try

 
select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  cast(T.MY_DATETIME as date) as myDate,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval
  count(T.USER_IDENTIFIER) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
  cast(T.MY_DATETIME as date),
  datename(dw,T.MY_DATETIME), 
  datepart(hh,T.ACTLOG_DATETIME)

Open in new window


If in SQL 2005

 
select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0) as myDate,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval
  count(T.USER_IDENTIFIER) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
  dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0),
  datename(dw,T.MY_DATETIME), 
  datepart(hh,T.ACTLOG_DATETIME)

Open in new window

Avatar of Howard Bash

ASKER

Both queries seem "close" but the myDate value has lost the valid time and is showing 12:00.
because you need to get rid of the time portion so that the grouping can be done by date only

if you don't want the 12:00 to appear do

...convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) as myDate, ....

.....
group by convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) ....
Honestly i would like to show the date and time as is in the database field.
Maybe this is what you're looking for then

select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  T.MY_DATETIME,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
  count(T.USER_IDENTIFIER) over (partition by   cast(T.MY_DATETIME as date)) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')

Open in new window

sorry like this
select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  T.MY_DATETIME,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
  count(T.USER_IDENTIFIER) over (partition by   dateadd(hh, datediff(hh, 0, T.MY_DATETIME as date), 0)) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')

Open in new window

and btw, you don't need the convert there in your where clause
select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  T.MY_DATETIME,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
  count(T.USER_IDENTIFIER) over (partition by   dateadd(hh, datediff(hh, 0, T.MY_DATETIME as date), 0)) as UserCount

from log_hist T

Where
  T.MY_DATETIME> '10/22/2009' AND T.MY_DATETIME < '11/30/2009'

Open in new window

Get's error:
Incorrect syntax near the keyword 'as'.
pointing to count(...) line.

also,  the user_identifier counts need to be unique within the "hour" interval being reported on.
can you possible modify the three flavors of queries with this uniqueness and let me know about the error listed here?

Thanks much.  It really is close (I have a few questions though about the dateadd stuff but would rather look at the final solution first).

Again, THANKS!
oops, yeah line 5 above should be

count(T.USER_IDENTIFIER) over (partition by   dateadd(hh, datediff(hh, 0, T.MY_DATETIME), 0)) as UserCount

try it and let me know

I'm just using dateadd ... and datediff to group all the dates to the hour.
It now runs, but the resultset is off.  There is no grouping though so ...
>>There is no grouping though so ...<<
Yes, because you cannot have both. If you want the grouping you need to decide which date to show, which in my opinion it makes sense my comment http:#a35779704  with the adjustment in http:#a35780266. Remember that the hour is shown with the HoursInterval column. So, I'm not sure why you're saying that this is not what you expect.

So basically if your data looks like this:

DayOfWeek  |  Date  |  
Monday | 03/05/2009  1:01
Monday | 03/05/2009  1:21
Monday | 03/05/2009  1:45
Monday | 03/05/2009  2:03
Monday | 03/05/2009  3:15
Monday | 03/05/2009  4:02


and you want

DayOfWeek  |  Date  |  HourInterval  |  UserCount
Monday | 03/05/2009 |  1  |  3
Monday | 03/05/2009 |  2  |  1
Monday | 03/05/2009 |  3  |  1
Monday | 03/05/2009 |  4  |  1

you definitively need to drop the time portion in the "date" column. otherwise there's nothing to group and you will end up with the same output as your input.
Please note that when I say drop the time portion, I mean it for display purpose only, not from your table. The original data remains the same, it's just the select query adjusting it so you get the correct count. Here's the query again. Try it and let me know.

select 
  datename(dw,T.MY_DATETIME) as DayOfTheWeek,
  convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) as myDate,
  datepart(hh,T.ACTLOG_DATETIME) as HourInterval
  count(T.USER_IDENTIFIER) as UserCount

from log_hist T

Where
  T.MY_DATETIME>convert(datetime,'10/22/2009') AND   T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
  convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103),
  datename(dw,T.MY_DATETIME), 
  datepart(hh,T.ACTLOG_DATETIME)

Open in new window

one more thing, I guess in your example t.My_Datetime and t.ACTLOG_DATETIME are the same column, right?
yes
ok, try my last comment http:#a35781088 and let me know.
It runs now.  I am not sure if the count of users is unique per hour though.

The MyDate field :   convert(varchar, dateadd(d, datediff(d, 0, T.ACTLOG_DATETIME), 0), 103) as myDate, gives me the date in the format dd/mm/yyyy.  Can you tell me the value to get the format mm/dd/yyyy?
Ok. sorry, the format I needed for the dates is "101".    But,  I do think that the user id values per hour are more than once.  The last piece of this is to only have one occurrence per hour per user in the usercount hour buckets.
The users must repeat in this query as there are some rows with counts higher than the total number of users.
I'm not following you, Could you post some sample data (input) and the expected result?
The table is an activity table for a web application.  Every time a user goes to a page the access is logged with user id,  datetime of access (multiple inserts per user per hour).  I am trying to generate a resultset that represents how many users accessed the web site per hour.  So,  since all the user's accesses are written to this activity table,  the user id will appear several times in a one hour window in this logging table.

The resultset should answer the question: "How many users access this web site per hour hour (not how many times per user which is what we have)"  for each hour of the day for a date range.


It's kind of like we want the hourly bucket to contain the results of a select "distinct" user_identifier count.
So,  if user1 access the web site 5 times at 9:00pm today and user34 access the web site 12 times within the same 9:00PM "bucket" the output would be (assuming just these two accessed the site):

DayOfWeek  |  Date  |  HourInterval  |  UserCount
Tuesday | 05/17/2011 |  21  |  2


DayOfWeek  |  Date  |  HourInterval  |  UserCount
Tuesday | 05/17/2011 |  21  |  2

and then another 3 users access the site at 10:15PM

DayOfWeek  |  Date  |  HourInterval  |  UserCount
Tuesday | 05/17/2011 |  21  |  2
Tuesday | 05/17/2011 |  22  |  3



What you showed for input and output could be correct in post: ID: 35781040
So, if each access was by a different users for entries within the "same hour", the output is what I am trying to obtain.

Alternately, if every access was by one user your data would look like this:

DayOfWeek  |  Date  |  
Monday | 03/05/2009  1:01
Monday | 03/05/2009  2:03
Monday | 03/05/2009  3:15
Monday | 03/05/2009  4:02

(note that the 1:00 basket only has 1 user and so that is it's only row above)

yielding:

DayOfWeek  |  Date  |  HourInterval  |  UserCount
Monday | 03/05/2009 |  1  |  1
Monday | 03/05/2009 |  2  |  1
Monday | 03/05/2009 |  3  |  1
Monday | 03/05/2009 |  4  |  1

ASKER CERTIFIED SOLUTION
Avatar of ralmada
ralmada
Flag of Canada image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Just right.  I bow to the master.  If you're ever in NYC...
A minor note,  you have a typo:  the function datediff you typed as dateadiff (an extra a).
An extremely knowledgeable and patient SQL master.
Glad to help!