Howard Bash
asked on
t-sql aggregate problem
I have a table with a date and a unique user id. I would like to generate a resultset that gives me the count of user table inserts per hour. Here is desired output:
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 12
Monday | 03/05/2009 | 2 | 32
Monday | 03/05/2009 | 3 | 19
Monday | 03/05/2009 | 4 | 72
Monday | 03/05/2009 | 5 | 4
Monday | 03/05/2009 | 6 | 11
...
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 12
Monday | 03/05/2009 | 2 | 32
Monday | 03/05/2009 | 3 | 19
Monday | 03/05/2009 | 4 | 72
Monday | 03/05/2009 | 5 | 4
Monday | 03/05/2009 | 6 | 11
...
select
datename(dw,T.MY_DATETIME) as DayOfTheWeek,
T.MY_DATETIME,
datepart(hh,T.ACTLOG_DATETIME) as HourInterval
count(T.USER_IDENTIFIER) as UserCount
from log_hist T
Where
T.MY_DATETIME>convert(datetime,'10/22/2009') AND T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
T.MY_DATETIME,
datename(dw,T.MY_DATETIME),
datepart(hh,T.ACTLOG_DATETIME)
ASKER
Both queries seem "close" but the myDate value has lost the valid time and is showing 12:00.
because you need to get rid of the time portion so that the grouping can be done by date only
if you don't want the 12:00 to appear do
...convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) as myDate, ....
.....
group by convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) ....
if you don't want the 12:00 to appear do
...convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) as myDate, ....
.....
group by convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) ....
ASKER
Honestly i would like to show the date and time as is in the database field.
Maybe this is what you're looking for then
select
datename(dw,T.MY_DATETIME) as DayOfTheWeek,
T.MY_DATETIME,
datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
count(T.USER_IDENTIFIER) over (partition by cast(T.MY_DATETIME as date)) as UserCount
from log_hist T
Where
T.MY_DATETIME>convert(datetime,'10/22/2009') AND T.MY_DATETIME<convert(datetime, '11/30/2009')
sorry like this
select
datename(dw,T.MY_DATETIME) as DayOfTheWeek,
T.MY_DATETIME,
datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
count(T.USER_IDENTIFIER) over (partition by dateadd(hh, datediff(hh, 0, T.MY_DATETIME as date), 0)) as UserCount
from log_hist T
Where
T.MY_DATETIME>convert(datetime,'10/22/2009') AND T.MY_DATETIME<convert(datetime, '11/30/2009')
and btw, you don't need the convert there in your where clause
select
datename(dw,T.MY_DATETIME) as DayOfTheWeek,
T.MY_DATETIME,
datepart(hh,T.ACTLOG_DATETIME) as HourInterval,
count(T.USER_IDENTIFIER) over (partition by dateadd(hh, datediff(hh, 0, T.MY_DATETIME as date), 0)) as UserCount
from log_hist T
Where
T.MY_DATETIME> '10/22/2009' AND T.MY_DATETIME < '11/30/2009'
ASKER
Get's error:
Incorrect syntax near the keyword 'as'.
pointing to count(...) line.
also, the user_identifier counts need to be unique within the "hour" interval being reported on.
can you possible modify the three flavors of queries with this uniqueness and let me know about the error listed here?
Thanks much. It really is close (I have a few questions though about the dateadd stuff but would rather look at the final solution first).
Again, THANKS!
Incorrect syntax near the keyword 'as'.
pointing to count(...) line.
also, the user_identifier counts need to be unique within the "hour" interval being reported on.
can you possible modify the three flavors of queries with this uniqueness and let me know about the error listed here?
Thanks much. It really is close (I have a few questions though about the dateadd stuff but would rather look at the final solution first).
Again, THANKS!
oops, yeah line 5 above should be
count(T.USER_IDENTIFIER) over (partition by dateadd(hh, datediff(hh, 0, T.MY_DATETIME), 0)) as UserCount
try it and let me know
I'm just using dateadd ... and datediff to group all the dates to the hour.
count(T.USER_IDENTIFIER) over (partition by dateadd(hh, datediff(hh, 0, T.MY_DATETIME), 0)) as UserCount
try it and let me know
I'm just using dateadd ... and datediff to group all the dates to the hour.
ASKER
It now runs, but the resultset is off. There is no grouping though so ...
>>There is no grouping though so ...<<
Yes, because you cannot have both. If you want the grouping you need to decide which date to show, which in my opinion it makes sense my comment http:#a35779704 with the adjustment in http:#a35780266. Remember that the hour is shown with the HoursInterval column. So, I'm not sure why you're saying that this is not what you expect.
So basically if your data looks like this:
DayOfWeek | Date |
Monday | 03/05/2009 1:01
Monday | 03/05/2009 1:21
Monday | 03/05/2009 1:45
Monday | 03/05/2009 2:03
Monday | 03/05/2009 3:15
Monday | 03/05/2009 4:02
and you want
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 3
Monday | 03/05/2009 | 2 | 1
Monday | 03/05/2009 | 3 | 1
Monday | 03/05/2009 | 4 | 1
you definitively need to drop the time portion in the "date" column. otherwise there's nothing to group and you will end up with the same output as your input.
Yes, because you cannot have both. If you want the grouping you need to decide which date to show, which in my opinion it makes sense my comment http:#a35779704 with the adjustment in http:#a35780266. Remember that the hour is shown with the HoursInterval column. So, I'm not sure why you're saying that this is not what you expect.
So basically if your data looks like this:
DayOfWeek | Date |
Monday | 03/05/2009 1:01
Monday | 03/05/2009 1:21
Monday | 03/05/2009 1:45
Monday | 03/05/2009 2:03
Monday | 03/05/2009 3:15
Monday | 03/05/2009 4:02
and you want
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 3
Monday | 03/05/2009 | 2 | 1
Monday | 03/05/2009 | 3 | 1
Monday | 03/05/2009 | 4 | 1
you definitively need to drop the time portion in the "date" column. otherwise there's nothing to group and you will end up with the same output as your input.
Please note that when I say drop the time portion, I mean it for display purpose only, not from your table. The original data remains the same, it's just the select query adjusting it so you get the correct count. Here's the query again. Try it and let me know.
select
datename(dw,T.MY_DATETIME) as DayOfTheWeek,
convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103) as myDate,
datepart(hh,T.ACTLOG_DATETIME) as HourInterval
count(T.USER_IDENTIFIER) as UserCount
from log_hist T
Where
T.MY_DATETIME>convert(datetime,'10/22/2009') AND T.MY_DATETIME<convert(datetime, '11/30/2009')
group by
convert(varchar, dateadd(d, dateadiff(d, 0, T.MY_DATETIME), 0), 103),
datename(dw,T.MY_DATETIME),
datepart(hh,T.ACTLOG_DATETIME)
one more thing, I guess in your example t.My_Datetime and t.ACTLOG_DATETIME are the same column, right?
ASKER
yes
ok, try my last comment http:#a35781088 and let me know.
ASKER
It runs now. I am not sure if the count of users is unique per hour though.
The MyDate field : convert(varchar, dateadd(d, datediff(d, 0, T.ACTLOG_DATETIME), 0), 103) as myDate, gives me the date in the format dd/mm/yyyy. Can you tell me the value to get the format mm/dd/yyyy?
The MyDate field : convert(varchar, dateadd(d, datediff(d, 0, T.ACTLOG_DATETIME), 0), 103) as myDate, gives me the date in the format dd/mm/yyyy. Can you tell me the value to get the format mm/dd/yyyy?
ASKER
Ok. sorry, the format I needed for the dates is "101". But, I do think that the user id values per hour are more than once. The last piece of this is to only have one occurrence per hour per user in the usercount hour buckets.
ASKER
The users must repeat in this query as there are some rows with counts higher than the total number of users.
I'm not following you, Could you post some sample data (input) and the expected result?
ASKER
The table is an activity table for a web application. Every time a user goes to a page the access is logged with user id, datetime of access (multiple inserts per user per hour). I am trying to generate a resultset that represents how many users accessed the web site per hour. So, since all the user's accesses are written to this activity table, the user id will appear several times in a one hour window in this logging table.
The resultset should answer the question: "How many users access this web site per hour hour (not how many times per user which is what we have)" for each hour of the day for a date range.
It's kind of like we want the hourly bucket to contain the results of a select "distinct" user_identifier count.
The resultset should answer the question: "How many users access this web site per hour hour (not how many times per user which is what we have)" for each hour of the day for a date range.
It's kind of like we want the hourly bucket to contain the results of a select "distinct" user_identifier count.
ASKER
So, if user1 access the web site 5 times at 9:00pm today and user34 access the web site 12 times within the same 9:00PM "bucket" the output would be (assuming just these two accessed the site):
DayOfWeek | Date | HourInterval | UserCount
Tuesday | 05/17/2011 | 21 | 2
DayOfWeek | Date | HourInterval | UserCount
Tuesday | 05/17/2011 | 21 | 2
ASKER
DayOfWeek | Date | HourInterval | UserCount
Tuesday | 05/17/2011 | 21 | 2
and then another 3 users access the site at 10:15PM
DayOfWeek | Date | HourInterval | UserCount
Tuesday | 05/17/2011 | 21 | 2
Tuesday | 05/17/2011 | 22 | 3
Tuesday | 05/17/2011 | 21 | 2
and then another 3 users access the site at 10:15PM
DayOfWeek | Date | HourInterval | UserCount
Tuesday | 05/17/2011 | 21 | 2
Tuesday | 05/17/2011 | 22 | 3
ASKER
What you showed for input and output could be correct in post: ID: 35781040
So, if each access was by a different users for entries within the "same hour", the output is what I am trying to obtain.
Alternately, if every access was by one user your data would look like this:
DayOfWeek | Date |
Monday | 03/05/2009 1:01
Monday | 03/05/2009 2:03
Monday | 03/05/2009 3:15
Monday | 03/05/2009 4:02
(note that the 1:00 basket only has 1 user and so that is it's only row above)
yielding:
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 1
Monday | 03/05/2009 | 2 | 1
Monday | 03/05/2009 | 3 | 1
Monday | 03/05/2009 | 4 | 1
So, if each access was by a different users for entries within the "same hour", the output is what I am trying to obtain.
Alternately, if every access was by one user your data would look like this:
DayOfWeek | Date |
Monday | 03/05/2009 1:01
Monday | 03/05/2009 2:03
Monday | 03/05/2009 3:15
Monday | 03/05/2009 4:02
(note that the 1:00 basket only has 1 user and so that is it's only row above)
yielding:
DayOfWeek | Date | HourInterval | UserCount
Monday | 03/05/2009 | 1 | 1
Monday | 03/05/2009 | 2 | 1
Monday | 03/05/2009 | 3 | 1
Monday | 03/05/2009 | 4 | 1
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
Just right. I bow to the master. If you're ever in NYC...
A minor note, you have a typo: the function datediff you typed as dateadiff (an extra a).
A minor note, you have a typo: the function datediff you typed as dateadiff (an extra a).
ASKER
An extremely knowledgeable and patient SQL master.
Glad to help!
Open in new window
If in SQL 2005
Open in new window