asked on

sql group by time range selecting min max first last

Hi Experts,

I'm revisiting a question i asked here: https://www.experts-exchange.com/questions/23351412/sql-group-by-time-range-selecting-min-max-first-last.html

In that question, i was looking for an answer in access, however i'd posted in the SQL forum. angelIII kindly put me straight and then promptly proceeded to give a rather impressive answer that did indeed work in Access. Unfortunately, i've been unable to translate the answer into MS SQL.

THIS TIME, I REALLY DO NEED IN TSQL for 2k5!

I need some clever sql that will produce the output shown in the code snippet from the input (jtable) also shown in the snippet.

Extra notes:
o = the first value during the time period
h = the max value during the time period
l = the min value during the time period
c = the last value during the time period

u = the grouped time period, in this case, always 15 minutes.
18:01 is the min(jtime), so the first hour preceding
the time period is 18:00,
Therefore 18:00 - 18:15 is the first time period.

jtable:
 
|-------|----|
| jtime | b  |
|-------|----|
| 18:01 | 20 |
| 18:02 | 23 |
| 18:06 | 22 |
| 18:08 | 21 |
| 18:16 | 20 |
| 18:17 | 20 |
| 18:18 | 22 |
| 18:19 | 24 |
| 18:21 | 25 |
| 18:25 | 26 |
| 18:28 | 27 |
| 18:29 | 28 |
| 18:46 | 30 |
| 18:47 | 30 |
| 18:49 | 28 |
| 18:51 | 26 |
| 18:53 | 28 |
| 18:54 | 30 |
| 18:58 | 32 |
| 18:59 | 30 |
|-------|----|
 
 
desired output:
 
|-------|----|----|----|----|
| u     | o  | h  | l  | c  |
|-------|----|----|----|----|
| 18:00 | 20 | 23 | 20 | 21 |
| 18:15 | 20 | 28 | 20 | 28 |
| 18:30 |    |    |    |    |
| 18:45 | 30 | 32 | 28 | 30 |
|-------|----|----|----|----|
 
here's the code to create the table.
 
/*******************************/
CREATE TABLE [dbo].[jtable](
	[jtime] [datetime] NULL,
	[b] [int] NOT NULL
) ON [PRIMARY]
/*******************************/
 
 
and this creates the rows:
 
/**********************************************************/
insert into jtable (jtime, b) values('01/10/2008 18:01', 20)
insert into jtable (jtime, b) values('01/10/2008 18:02', 23)
insert into jtable (jtime, b) values('01/10/2008 18:06', 22)
insert into jtable (jtime, b) values('01/10/2008 18:08', 21)
insert into jtable (jtime, b) values('01/10/2008 18:16', 20)
insert into jtable (jtime, b) values('01/10/2008 18:17', 20)
insert into jtable (jtime, b) values('01/10/2008 18:18', 22)
insert into jtable (jtime, b) values('01/10/2008 18:19', 24)
insert into jtable (jtime, b) values('01/10/2008 18:21', 25)
insert into jtable (jtime, b) values('01/10/2008 18:25', 26)
insert into jtable (jtime, b) values('01/10/2008 18:28', 27)
insert into jtable (jtime, b) values('01/10/2008 18:29', 28)
insert into jtable (jtime, b) values('01/10/2008 18:46', 30)
insert into jtable (jtime, b) values('01/10/2008 18:47', 30)
insert into jtable (jtime, b) values('01/10/2008 18:49', 28)
insert into jtable (jtime, b) values('01/10/2008 18:51', 26)
insert into jtable (jtime, b) values('01/10/2008 18:53', 28)
insert into jtable (jtime, b) values('01/10/2008 18:54', 30)
insert into jtable (jtime, b) values('01/10/2008 18:58', 32)
insert into jtable (jtime, b) values('01/10/2008 18:59', 30)
/**********************************************************/

Open in new window

dportas

Try the following. You can join this result to a Calendar / Time table to populate any missing time values.

SELECT DATEADD(MINUTE, u*15, '20000101') u,
MIN(CASE WHEN rfirst = 1 THEN b END) o,
MAX(b) h,
MIN(b) l,
MIN(CASE WHEN rlast = 1 THEN b END) c
FROM
(SELECT FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) u,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime) rfirst,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime DESC) rlast,
b
FROM dbo.jtable
WHERE jtime >= '20081001'
AND jtime < '20081002'
) t
GROUP BY u;

Your table doesn't have a key. It would be a good idea to fix that! Always include keys when you post DDL scripts.

jondanger

ASKER

that doesn't return any results for me, if it's something simple, i can't spot it!

Zberteoc

try this

select 
	qtr,
	( select top 1 b from jtable where jtime<=dateadd(mi,15,qtr) order by jtime ) as o,
	max(b) as h,
	min(b) as l,
	( select top 1 b from jtable where jtime<=dateadd(mi,15,qtr) order by jtime desc ) as c
from 
(
	select 
		jtime,
		b,
		case 
			when datepart(mi,jtime)<=15 then dateadd(hh,datediff(hh,0,jtime),0)
			when datepart(mi,jtime)<=30 then dateadd(mi,15,dateadd(hh,datediff(hh,0,jtime),0))
			when datepart(mi,jtime)<=45 then dateadd(mi,30,dateadd(hh,datediff(hh,0,jtime),0))
			else dateadd(mi,45,dateadd(hh,datediff(hh,0,jtime),0))
		end as qtr
	from 
		jtable
) qtrs
group by 
	qtr

Open in new window

Zberteoc

in your example for 18:45 the l value is wrong, should have been 26 instead of 28.

dportas

jondanger, possibly the problem is regional settings related. Using ISO date formats:

/**********************************************************/
insert into jtable (jtime, b) values('2008-10-01T18:01:00', 20)
insert into jtable (jtime, b) values('2008-10-01T18:02:00', 23)
insert into jtable (jtime, b) values('2008-10-01T18:06:00', 22)
insert into jtable (jtime, b) values('2008-10-01T18:08:00', 21)
insert into jtable (jtime, b) values('2008-10-01T18:16:00', 20)
insert into jtable (jtime, b) values('2008-10-01T18:17:00', 20)
insert into jtable (jtime, b) values('2008-10-01T18:18:00', 22)
insert into jtable (jtime, b) values('2008-10-01T18:19:00', 24)
insert into jtable (jtime, b) values('2008-10-01T18:21:00', 25)
insert into jtable (jtime, b) values('2008-10-01T18:25:00', 26)
insert into jtable (jtime, b) values('2008-10-01T18:28:00', 27)
insert into jtable (jtime, b) values('2008-10-01T18:29:00', 28)
insert into jtable (jtime, b) values('2008-10-01T18:46:00', 30)
insert into jtable (jtime, b) values('2008-10-01T18:47:00', 30)
insert into jtable (jtime, b) values('2008-10-01T18:49:00', 28)
insert into jtable (jtime, b) values('2008-10-01T18:51:00', 26)
insert into jtable (jtime, b) values('2008-10-01T18:53:00', 28)
insert into jtable (jtime, b) values('2008-10-01T18:54:00', 30)
insert into jtable (jtime, b) values('2008-10-01T18:58:00', 32)
insert into jtable (jtime, b) values('2008-10-01T18:59:00', 30)
/**********************************************************/

SELECT DATEADD(MINUTE, u*15, '20000101') u,
MIN(CASE WHEN rfirst = 1 THEN b END) o,
MAX(b) h,
MIN(b) l,
MIN(CASE WHEN rlast = 1 THEN b END) c
FROM
(SELECT FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) u,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime) rfirst,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime DESC) rlast,
b
FROM dbo.jtable
WHERE jtime >= '20081001'
AND jtime < '20081002'
) t
GROUP BY u;

u o h l c
----------------------- ----------- ----------- ----------- -----------
2008-10-01 18:00:00.000 20 23 20 21
2008-10-01 18:15:00.000 20 28 20 28
2008-10-01 18:45:00.000 30 32 26 30
Warning: Null value is eliminated by an aggregate or other SET operation.

(3 row(s) affected)

(Tested on SQL Server 2008 10.0.1600)

Zberteoc

dportas' solution might work for the given data but mine will work for ANY data you might have as it doesn't use hard coded values. Not to mention that is cleaner and simpler and would work in 2k as well as it doesn't use Row_number, which is total unnecessary. No offense dportas. :o)

Zberteoc

for instance ad some rows with other times to the table see which version will work:

insert into jtable (jtime, b) values('02/10/2008 19:01', 20)
insert into jtable (jtime, b) values('02/10/2008 19:18', 30)
insert into jtable (jtime, b) values('03/10/2008 19:40', 40)
insert into jtable (jtime, b) values('04/10/2008 19:57', 50)

Open in new window

jondanger

ASKER

Zberteoc, that's sooo close!

however, o [ select top 1 b from jtable where jtime<=dateadd(mi,15,qtr) order by jtime ]
is always equal to 20, i've tried correcting it but i can't.

qtr                     O       H       L       C 
2008-01-10 18:00:00.000	20	23	20	21
2008-01-10 18:15:00.000	20	28	20	28
2008-01-10 18:45:00.000	20	32	26	30
2008-02-10 19:00:00.000	20	20	20	20
2008-02-10 19:15:00.000	20	30	30	30
2008-03-10 19:30:00.000	20	40	40	40
2008-04-10 19:45:00.000	20	50	50	50

Open in new window

dportas

zberteoc: Am I missing something? Why not just remove the WHERE clause to get the right result for any range of dates?

SELECT DATEADD(MINUTE, u*15, '20000101') u,
MIN(CASE WHEN rfirst = 1 THEN b END) o,
MAX(b) h,
MIN(b) l,
MIN(CASE WHEN rlast = 1 THEN b END) c
FROM
(SELECT FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) u,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime) rfirst,
ROW_NUMBER() OVER (PARTITION BY FLOOR(DATEDIFF(MINUTE,'20000101',jtime)/15.00) ORDER BY jtime DESC) rlast,
b
FROM dbo.jtable
) t
GROUP BY u;

PS. This is a global forum, used by people from many different regions. It would help everyone if we stick to non region-specific date formats. The formats: '2008-02-10T19:00:00.000' or '20080115' will work correctly under any regioinal settings so it's best to stick to them.

Zberteoc

for that we need a BETWEEN, there:

select 
	qtr,
	( select top 1 b from jtable where jtime between qtr and dateadd(mi,15,qtr) order by jtime ) as o,
	max(b) as h,
	min(b) as l,
	( select top 1 b from jtable where jtime<=dateadd(mi,15,qtr) order by jtime desc ) as c
from 
(
	select 
		jtime,
		b,
		case 
			when datepart(mi,jtime)<=15 then dateadd(hh,datediff(hh,0,jtime),0)
			when datepart(mi,jtime)<=30 then dateadd(mi,15,dateadd(hh,datediff(hh,0,jtime),0))
			when datepart(mi,jtime)<=45 then dateadd(mi,30,dateadd(hh,datediff(hh,0,jtime),0))
			else dateadd(mi,45,dateadd(hh,datediff(hh,0,jtime),0))
		end as qtr
	from 
		jtable
) qtrs
group by 
	qtr

Open in new window

Zberteoc

dportas,

what's with '20000101'?

Have you used my extra inserts to check?

dportas

'20000101' is just an arbitrary midnight to generate the time intervals from. It works for me with your same data.

jondanger

ASKER

@Zberteoc - that's spot on, thanks. just one last tweak (listed below)

@dportas - very good point about time formats and i shall be sure to revisit 2008-02-10T19:00:00.000, i haven't yet investigated further because zberteoc's solution isn't based on hardcoded values. Thanks anyway, and i still need to put that join in, unless there's another way.

Desired output:
|-------|----|----|----|----|
| u     | o  | h  | l  | c  |
|-------|----|----|----|----|
| 18:00 | 20 | 23 | 20 | 21 |
| 18:15 | 20 | 28 | 20 | 28 |
| 18:30 |    |    |    |    |
| 18:45 | 30 | 32 | 26 | 30 |
|-------|----|----|----|----|
 
 
Current Output
|-------|----|----|----|----|
| u     | o  | h  | l  | c  |
|-------|----|----|----|----|
| 18:00 | 20 | 23 | 20 | 21 |
| 18:15 | 20 | 28 | 20 | 28 |
| 18:45 | 30 | 32 | 26 | 30 |
|-------|----|----|----|----|

Open in new window

Scott Pletcher

I tried to avoid correlated subqueries as they can severely affect performance. I get all the main data in one pass of the table, but another full scan/pass of the table is being used to get the min/max datetime prior to the main processing ... if there is an index on jdate, it won't be a pass of the table, just two quick reads of the index.

My method does require a table of sequential numbers (used to force "missing" times to be included). I'll post cost immediately after this for the one-time create of the seq nums table.

Btw, the # of mins to group on -- default of 15 -- can be changed simply by changing the value of the first variable.

DECLARE @minsToGroupBy INT
SET @minsToGroupBy = 15 --<<-- chg for different minutes groupings

DECLARE @firstTimePeriod INT
DECLARE @lastTimePeriod INT

SELECT @firstTimePeriod = DATEDIFF(MINUTE, 0, MIN(jtime)) / @minsToGroupBy,
@lastTimePeriod = DATEDIFF(MINUTE, 0, MAX(jtime)) / @minsToGroupBy
FROM jtable

SELECT CONVERT(VARCHAR(16), DATEADD(MINUTE, (@firstTimePeriod + seqNum) * @minsToGroupBy, 0), 120) + ' ' AS u,
COALESCE(STR(o % CAST(10000000000 AS BIGINT), 10), '') AS o,
COALESCE(STR(h, 10), '') AS h,
COALESCE(STR(l, 10), '') AS l,
COALESCE(STR(c % CAST(10000000000 AS BIGINT), 10), '') AS c
FROM (
SELECT s.seqNum,
MIN(CAST(DATEDIFF(MINUTE, 0, jtime) AS BIGINT) * CAST(10000000000 AS BIGINT) + b) AS o,
MAX(b) AS h,
MIN(b) AS l,
MAX(CAST(DATEDIFF(MINUTE, 0, jtime) AS BIGINT) * CAST(10000000000 AS BIGINT) + b) AS c
FROM seqNums s
LEFT OUTER JOIN jtable ON s.seqNum = DATEDIFF(MINUTE, 0, jtime) / @minsToGroupBy - @firstTimePeriod
WHERE s.seqNum <= @lastTimePeriod - @firstTimePeriod
GROUP BY s.seqNum
) AS derived
ORDER BY 1

SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Zberteoc

jondanger,

i hoped you wouldn't ask that. :)

Zberteoc

However, done, nice and smoothly ;o)

select 
	allqtrs.qtr,
	isnull(( select top 1 b from jtable where jtime between allqtrs.qtr and dateadd(mi,15,allqtrs.qtr) order by jtime ),0) as o,
	isnull(max(b),0) as h,
	isnull(min(b),0) as l,
	isnull(( select top 1 b from jtable where jtime between allqtrs.qtr and dateadd(mi,15,allqtrs.qtr) order by jtime desc ),0) as c
from 
	(
		select distinct 
			dateadd(mi,mins,dateadd(hh,datediff(hh,0,jtime),0)) qtr
		from 
			jtable j
			cross join 
			(
				select 0 mins union
				select 15 mins union
				select 30 mins union
				select 45 mins 
			) mins
	) allqtrs
	left join 
	(
		select 
			jtime,
			b,
			case 
				when datepart(mi,jtime)<=15 then dateadd(hh,datediff(hh,0,jtime),0)
				when datepart(mi,jtime)<=30 then dateadd(mi,15,dateadd(hh,datediff(hh,0,jtime),0))
				when datepart(mi,jtime)<=45 then dateadd(mi,30,dateadd(hh,datediff(hh,0,jtime),0))
				else dateadd(mi,45,dateadd(hh,datediff(hh,0,jtime),0))
			end as qtr
		from 
			jtable
	) qtrs
		on qtrs.qtr=allqtrs.qtr
group by 
	allqtrs.qtr
order by 
	allqtrs.qtr

Open in new window

SOLUTION

Zberteoc

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Scott Pletcher

>> Having an index on jtime is not to helpfull as it is wrapped in datetime functions anyway. <<

The index will help my first query tremendously, the one that just finds the min and max jtime.

All the rest of the main table retrieval I do in a single query (with no subqueries).

jondanger

ASKER

Thank you all. Both Zberteoc's and Scott's solutions have far exceeded my expecations. I now feel silly for doubting SQL would be able to handle my data in this manner. I've invested a lot of time investigating something that is, now, potentially unnecessary. I'm yet to test on the full data table (c. 3.4m records).

The tests were done on 6000 records, and run 10 times each.

@Zberteoc - Thanks, i had no idea how to do performance-checking in SQL. Like Scott's, I'm still getting an odd result, but it's close. Scott's interval feature is an enourmous help.
Pre-Index: 178ms
Post-Index: 61ms

@Scott - Absolutely incredible. Are you psychic? I was adding this in further down the line. I have another similar hurdle but that's for another question...
Pre-Index: 99ms
Post-Index: 84ms

To make sure the test was fair, all i did to create the index was right click in the table designer and 'set primary key'. Is this correct? Or shall i read some more into indexing.

Unfortunately, neither query is completely accurate when using the larger table. My original example was flawed and i apologise.

I'll make another post for each solution, and attach the spurious results on each, then try and explain as best i can what i think is happening.

jondanger

ASKER

@Zberteoc

There's a ghost row coming up at the end of the recordset.
It generates zeroes instead of Nulls where there's no b value(s) for the interval for that row.
On records where o,h,c, l should all be null, there's a rogue H & L. I can't workout where they are coming from.

Zberteoc-Bilderman-2008.10.01-23.png

jondanger

ASKER

@Scott

It appears the query is a minute fast.
In my example data, i should have had a row where jtime = a possible interval. (The majority of H and L values appear fine, as you'd expect. as there's only a 1 in 15 chance the max or min value is in the first minute of the interval.)
I can't see why these huge numbers are appearing, i shall have another look tomorrow and let you know if i find anything.

scott-Bilderman-2008.10.01-23-35.png

Scott Pletcher

Oops, negative values would do that, sorry, didn't think about negatives.

I can re-work the query, but it will be an hour or so.

jondanger

ASKER

thansk Scott,

Did you see what i wrote about indexes? I'm dubious that what i did was what you meant, way too simple!

ASKER CERTIFIED SOLUTION

Zberteoc

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Zberteoc

About indexes,

you can create them by open the table in design mode (right click) and the click on Manage Indexes and Keys or right click one one of the columns and then Indexes/Keys. If you click on the button with one key that will make the selected column the primary key, which then will show a key icon beside it.

SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

jondanger

ASKER

Both queries are now working to spec! thanks for your help guys.

@Zberteoc - Thank you so much. Your query is blisteringly fast, averaging 450ms to return 3640 records from 64,000 rows of data. After i added select Top 40, that cuts down to 240ms.

However, in practice, Scott's idea for a dynamic interval grouping using @MinsToGroupBy is too-good-to-miss and i shall be persevering with his query. I added another dynamic variable, @barsToFetch (see below), and when set to 40, his query runs in 295ms. The 55ms overhead is well worth it. Incidentally, when querying for all intervals, Scott's query takes over a minute, in practice, the maximum i would ever need is 100 results.

Thanks though, it's clearly an excellent solution! I hope you're okay with the point split.

@Scott - Absolutely superb!!!
I've added @barsToFetch, which is now used to calculate @firstTimePeriod, effectively a select TOP @x

I still have an issue, but it exists because i've not been thorough enough when asking my question.

I'm about to start a new question, based on your code, and will post a link here when i I have.

Again, thanks for your help, i've learnt loads and I trust you're ok with the split.

jondanger

ASKER

thank you both!