Link to home
Start Free TrialLog in
Avatar of dannygonzalez09
dannygonzalez09

asked on

Roll up data using the time difference between contiguous rows

Hi

I am trying to find the time difference between contiguous rows and if its with in a certain interval roll them up as 1 outage

In the below example RowId time diff between Endtime of RowId -135 and StartTime of RowId -136 is greater than 10 mins and same between RowId - 136 & 137.. so they'll be treated as separate outages but time difference between the contiguous rows from 137 upto 157 is with 10 mins, in this case roll it up ...the result should be as below

There are may other columns involved... i'm hoping that i'll be able to get the rest of the issue resolved if i can roll these up together ...

So far, I've tried to use CTE with RowNum function to join the consecutive rows and get the time diff but it isn't working out..

For Ex:
RowId      Id      StartTime      EndTime
135      8526      9/1/2013 17:35:07      9/1/2013 17:38:36
136      8526      9/1/2013 18:47:18      9/1/2013 18:49:07
137      8526      9/2/2013 14:08:12      9/2/2013 14:08:15
138      8526      9/2/2013 14:10:33      9/2/2013 14:10:44
139      8526      9/2/2013 14:12:25      9/2/2013 14:12:34
140      8526      9/2/2013 14:12:38      9/2/2013 14:13:00
141      8526      9/2/2013 14:13:11      9/2/2013 14:13:12
142      8526      9/2/2013 14:13:13      9/2/2013 14:13:15
143      8526      9/2/2013 14:13:31      9/2/2013 14:14:37
144      8526      9/2/2013 14:14:44      9/2/2013 14:15:14
145      8526      9/2/2013 14:15:18      9/2/2013 14:15:30
146      8526      9/2/2013 14:15:36      9/2/2013 14:15:43
147      8526      9/2/2013 14:15:48      9/2/2013 14:16:02
148      8526      9/2/2013 14:16:06      9/2/2013 14:16:15
149      8526      9/2/2013 14:18:53      9/2/2013 14:18:55
150      8526      9/2/2013 14:19:04      9/2/2013 14:19:07
151      8526      9/2/2013 14:19:15      9/2/2013 14:19:20
152      8526      9/2/2013 14:19:24      9/2/2013 14:19:25
153      8526      9/2/2013 14:19:36      9/2/2013 14:19:49
154      8526      9/2/2013 14:19:54      9/2/2013 14:19:58
155      8526      9/2/2013 14:20:02      9/2/2013 14:20:21
156      8526      9/2/2013 14:20:33      9/2/2013 14:20:42
157      8526      9/2/2013 14:27:13      9/2/2013 14:27:22
158      8526      9/2/2013 16:34:10      9/2/2013 16:35:23

Output:

135      8526      9/1/2013 17:35:07      9/1/2013 17:38:36
136      8526      9/1/2013 18:47:18      9/1/2013 18:49:07
137      8526      9/2/2013 14:08:12      9/2/2013 14:27:22
158      8526      9/2/2013 16:34:10      9/2/2013 16:35:23
Avatar of Jim Horn
Jim Horn
Flag of United States of America image

Here's an article i wrote on T-SQL:  Identifying bad dates in a time series.  There's a screen shot about half way down that demonstrates a series of data, with a time_variance column that displays the difference in days.

Feel free to copy this code into your SSMS, run to verify it works, then change the DATEDIFF line so that it does minutes instead of days.  Then, edit the final SELECT so that it shows only rows where the time_variance > 10.

Enjoy.
Avatar of dannygonzalez09
dannygonzalez09

ASKER

I'm not sure if you misunderstood my question.... I want to get all the values but combine only those where the time difference between end time and start time of the contiguous row is less than 10 mins
Define 'combine'.   If I'm reading this correctly, since we're not assigning new ID values based on whether time_variance > 10 or not, the solution should still work.
I think you're going to have to use a cursor to accomplish this if the cte method isn't working.  Just cycle through the records ordered appropriately.  Compare the current record with the previous (save the pertinent values to variables) and either discard or insert the current record in a temp table.
well, i was doing the same with CTE but wasn't able to get the correct results... I'm sure i am missing something in the logic... Can you help me with the code

Thanks
Can you post some schema information or the cte code you were trying?
Here's the code i have so far.... the output is incorrect as its not getting the correct values for the 1st and the last time intervals .. the code gets the Min and Max RowId from which i'm to get the time stamps in the next step

WITH CTE AS
 (
	SELECT *,(Row_Number() OVER (PARTITION BY ModelKey ORDER BY StartTime)) as RowNum
	--, CASE WHEN TotalSeconds > 600 THEN 0 ELSE 1 END as SameOutage
	FROM SampleData
	WHERE ModelKey = 8526 
 )
 
,CTERowNum AS
( 
SELECT X.*, ABS(DATEDIFF(SS,EndTime,NxtStartTime)) as Diff
, CASE WHEN ABS(DATEDIFF(SS,EndTime,NxtStartTime)) < 600 THEN 1 ELSE 0 END as SameOutage
FROM
	(
SELECT C.RowId,C.ModelKey, C.StartTime, C.EndTime,C.RowNum ,C2.StartTime as NxtStartTime, C2.EndTime as NxtEndTime,C2.RowNum as NxtRowNum
FROM CTE C
LEFT JOIN CTE C2 
ON C.ModelKey = C2.ModelKey AND C.RowNum = C2.RowNum-1 and C.EndTime < C2.StartTime
	)X
)

SELECT MIN(RowId) as Id,ModelKey, MIN(StartTime) as StartTime, Min(TotalSec),
CASE WHEN MIN(TotalSec) > 600 THEN MAX(EndTime) ELSE MAX(NxtEndTime) END as NxtEnd, MIN(MinRow) as MinRow, MAX(MaxRow) as MaxRow
FROM
(
SELECT RowId,ModelKey,StartTime,EndTime,NxtStartTime,NxtEndTime, ABS(DATEDIFF(SS,NxtStartTime,EndTime)) as TotalSec
, (select TOP 1 RowId FROM CTERowNum C2 WHERE C2.ModelKey = C.ModelKey and isnull(C2.SameOutage,0)<>1 and C2.RowNum < C.RowNum Order by RowId desc) as MinRow
, (select TOP 1 RowId FROM CTERowNum C2 WHERE C2.ModelKey = C.ModelKey and isnull(C2.SameOutage,0)<>1 and C2.RowNum > C.RowNum Order by RowId asc)-1 as MaxRow
FROM CTERowNum C
)X
GROUP BY ModelKey,MinRow,MaxRow
Order by StartTime

Open in new window

Sample.xlsx
I'm working on a cursor solution but it's going to take me a while to get it written and tested.
dannygonzalez09 - Let me know if you intend to read my article or not, as it uses CTE's and looks similar to what you have posted above.
Thanks BriCrowe

@Jimhorn - Yes, i looked at your article as soon as you posted it.... it basically does the same thing that i was working on upto an extent, but i couldn't make mine work yet and i am stuck..

Sorry, but it din't help me much
Okay try this...change the tablename "myEvent" to your tablename

Let me know if you have any problems.  I tested it against a sample set with a second ID value.

SET NOCOUNT ON;

DECLARE @ID0		INT,
	@StartTime0		DATETIME,
	@EndTime0		DATETIME,
	@IDPrev			INT			= NULL,
	@StartTimePrev	DATETIME	= NULL,
	@EndTimePrev	DATETIME	= NULL,
	@IDCurr			INT			= NULL,
	@StartTimeCurr	DATETIME	= NULL,
	@EndTimeCurr	DATETIME	= NULL,
	@Interval		INT,		--Interval Threshold in seconds
	@RowCount		INT
	
DECLARE @CondensedEvents TABLE
(
	ID				INT,
	StartTime		DATETIME,
	EndTime			DATETIME
	--Include any other columns here
)

SELECT @Interval = 600			--10 minutes

DECLARE crsEvent CURSOR FOR
SELECT ID, StartTime, EndTime
FROM myEvent
ORDER BY ID, EndTime

OPEN crsEvent

FETCH NEXT FROM crsEvent
INTO @ID0, @StartTime0, @EndTime0

WHILE @@FETCH_STATUS = 0
BEGIN
	IF @IDCurr IS NULL
	BEGIN
		PRINT 'No Current Record'
		FETCH NEXT FROM crsEvent
		INTO @IDCurr, @StartTimeCurr, @EndTimeCurr
		
	END
	
	PRINT '0 - ID[' + ISNULL(CAST(@ID0 AS VARCHAR), '') + '] StartTime[' + ISNULL(CONVERT(VARCHAR, @StartTime0, 120), '') + '] EndTime[' + ISNULL(CONVERT(VARCHAR, @EndTime0, 120), '') + ']'
	PRINT 'P - ID[' + ISNULL(CAST(@IDPrev AS VARCHAR), '') + '] StartTime[' + ISNULL(CONVERT(VARCHAR, @StartTimePrev, 120), '') + '] EndTime[' + ISNULL(CONVERT(VARCHAR, @EndTimePrev, 120), '') + ']'
	PRINT 'C - ID[' + ISNULL(CAST(@IDCurr AS VARCHAR), '') + '] StartTime[' + ISNULL(CONVERT(VARCHAR, @StartTimeCurr, 120), '') + '] EndTime[' + ISNULL(CONVERT(VARCHAR, @EndTimeCurr, 120), '') + ']'

	IF @ID0 = @IDCurr
	BEGIN
		--PRINT 'Same ID Value: ' + CAST(@IDCurr AS VARCHAR)
		
		IF DATEDIFF(SECOND, @EndTime0, @EndTimeCurr) > @Interval
		BEGIN
			PRINT 'Interval Exceeded[' + CAST(DATEDIFF(SECOND, @EndTime0, @EndTimeCurr) AS VARCHAR) + ']'
			--Record the condensed record
			PRINT 'Inserting record matching: ID[' + CAST(ISNULL(@IDPrev, @IDCurr) AS VARCHAR) +
				'] StartTime[' + CONVERT(VARCHAR, ISNULL(@StartTimePrev, @StartTimeCurr), 120) +
				'] EndTime[' + CONVERT(VARCHAR, ISNULL(@EndTimePrev, @EndTimeCurr), 120) + ']'
			INSERT INTO @CondensedEvents
			(
				ID,
				StartTime,
				EndTime
			)
			SELECT ID, StartTime, EndTime --Other columns
			FROM myEvent
			WHERE ID = ISNULL(@IDPrev, @IDCurr)
				AND StartTime = ISNULL(@StartTimePrev, @StartTimeCurr)
				AND EndTime = ISNULL(@EndTimePrev, @EndTimeCurr)

			SET @RowCount = @@ROWCOUNT
			IF @RowCount = 0		--This shouldn't happen but I had a bug I had to track down
			BEGIN
				PRINT 'NO MATCHING RECORD FOUND'
			END
			
			SELECT @ID0 = @IDCurr,
				@StartTime0 = @StartTimeCurr,
				@EndTime0 = @EndTimeCurr
		END
		ELSE	--Still within range of interval
		BEGIN
			PRINT 'Interval NOT Exceeded'
			SELECT @IDPrev = @IDCurr,
				@StartTimePrev = @StartTimeCurr,
				@EndTimePrev = @EndTimeCurr
		END
	END
	ELSE	--Reset everything for the new ID value
	BEGIN
		SELECT @ID0 = @IDCurr,
			@StartTime0 = @StartTimeCurr,
			@EndTime0 = @EndTimeCurr,
			@IDPrev = NULL, @StartTimePrev = NULL, @EndTimePrev = NULL,
			@IDCurr = NULL, @StartTimeCurr = NULL, @EndTimeCurr = NULL
	END
	
	--Get the next record
	FETCH NEXT FROM crsEvent
	INTO @IDCurr, @StartTimeCurr, @EndTimeCurr
	
	PRINT ''
END
CLOSE crsEvent
DEALLOCATE crsEvent

SELECT * FROM myEvent WHERE ID = 8526
SELECT * FROM @CondensedEvents

Open in new window

Thanks BriCrowe

I ran the script but looks like its not getting what i need (probably because i messed up something)

I've found another useful link and seems to almost get the result set i need... i still need to change it to calculate the time difference using the Start and Endtime (of the next row) instead of using start times

http://sqlmag.com/t-sql/grouping-web-site-hits-sessions
I see what I was doing wrong...I'll make the adjustments tomorrow.
ASKER CERTIFIED SOLUTION
Avatar of Brian Crowe
Brian Crowe
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks BriCrowe

Perfect, exactly what i needed...

I was also able to modify the script (from the link i posted yesterday) to get all the output i needed, currently testing its performance which so far has been good

I'll compare the performance on both and use the best, will post a comment soon if i need more help on your SQL

Thanks again
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you BriCrowe and Paul
No problem, thanks for the assist. Cheers, Paul.
BriCrowe, I've been using your script which worked like a charm but new data is messing up the output...can you please help
 
I've opened another question here
https://www.experts-exchange.com/questions/28280456/Roll-up-data-using-the-time-difference-between-contiguous-rows-2.html

Thanks