Link to home
Start Free TrialLog in
Avatar of MISServices
MISServices

asked on

Can you think of a way not to use a cursor here?

Here's a good logic challenge:

I have to come up with figures for room occupancy based on timetables data.

For most rooms this is easy. No two events can be timetabled at once so you just add up the all booking durations and subtract that from the available hours.

It gets tricky when it comes to the rooms where we do allow concurrent bookings. Unfortunately it's extra important these are included in the report because they are usually large, expensive teaching workshops where optimal usage is particularly relevant.

I am sure code for this has been written many times before.
As far as I can see what I am essentially trying to do is just find which blocks stretches of time have no activity in them. The most distilled procedural description of this is as follows:

For each day in each room in each week sort all bookings by start time.
Step through checking on each row how much time elapsed between the start time and the latest end time so far

It's that "so far" bit that has lost me the hope of doing this without a cursor. If bookings were of uniform length I could rely on the start time sort effectively sorting the end times and therefore I could always just compare to the previous row - something which can be done set-based.
The fact is that the the big room booking which reaches way out until the end of the day may also have been the very first booking of the day so comparing Booking Four starting at midday with the previous Booking Three which ended at 11:30 will count half an hour unused time which isn't the case.

The only alternative to this cursor is a very overcomplicated set of operations comparing All with All, to batch together clusters of bookings then identify the outer limits of these clusters. Finding the overlaps is easy but whittling down and grouping and categorising the types of overlaps, using different approaches for instance if 3 events are concurrent to when 3 events are in a cluster but one overlaps with the other 2 but those 2 don't overlap with each other.

However I might have missed a trick and be wrong about the necessary unwieldiness of the above. If there's some good catch-all set-based logic somebody has used for something similar your help would be greatly appreciated. I've not used cursors before and I don't want to slog through learning it just to find going through a 69000 row temp table (that's just for the rooms with concurrent booking) is just too stupidly slow even for an overnight procedure.

Many Thanks,

Adam
ASKER CERTIFIED SOLUTION
Avatar of davehilditch
davehilditch

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of davehilditch
davehilditch

Incidentally, cursors aren't all bad - I know people get religious about them, but for 69,000 rows if you use a fast_forward read_only cursor it should be fast enough to complete in a minute or two.  If the code ends up easier to use/read/maintain then it might be the better option.
Avatar of Anthony Perkins
It really depends on your tables; without seeing your tables it is impossible to know for sure.  If they are well designed (by this I mean fully normalized) than a CURSOR should not be necessary, if not it may be your only hope.  
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
so to explain my code above, the CTE creates 'TIMERANGE' which is effectively one record per minute of the range from start to end, then the query uses an 'exists' to find the 'minutes' where there was an occupancy (so multiple occupancy is only counted once, but zero occupancy is not counted).  Then the count of all the minutes (exclusive of the end, if you ended occupancy at 10:45 then 10:45 does not count) -  Is the duration in minutes where there was occupancy.

It worked in my test, you would tweak the exists clause to select occupancy for different rooms etc.  My table is very simple, having only occupancy start and end times - for illustration
Avatar of MISServices

ASKER

Thanks guys,

acperkins, I'm probably missing what you're saying but I can't see how or why normalisation would lead us to a data structure that would package up and put neat labels on event clusters (or non-events) according to one particular definition (room timetable rather than teacher or student or course). Please do outline which structure would help with this if I've missed the point because we might be able to use it.

DaveHilditch, I kicked myself the second I saw your solution. Always the obvious stuff, eh?  That table would need about 750,000 rows but that's probably not too much to run quick.

Deighton, your solution appears beautiful and robust (and using SQL functionality I should have know more about) but that need for a tweak on the WHERE clause would be 22,000 tweaks for each room- day of the year. I've no doubt you could pull out the bag some way of doing that efficiently but 15 minute slots are enough resolution for me so I'll see how performant Dave's mere mortal approach is and get back to you.

Thanks!
Not that I would dare call you a mere mortal Dave. That'd be me. Wouldn't want striking down
lol, how are you calculating your 750,000 row expectation?  Post your final bit of code back here and i'll recommend indexes if you like to make it fast.
if you had another field called 'RoomId', then the following groups by all rooms

DECLARE @PERIOD_START as DATETIME;
DECLARE @PERIOD_END as DATETIME;

SET @PERIOD_START = '01-01-2010';
SET @PERIOD_END = '01-10-2010 23:59';


WITH TIMERANGE AS 
	(
		SELECT @PERIOD_START AS Theminute 
		UNION ALL
		SELECT dateadd(minute,1,Theminute) FROM TIMERANGE WHERE Theminute<@PERIOD_END
	)
	SELECT ROOMID, 
                     (SELECT COUNT(*) FROM TIMERANGE
                          WHERE EXISTS(SELECT NULL FROM BOOKINGS B WHERE TIMERANGE.THEMINUTE >= B.StartTime AND TIMERANGE.THEMINUTE < B.EndTime AND B.RoomId = B1.Roomid)) as MinutesOccupancy 
    FROM BOOKINGS B1 group by B1.roomid
	OPTION (MAXRECURSION 0);

Open in new window

BTW - it could work in 15 minute slots, how is your data defined?  Did you use time, or have you given 15 minute slots their own id?  
Looks like Dave's approach runs in under 30 seconds.

I'll split the points because I took serious inspiration from deighton's code in creating the base timerange table using an int instead of datetime.

Thanks loads!
To answer those other points I didn't refresh to see a moment ago,

deighton, I have little doubt your approach modified to 15 min slots and tweaked to group for each room on each teaching day of the year would be just as quick, definitely more compact and now I think of it it's probably as good a time as any to get my head round how that WITH statement works rather than WHILE loop (yes I am a novice). My only concern is that the more advanced code I use the less likely it can be unpicked by a colleague later down the line. Maybe a poor excuse but I will try to get round to compacting it to your form.

The data isn't defined in 15 minute slots but the number of occasions in which something is not timetabled on the 15 minute mark is truly negligible.

Dave, the reason for 750,000 rows is that it's 32 15 minute slots (8 hour day) for each affected room for each teaching day of the year.


DECLARE  @15MinTimeRange table (StartMin int) 
    
DECLARE @StartMin int

SET @StartMin = 9 /*o'clock*/ * 60 /*minutes*/

WHILE @StartMin < 17 /*o'clock*/ * 60 /*minutes*/

BEGIN
	INSERT INTO @15MinTimeRange
	SELECT @StartMin
	SET @StartMin = @StartMin + 15
END


--------------
DROP TABLE #AvailSlots
SELECT DISTINCT FullYrWeekNumber, DoW, r_ID, StartMin

INTO #AvailSlots
FROM #SampleActs A
CROSS JOIN @15MinTimeRange Ran
---------------

SELECT
r_id,
(select r_reference from [UNITE-SQL\Ulive].Ulive.dbo.CAPD_room r with (nolock) where inn.r_id = r.r_id) RoomName, 
1 - ((CAST(SUM(NullCount) AS FLOAT) * 15) / (COUNT(*) * (8 /*hours*/ * 60 /*minutes*/))) Perc_Occupied

FROM

	(SELECT Av.FullYrWeekNumber, 
	Av.DoW, 
	Av.r_ID,
	Count(*) NullCount


	FROM #AvailSlots Av
	left join #SampleActs Act on

	av.FullYrWeekNumber = Act.fullYrWeekNumber
	and av.DoW = Act.DoW
	and av.r_ID = act.r_id
	and Act.Start <= Av.StartMin 
	and Act.[End] > Av.StartMin

	WHERE Act.R_id is null

	GROUP BY av.FullYrWeekNumber, av.DoW, av.r_ID) INN

GROUP BY R_ID

Open in new window



I've realised that code's not quite right in that it shows the occupancy of all days in rooms that at least have something activity. It should include all the blank space of the days without a single activity but that's no big deal for me to worry about and I don't want to delay replying any further.

Don't worry about indexes because even if it makes it 10 times faster that's 9 extra seconds our server really doesn't mind overnight (possibly a bad attitude but efficiency is a real luxury with our workload so I'm picking my battles).

This has been my first EE Q and considering how good you've been I might come back with another related question I'd thought I was just going to compromise on.

Ta!