<

ranges, gaps & overlaps (for number and date ranges)

Published on
27,438 Points
17,138 Views
8 Endorsements
Last Modified:
Awarded
Editor's Choice
Community Pick

0. Introduction

I have seen many questions on EE and elsewhere, asking about how to find either
gaps in lists of numbers (id field, usually)
ranges of values or dates
overlapping date ranges
combined date ranges

I thought it would be a good idea to share how I approached this in a pure SQL manner, though a word of warning: it might not be the most efficient code in your scenario.

The code works in sql server 2005+, but should also work in Oracle, replacing the table variable with a real table.

1. Data Samples

I will create 2 tables, one with numerical data (@t), and one with date ranges (@d).
For @t, we assume that the field has distinct data, but it should not change the outcome.
For @d, an additional column has been added to ensure that we can identify the rows.


-- First the number based series

declare @t table ( data int )
insert into @t (data) values ( 3 )
insert into @t (data) values ( 4 )
insert into @t (data) values ( 7 )
insert into @t (data) values ( 8 )
insert into @t (data) values ( 9 )
insert into @t (data) values ( 13 )
insert into @t (data) values ( 15 )

select *  from @t order by data

/* gives the result:
data
-----------
3
4
7
8
9
13
15
*/

-- Now the date based series

declare @d table ( start_date datetime , end_date datetime , id int identity)
declare @x datetime
set @x = convert(datetime, convert(varchar(10), getdate(), 120), 120)
insert into @d (start_date, end_date ) select @x, @x 

insert into @d (start_date, end_date ) select @x + 1, @x + 3

insert into @d (start_date, end_date ) select @x + 5, @x + 8
insert into @d (start_date, end_date ) select @x + 7, @x + 10
insert into @d (start_date, end_date ) select @x + 10, @x + 11

insert into @d (start_date, end_date ) select @x + 15, @x + 17
insert into @d (start_date, end_date ) select @x + 14, @x + 16

select * 
  from @d 
 order by start_date, end_date

/* gives the result :
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-21 00:00:00.000 2010-10-21 00:00:00.000 1
2010-10-22 00:00:00.000 2010-10-24 00:00:00.000 2
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
*/

Open in new window


2. Find gaps

The approach to find the gaps (as ranges), is to find the value, where the next value is not present. So, we number the rows by order of the data value (ROW_NUMBER), and join to the previous row number. IF the data of the previous row number is indeed the current's data - 1, it's not a gap. If it is different, then we know where the gap starts (previous data + 1), and where it ends (current data -1).

;with data as ( 
  select t.*
       , row_number() over ( order by data desc ) rx
  from @t t
)
select tc.data + 1 start_gap
        , tn.data - 1 end_gap
        , row_number() over (order by tc.data asc) rn
from data tc
left join data tn on tn.rx = tc.rx -1
where ( tn.data <> tc.data + 1 and tc.rx > 1  )
order by tc.rx desc


/*
start_gap   end_gap     rn
----------- ----------- --------------------
5              6              1
10            12            2
14            14            3
*/

Open in new window


For the Date Ranges, the approach is like this: we need to find the start and end ranges of the "merged" date ranges. For the "End Date" of those "merged ranges", we need to find the records where there is no other record which starts before the range's end, and ends afterwords. This is done in the "with d1".
For the "Start Ranges", we need to find the records where there is no other record which starts before the range's start, and ends afterwords. This is done in the "with d2".

Both d1 and d2 are row numbered (rn) in date order, and will then be joined together in the final select. Because the first range will have row number 1 for both start and end date, we will need to join the end date of range 1 with the start date of range 2 to give the gap between those 2 ranges.

; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 

/*
gap_start               gap_end                 rn
----------------------- ----------------------- --------------------
2010-10-21 00:00:00.000 2010-10-22 00:00:00.000 1
2010-10-24 00:00:00.000 2010-10-26 00:00:00.000 2
2010-11-01 00:00:00.000 2010-11-04 00:00:00.000 3
*/

Open in new window



3. Merge Ranges

To find the ranges is basically the "opposite" of finding the gaps. For number ranges, the approach is hence the same as in finding the gaps.
;with data as ( 
  select t.*
       , row_number() over ( order by data asc ) rn
       , row_number() over ( order by data desc ) rx
  from @t t
)
, data2 as (
select tc.data, tc.rn, tc.rx, tp.data prev_data 
   , row_number() over (order by tc.rn) r2   
from data tc
  left join data tp
    on tp.rn = tc.rn - 1
 where ( tp.rn is null or tp.data <> tc.data - 1 )
)
select tc.data range_start
  , isnull(tn.prev_data, (select max(data) from data )) range_end
  , tc.r2 rn
  from data2 tc
  left join data2 tn
    on tn.r2 = tc.r2 + 1
order by tc.rn

/*
range_start range_end   rn
----------- ----------- --------------------
3            4            1
7            9            2
13          13          3
15          15          4
*/

Open in new window


To merge the date ranges, it's a bit trickier, but again, basically using the "gaps" technique, because the merged ranges are between the gaps, plus the first and last range identified by the min(start_date) and max(end_date).
; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
, d3 as (
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 
 union all
select null, min(start_date), 0 from @d
 union all
select max(end_date), null, (select max(rn) from d2) from @d
)
select dc.rn + 1 rn, dc.gap_end range_start, dn.gap_start range_end
  from d3 dc
  join d3 dn
    on dn.rn = dc.rn + 1
order by dn.rn

/*
rn                   range_start             range_end
-------------------- ----------------------- -----------------------
1                    2010-10-21 00:00:00.000 2010-10-21 00:00:00.000
2                    2010-10-22 00:00:00.000 2010-10-24 00:00:00.000
3                    2010-10-26 00:00:00.000 2010-11-01 00:00:00.000
4                    2010-11-04 00:00:00.000 2010-11-07 00:00:00.000
*/

Open in new window



4. Overlapping Ranges

This is to find any records which overlap with some date range of any other record. This technique has been used in the above steps already, but often needed apart, which is why post that part anyhow.

The Algorithm is simple: comparing the current record with another record, if the other date range's end is after the current record's end AND the other's start before the current's end data, the 2 records are overlapping.

You might review this logic by drawing the different situations, and find that the 2 conditions as shown are sufficient to handle all the cases. It's important to note the condition to check that the ID values are NOT the same for the 2 records. This is needed here (and obviously in the above codes also), otherwise all records would logically overlap with at least themselves.
select * 
  from @d m
 where exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )

/*
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
*/

Open in new window


5. Performance

I did not test the code for performance explicitly. What is important is that your fields (data, start_date, end_date, id) are indexed, otherwise the ROW_NUMBER() function and hence the whole code will take ages to complete such queries.

It is to note that any alternative will need to order the records, so the "ordering" (which ROW_NUMBER does) will have to happen anyhow, somewhere. Either by the database engine, or by some application/scripting code.

Because we use Table Variables, the code has to be run as a complete set. That code is attached and ready to run in SQL Server.
 Angeliii-Gaps-and-Ranges-Code.tx.sql
8
Ask questions about what you read
If you have a question about something within an article, you can receive help directly from the article author. Experts Exchange article authors are available to answer questions and further the discussion.
Get 7 days free