<

ranges, gaps & overlaps (for number and date ranges)

Published on
25,286 Points
14,986 Views
8 Endorsements
Last Modified:
Awarded
Editor's Choice
Community Pick

0. Introduction

I have seen many questions on EE and elsewhere, asking about how to find either
gaps in lists of numbers (id field, usually)
ranges of values or dates
overlapping date ranges
combined date ranges

I thought it would be a good idea to share how I approached this in a pure SQL manner, though a word of warning: it might not be the most efficient code in your scenario.

The code works in sql server 2005+, but should also work in Oracle, replacing the table variable with a real table.

1. Data Samples

I will create 2 tables, one with numerical data (@t), and one with date ranges (@d).
For @t, we assume that the field has distinct data, but it should not change the outcome.
For @d, an additional column has been added to ensure that we can identify the rows.


-- First the number based series

declare @t table ( data int )
insert into @t (data) values ( 3 )
insert into @t (data) values ( 4 )
insert into @t (data) values ( 7 )
insert into @t (data) values ( 8 )
insert into @t (data) values ( 9 )
insert into @t (data) values ( 13 )
insert into @t (data) values ( 15 )

select *  from @t order by data

/* gives the result:
data
-----------
3
4
7
8
9
13
15
*/

-- Now the date based series

declare @d table ( start_date datetime , end_date datetime , id int identity)
declare @x datetime
set @x = convert(datetime, convert(varchar(10), getdate(), 120), 120)
insert into @d (start_date, end_date ) select @x, @x 

insert into @d (start_date, end_date ) select @x + 1, @x + 3

insert into @d (start_date, end_date ) select @x + 5, @x + 8
insert into @d (start_date, end_date ) select @x + 7, @x + 10
insert into @d (start_date, end_date ) select @x + 10, @x + 11

insert into @d (start_date, end_date ) select @x + 15, @x + 17
insert into @d (start_date, end_date ) select @x + 14, @x + 16

select * 
  from @d 
 order by start_date, end_date

/* gives the result :
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-21 00:00:00.000 2010-10-21 00:00:00.000 1
2010-10-22 00:00:00.000 2010-10-24 00:00:00.000 2
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
*/

Open in new window


2. Find gaps

The approach to find the gaps (as ranges), is to find the value, where the next value is not present. So, we number the rows by order of the data value (ROW_NUMBER), and join to the previous row number. IF the data of the previous row number is indeed the current's data - 1, it's not a gap. If it is different, then we know where the gap starts (previous data + 1), and where it ends (current data -1).

;with data as ( 
  select t.*
       , row_number() over ( order by data desc ) rx
  from @t t
)
select tc.data + 1 start_gap
        , tn.data - 1 end_gap
        , row_number() over (order by tc.data asc) rn
from data tc
left join data tn on tn.rx = tc.rx -1
where ( tn.data <> tc.data + 1 and tc.rx > 1  )
order by tc.rx desc


/*
start_gap   end_gap     rn
----------- ----------- --------------------
5              6              1
10            12            2
14            14            3
*/

Open in new window


For the Date Ranges, the approach is like this: we need to find the start and end ranges of the "merged" date ranges. For the "End Date" of those "merged ranges", we need to find the records where there is no other record which starts before the range's end, and ends afterwords. This is done in the "with d1".
For the "Start Ranges", we need to find the records where there is no other record which starts before the range's start, and ends afterwords. This is done in the "with d2".

Both d1 and d2 are row numbered (rn) in date order, and will then be joined together in the final select. Because the first range will have row number 1 for both start and end date, we will need to join the end date of range 1 with the start date of range 2 to give the gap between those 2 ranges.

; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 

/*
gap_start               gap_end                 rn
----------------------- ----------------------- --------------------
2010-10-21 00:00:00.000 2010-10-22 00:00:00.000 1
2010-10-24 00:00:00.000 2010-10-26 00:00:00.000 2
2010-11-01 00:00:00.000 2010-11-04 00:00:00.000 3
*/

Open in new window



3. Merge Ranges

To find the ranges is basically the "opposite" of finding the gaps. For number ranges, the approach is hence the same as in finding the gaps.
;with data as ( 
  select t.*
       , row_number() over ( order by data asc ) rn
       , row_number() over ( order by data desc ) rx
  from @t t
)
, data2 as (
select tc.data, tc.rn, tc.rx, tp.data prev_data 
   , row_number() over (order by tc.rn) r2   
from data tc
  left join data tp
    on tp.rn = tc.rn - 1
 where ( tp.rn is null or tp.data <> tc.data - 1 )
)
select tc.data range_start
  , isnull(tn.prev_data, (select max(data) from data )) range_end
  , tc.r2 rn
  from data2 tc
  left join data2 tn
    on tn.r2 = tc.r2 + 1
order by tc.rn

/*
range_start range_end   rn
----------- ----------- --------------------
3            4            1
7            9            2
13          13          3
15          15          4
*/

Open in new window


To merge the date ranges, it's a bit trickier, but again, basically using the "gaps" technique, because the merged ranges are between the gaps, plus the first and last range identified by the min(start_date) and max(end_date).
; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
, d3 as (
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 
 union all
select null, min(start_date), 0 from @d
 union all
select max(end_date), null, (select max(rn) from d2) from @d
)
select dc.rn + 1 rn, dc.gap_end range_start, dn.gap_start range_end
  from d3 dc
  join d3 dn
    on dn.rn = dc.rn + 1
order by dn.rn

/*
rn                   range_start             range_end
-------------------- ----------------------- -----------------------
1                    2010-10-21 00:00:00.000 2010-10-21 00:00:00.000
2                    2010-10-22 00:00:00.000 2010-10-24 00:00:00.000
3                    2010-10-26 00:00:00.000 2010-11-01 00:00:00.000
4                    2010-11-04 00:00:00.000 2010-11-07 00:00:00.000
*/

Open in new window



4. Overlapping Ranges

This is to find any records which overlap with some date range of any other record. This technique has been used in the above steps already, but often needed apart, which is why post that part anyhow.

The Algorithm is simple: comparing the current record with another record, if the other date range's end is after the current record's end AND the other's start before the current's end data, the 2 records are overlapping.

You might review this logic by drawing the different situations, and find that the 2 conditions as shown are sufficient to handle all the cases. It's important to note the condition to check that the ID values are NOT the same for the 2 records. This is needed here (and obviously in the above codes also), otherwise all records would logically overlap with at least themselves.
select * 
  from @d m
 where exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )

/*
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
*/

Open in new window


5. Performance

I did not test the code for performance explicitly. What is important is that your fields (data, start_date, end_date, id) are indexed, otherwise the ROW_NUMBER() function and hence the whole code will take ages to complete such queries.

It is to note that any alternative will need to order the records, so the "ordering" (which ROW_NUMBER does) will have to happen anyhow, somewhere. Either by the database engine, or by some application/scripting code.

Because we use Table Variables, the code has to be run as a complete set. That code is attached and ready to run in SQL Server.
 Angeliii-Gaps-and-Ranges-Code.tx.sql
8
Comment
3 Comments
 
LVL 32

Expert Comment

by:Daniel Wilson
Wow, you've really made an extremely frustrating problem appear extremely simple!  I expect to be back to this several times!
0
 
LVL 66

Expert Comment

by:Jim Horn
Excellent article.  Voted Yes.
0
 
LVL 143

Author Comment

by:Guy Hengel [angelIII / a3]
I have used the above in this solution, where we have several partitions for the date ranges, and on top having actually only single date values:
http://www.experts-exchange.com/Database/MS-SQL-Server/SQL_Server_2008/Q_28408382.html#a39988437
0

Featured Post

Network Scalability - Handle Complex Environments

Monitor your entire network from a single platform. Free 30 Day Trial Now!

Join & Write a Comment

This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month