Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

ranges, gaps & overlaps (for number and date ranges)

Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Published:

0. Introduction

I have seen many questions on EE and elsewhere, asking about how to find either
gaps in lists of numbers (id field, usually)
ranges of values or dates
overlapping date ranges
combined date ranges

I thought it would be a good idea to share how I approached this in a pure SQL manner, though a word of warning: it might not be the most efficient code in your scenario.

The code works in sql server 2005+, but should also work in Oracle, replacing the table variable with a real table.

1. Data Samples

I will create 2 tables, one with numerical data (@t), and one with date ranges (@d).
For @t, we assume that the field has distinct data, but it should not change the outcome.
For @d, an additional column has been added to ensure that we can identify the rows.


-- First the number based series
                      
                      declare @t table ( data int )
                      insert into @t (data) values ( 3 )
                      insert into @t (data) values ( 4 )
                      insert into @t (data) values ( 7 )
                      insert into @t (data) values ( 8 )
                      insert into @t (data) values ( 9 )
                      insert into @t (data) values ( 13 )
                      insert into @t (data) values ( 15 )
                      
                      select *  from @t order by data
                      
                      /* gives the result:
                      data
                      -----------
                      3
                      4
                      7
                      8
                      9
                      13
                      15
                      */
                      
                      -- Now the date based series
                      
                      declare @d table ( start_date datetime , end_date datetime , id int identity)
                      declare @x datetime
                      set @x = convert(datetime, convert(varchar(10), getdate(), 120), 120)
                      insert into @d (start_date, end_date ) select @x, @x 
                      
                      insert into @d (start_date, end_date ) select @x + 1, @x + 3
                      
                      insert into @d (start_date, end_date ) select @x + 5, @x + 8
                      insert into @d (start_date, end_date ) select @x + 7, @x + 10
                      insert into @d (start_date, end_date ) select @x + 10, @x + 11
                      
                      insert into @d (start_date, end_date ) select @x + 15, @x + 17
                      insert into @d (start_date, end_date ) select @x + 14, @x + 16
                      
                      select * 
                        from @d 
                       order by start_date, end_date
                      
                      /* gives the result :
                      start_date              end_date                id
                      ----------------------- ----------------------- -----------
                      2010-10-21 00:00:00.000 2010-10-21 00:00:00.000 1
                      2010-10-22 00:00:00.000 2010-10-24 00:00:00.000 2
                      2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
                      2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
                      2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
                      2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
                      2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
                      */

Open in new window


2. Find gaps

The approach to find the gaps (as ranges), is to find the value, where the next value is not present. So, we number the rows by order of the data value (ROW_NUMBER), and join to the previous row number. IF the data of the previous row number is indeed the current's data - 1, it's not a gap. If it is different, then we know where the gap starts (previous data + 1), and where it ends (current data -1).

;with data as ( 
                        select t.*
                             , row_number() over ( order by data desc ) rx
                        from @t t
                      )
                      select tc.data + 1 start_gap
                              , tn.data - 1 end_gap
                              , row_number() over (order by tc.data asc) rn
                      from data tc
                      left join data tn on tn.rx = tc.rx -1
                      where ( tn.data <> tc.data + 1 and tc.rx > 1  )
                      order by tc.rx desc
                      
                      
                      /*
                      start_gap   end_gap     rn
                      ----------- ----------- --------------------
                      5              6              1
                      10            12            2
                      14            14            3
                      */

Open in new window


For the Date Ranges, the approach is like this: we need to find the start and end ranges of the "merged" date ranges. For the "End Date" of those "merged ranges", we need to find the records where there is no other record which starts before the range's end, and ends afterwords. This is done in the "with d1".
For the "Start Ranges", we need to find the records where there is no other record which starts before the range's start, and ends afterwords. This is done in the "with d2".

Both d1 and d2 are row numbered (rn) in date order, and will then be joined together in the final select. Because the first range will have row number 1 for both start and end date, we will need to join the end date of range 1 with the start date of range 2 to give the gap between those 2 ranges.

; with d1 as (
                      select m.end_date
                           , row_number() over (order by m.end_date ) rn
                        from @d m
                       where not exists ( select null 
                                        from @d o 
                                       where o.start_date <= m.end_date
                                         and o.end_date >= m.end_date
                                         and o.id <> m.id 
                                    )
                        group by m.end_date
                      )
                      , d2 as (
                      select m.start_date
                           , row_number() over (order by m.start_date ) rn
                        from @d m
                       where not exists ( select null 
                                        from @d o 
                                       where o.start_date <= m.start_date
                                         and o.end_date >= m.start_date
                                         and o.id <> m.id 
                                    )
                        group by m.start_date
                      )
                      select d1.end_date gap_start, d2.start_date gap_end, d1.rn
                        from d1
                        join d2
                          on d1.rn = d2.rn - 1 
                      
                      /*
                      gap_start               gap_end                 rn
                      ----------------------- ----------------------- --------------------
                      2010-10-21 00:00:00.000 2010-10-22 00:00:00.000 1
                      2010-10-24 00:00:00.000 2010-10-26 00:00:00.000 2
                      2010-11-01 00:00:00.000 2010-11-04 00:00:00.000 3
                      */

Open in new window



3. Merge Ranges

To find the ranges is basically the "opposite" of finding the gaps. For number ranges, the approach is hence the same as in finding the gaps.
;with data as ( 
                        select t.*
                             , row_number() over ( order by data asc ) rn
                             , row_number() over ( order by data desc ) rx
                        from @t t
                      )
                      , data2 as (
                      select tc.data, tc.rn, tc.rx, tp.data prev_data 
                         , row_number() over (order by tc.rn) r2   
                      from data tc
                        left join data tp
                          on tp.rn = tc.rn - 1
                       where ( tp.rn is null or tp.data <> tc.data - 1 )
                      )
                      select tc.data range_start
                        , isnull(tn.prev_data, (select max(data) from data )) range_end
                        , tc.r2 rn
                        from data2 tc
                        left join data2 tn
                          on tn.r2 = tc.r2 + 1
                      order by tc.rn
                      
                      /*
                      range_start range_end   rn
                      ----------- ----------- --------------------
                      3            4            1
                      7            9            2
                      13          13          3
                      15          15          4
                      */

Open in new window


To merge the date ranges, it's a bit trickier, but again, basically using the "gaps" technique, because the merged ranges are between the gaps, plus the first and last range identified by the min(start_date) and max(end_date).
; with d1 as (
                      select m.end_date
                           , row_number() over (order by m.end_date ) rn
                        from @d m
                       where not exists ( select null 
                                        from @d o 
                                       where o.start_date <= m.end_date
                                         and o.end_date >= m.end_date
                                         and o.id <> m.id 
                                    )
                        group by m.end_date
                      )
                      , d2 as (
                      select m.start_date
                           , row_number() over (order by m.start_date ) rn
                        from @d m
                       where not exists ( select null 
                                        from @d o 
                                       where o.start_date <= m.start_date
                                         and o.end_date >= m.start_date
                                         and o.id <> m.id 
                                    )
                        group by m.start_date
                      )
                      , d3 as (
                      select d1.end_date gap_start, d2.start_date gap_end, d1.rn
                        from d1
                        join d2
                          on d1.rn = d2.rn - 1 
                       union all
                      select null, min(start_date), 0 from @d
                       union all
                      select max(end_date), null, (select max(rn) from d2) from @d
                      )
                      select dc.rn + 1 rn, dc.gap_end range_start, dn.gap_start range_end
                        from d3 dc
                        join d3 dn
                          on dn.rn = dc.rn + 1
                      order by dn.rn
                      
                      /*
                      rn                   range_start             range_end
                      -------------------- ----------------------- -----------------------
                      1                    2010-10-21 00:00:00.000 2010-10-21 00:00:00.000
                      2                    2010-10-22 00:00:00.000 2010-10-24 00:00:00.000
                      3                    2010-10-26 00:00:00.000 2010-11-01 00:00:00.000
                      4                    2010-11-04 00:00:00.000 2010-11-07 00:00:00.000
                      */

Open in new window



4. Overlapping Ranges

This is to find any records which overlap with some date range of any other record. This technique has been used in the above steps already, but often needed apart, which is why post that part anyhow.

The Algorithm is simple: comparing the current record with another record, if the other date range's end is after the current record's end AND the other's start before the current's end data, the 2 records are overlapping.

You might review this logic by drawing the different situations, and find that the 2 conditions as shown are sufficient to handle all the cases. It's important to note the condition to check that the ID values are NOT the same for the 2 records. This is needed here (and obviously in the above codes also), otherwise all records would logically overlap with at least themselves.
select * 
                        from @d m
                       where exists ( select null 
                                        from @d o 
                                       where o.start_date <= m.end_date
                                         and o.end_date >= m.start_date
                                         and o.id <> m.id 
                                    )
                      
                      /*
                      start_date              end_date                id
                      ----------------------- ----------------------- -----------
                      2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
                      2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
                      2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
                      2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
                      2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
                      */

Open in new window


5. Performance

I did not test the code for performance explicitly. What is important is that your fields (data, start_date, end_date, id) are indexed, otherwise the ROW_NUMBER() function and hence the whole code will take ages to complete such queries.

It is to note that any alternative will need to order the records, so the "ordering" (which ROW_NUMBER does) will have to happen anyhow, somewhere. Either by the database engine, or by some application/scripting code.

Because we use Table Variables, the code has to be run as a complete set. That code is attached and ready to run in SQL Server.
  Angeliii-Gaps-and-Ranges-Code.tx.sql
8
18,925 Views
Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT

Comments (3)

Wow, you've really made an extremely frustrating problem appear extremely simple!  I expect to be back to this several times!
Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Excellent article.  Voted Yes.
Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Most Valuable Expert 2014
Top Expert 2009

Author

Commented:
I have used the above in this solution, where we have several partitions for the date ranges, and on top having actually only single date values:
https://www.experts-exchange.com/questions/28408382/Merging-daily-records-to-range-of-dates-in-SQL.html?anchorAnswerId=39988437#a39988437

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.