ranges, gaps & overlaps (for number and date ranges)

AID: 3952
  • Status: Published

13330 points

  • ByangelIII
  • TypeFAQs
  • Posted on2010-10-21 at 09:33:21
Awards
  • Community Pick
  • Experts Exchange Approved
  • Editor's Choice
0

Introduction


I have seen many questions on EE and elsewhere, asking about how to find either
  • gaps in lists of numbers (id field, usually)

  • ranges of values or dates

  • overlapping date ranges

  • combined date ranges


I thought it would be a good idea to share how I approached this in a pure SQL manner, though a word of warning: it might not be the most efficient code in your scenario.

The code works in sql server 2005+, but should also work in Oracle, replacing the table variable with a real table.

1

Data Samples


I will create 2 tables, one with numerical data (@t), and one with date ranges (@d).
  • For @t, we assume that the field has distinct data, but it should not change the outcome.

  • For @d, an additional column has been added to ensure that we can identify the rows.


-- First the number based series

declare @t table ( data int )
insert into @t (data) values ( 3 )
insert into @t (data) values ( 4 )
insert into @t (data) values ( 7 )
insert into @t (data) values ( 8 )
insert into @t (data) values ( 9 )
insert into @t (data) values ( 13 )
insert into @t (data) values ( 15 )

select *  from @t order by data

/* gives the result:
data
-----------
3
4
7
8
9
13
15
*/

-- Now the date based series

declare @d table ( start_date datetime , end_date datetime , id int identity)
declare @x datetime
set @x = convert(datetime, convert(varchar(10), getdate(), 120), 120)
insert into @d (start_date, end_date ) select @x, @x 

insert into @d (start_date, end_date ) select @x + 1, @x + 3

insert into @d (start_date, end_date ) select @x + 5, @x + 8
insert into @d (start_date, end_date ) select @x + 7, @x + 10
insert into @d (start_date, end_date ) select @x + 10, @x + 11

insert into @d (start_date, end_date ) select @x + 15, @x + 17
insert into @d (start_date, end_date ) select @x + 14, @x + 16

select * 
  from @d 
 order by start_date, end_date

/* gives the result :
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-21 00:00:00.000 2010-10-21 00:00:00.000 1
2010-10-22 00:00:00.000 2010-10-24 00:00:00.000 2
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:

Select allOpen in new window



2

Find gaps


The approach to find the gaps (as ranges), is to find the value, where the next value is not present. So, we number the rows by order of the data value (ROW_NUMBER), and join to the previous row number. IF the data of the previous row number is indeed the current's data - 1, it's not a gap. If it is different, then we know where the gap starts (previous data + 1), and where it ends (current data -1).

;with data as ( 
  select t.*
       , row_number() over ( order by data desc ) rx
  from @t t
)
select tc.data + 1 start_gap
        , tn.data - 1 end_gap
        , row_number() over (order by tc.data asc) rn
from data tc
left join data tn on tn.rx = tc.rx -1
where ( tn.data <> tc.data + 1 and tc.rx > 1  )
order by tc.rx desc


/*
start_gap   end_gap     rn
----------- ----------- --------------------
5              6              1
10            12            2
14            14            3
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:

Select allOpen in new window



For the Date Ranges, the approach is like this: we need to find the start and end ranges of the "merged" date ranges. For the "End Date" of those "merged ranges", we need to find the records where there is no other record which starts before the range's end, and ends afterwords. This is done in the "with d1".
For the "Start Ranges", we need to find the records where there is no other record which starts before the range's start, and ends afterwords. This is done in the "with d2".

Both d1 and d2 are row numbered (rn) in date order, and will then be joined together in the final select. Because the first range will have row number 1 for both start and end date, we will need to join the end date of range 1 with the start date of range 2 to give the gap between those 2 ranges.

; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 

/*
gap_start               gap_end                 rn
----------------------- ----------------------- --------------------
2010-10-21 00:00:00.000 2010-10-22 00:00:00.000 1
2010-10-24 00:00:00.000 2010-10-26 00:00:00.000 2
2010-11-01 00:00:00.000 2010-11-04 00:00:00.000 3
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:

Select allOpen in new window




3

Merge Ranges


To find the ranges is basically the "opposite" of finding the gaps. For number ranges, the approach is hence the same as in finding the gaps.
;with data as ( 
  select t.*
       , row_number() over ( order by data asc ) rn
       , row_number() over ( order by data desc ) rx
  from @t t
)
, data2 as (
select tc.data, tc.rn, tc.rx, tp.data prev_data 
   , row_number() over (order by tc.rn) r2   
from data tc
  left join data tp
    on tp.rn = tc.rn - 1
 where ( tp.rn is null or tp.data <> tc.data - 1 )
)
select tc.data range_start
  , isnull(tn.prev_data, (select max(data) from data )) range_end
  , tc.r2 rn
  from data2 tc
  left join data2 tn
    on tn.r2 = tc.r2 + 1
order by tc.rn

/*
range_start range_end   rn
----------- ----------- --------------------
3            4            1
7            9            2
13          13          3
15          15          4
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:

Select allOpen in new window



To merge the date ranges, it's a bit trickier, but again, basically using the "gaps" technique, because the merged ranges are between the gaps, plus the first and last range identified by the min(start_date) and max(end_date).
; with d1 as (
select m.end_date
     , row_number() over (order by m.end_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.end_date
                   and o.id <> m.id 
              )
  group by m.end_date
)
, d2 as (
select m.start_date
     , row_number() over (order by m.start_date ) rn
  from @d m
 where not exists ( select null 
                  from @d o 
                 where o.start_date <= m.start_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )
  group by m.start_date
)
, d3 as (
select d1.end_date gap_start, d2.start_date gap_end, d1.rn
  from d1
  join d2
    on d1.rn = d2.rn - 1 
 union all
select null, min(start_date), 0 from @d
 union all
select max(end_date), null, (select max(rn) from d2) from @d
)
select dc.rn + 1 rn, dc.gap_end range_start, dn.gap_start range_end
  from d3 dc
  join d3 dn
    on dn.rn = dc.rn + 1
order by dn.rn

/*
rn                   range_start             range_end
-------------------- ----------------------- -----------------------
1                    2010-10-21 00:00:00.000 2010-10-21 00:00:00.000
2                    2010-10-22 00:00:00.000 2010-10-24 00:00:00.000
3                    2010-10-26 00:00:00.000 2010-11-01 00:00:00.000
4                    2010-11-04 00:00:00.000 2010-11-07 00:00:00.000
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:

Select allOpen in new window




4

Overlapping Ranges


This is to find any records which overlap with some date range of any other record. This technique has been used in the above steps already, but often needed apart, which is why post that part anyhow.

The Algorithm is simple: comparing the current record with another record, if the other date range's end is after the current record's end AND the other's start before the current's end data, the 2 records are overlapping.

You might review this logic by drawing the different situations, and find that the 2 conditions as shown are sufficient to handle all the cases. It's important to note the condition to check that the ID values are NOT the same for the 2 records. This is needed here (and obviously in the above codes also), otherwise all records would logically overlap with at least themselves.
select * 
  from @d m
 where exists ( select null 
                  from @d o 
                 where o.start_date <= m.end_date
                   and o.end_date >= m.start_date
                   and o.id <> m.id 
              )

/*
start_date              end_date                id
----------------------- ----------------------- -----------
2010-10-26 00:00:00.000 2010-10-29 00:00:00.000 3
2010-10-28 00:00:00.000 2010-10-31 00:00:00.000 4
2010-10-31 00:00:00.000 2010-11-01 00:00:00.000 5
2010-11-05 00:00:00.000 2010-11-07 00:00:00.000 6
2010-11-04 00:00:00.000 2010-11-06 00:00:00.000 7
*/
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:

Select allOpen in new window



5

Performance


I did not test the code for performance explicitly. What is important is that your fields (data, start_date, end_date, id) are indexed, otherwise the ROW_NUMBER() function and hence the whole code will take ages to complete such queries.

It is to note that any alternative will need to order the records, so the "ordering" (which ROW_NUMBER does) will have to happen anyhow, somewhere. Either by the database engine, or by some application/scripting code.

Because we use Table Variables, the code has to be run as a complete set. That code is attached and ready to run in SQL Server.
 
    Asked On
    2010-10-21 at 09:33:21ID3952
    Tags

    range gap overlap date

    Topic

    Databases Miscellaneous

    Views
    3020

    Comments

    Expert Comment

    by: DanielWilson on 2011-10-07 at 06:25:00ID: 32152

    Wow, you've really made an extremely frustrating problem appear extremely simple!  I expect to be back to this several times!

    Add your Comment

    Please Sign up or Log in to comment on this article.

    Join Experts Exchange Today

    Gain Access to all our Tech Resources

    Get personalized answers

    Ask unlimited questions

    Access Proven Solutions

    Search 3.2 million solutions

    Read In-Depth How-To Guides

    1000+ articles, demos, & tips

    Watch Step by Step Tutorials

    Learn direct from top tech pros

    And Much More!

    Your complete tech resource

    See Plans and Pricing

    30-day free trial. Register in 60 seconds.

    Loading Advertisement...

    Top Misc Databases Experts

    1. sdstuber

      61,970

      Master

      0 points yesterday

      Profile
      Rank: Genius
    2. johanntagle

      58,032

      Master

      2,000 points yesterday

      Profile
      Rank: Sage
    3. slightwv

      54,382

      Master

      2,000 points yesterday

      Profile
      Rank: Genius
    4. theartfuldazzler

      45,100

      0 points yesterday

      Profile
      Rank: Guru
    5. lowaloysius

      37,600

      0 points yesterday

      Profile
      Rank: Guru
    6. Ray_Paseur

      37,450

      0 points yesterday

      Profile
      Rank: Savant
    7. angelIII

      32,258

      0 points yesterday

      Profile
      Rank: Elite
    8. mwvisa1

      24,010

      0 points yesterday

      Profile
      Rank: Genius
    9. dqmq

      19,400

      0 points yesterday

      Profile
      Rank: Genius
    10. HainKurt

      19,250

      0 points yesterday

      Profile
      Rank: Genius
    11. als315

      17,700

      0 points yesterday

      Profile
      Rank: Genius
    12. TempDBA

      16,880

      1,680 points yesterday

      Profile
      Rank: Sage
    13. dtodd

      16,148

      0 points yesterday

      Profile
      Rank: Genius
    14. ScottPletcher

      15,937

      0 points yesterday

      Profile
      Rank: Genius
    15. DaveBaldwin

      15,821

      0 points yesterday

      Profile
      Rank: Genius
    16. matthewspatrick

      15,600

      0 points yesterday

      Profile
      Rank: Savant
    17. wasimibm

      15,540

      0 points yesterday

      Profile
      Rank: Guru
    18. lcohan

      15,120

      0 points yesterday

      Profile
      Rank: Genius
    19. BillBach

      14,700

      0 points yesterday

      Profile
      Rank: Genius
    20. jogos

      14,575

      0 points yesterday

      Profile
      Rank: Sage
    21. acperkins

      14,411

      0 points yesterday

      Profile
      Rank: Genius
    22. NickUpson

      14,350

      0 points yesterday

      Profile
      Rank: Sage
    23. boag2000

      14,119

      0 points yesterday

      Profile
      Rank: Genius
    24. mlmcc

      14,050

      0 points yesterday

      Profile
      Rank: Savant
    25. LSMConsulting

      13,100

      0 points yesterday

      Profile
      Rank: Savant

    Hall Of Fame