We help IT Professionals succeed at work.

how do I write a SQL to find gaps in date ranges?

4,420 Views
Last Modified: 2017-07-31
Hi All,

I would like to get the gaps between date range and needed it to separate the gaps by MM ID.

could you please someone help me to resolve below in Netezza

INPUT:

ID | MM Number | From | To
12345|4444444 |2015-01-03 |2015-02-02
12346|4444444 |2015-02-03 |2015-03-02

<< missing one record ( ex same MM Number no record for period from 2015-03-03 to 2015-04-02

12347|4444444 |2015-04-03 |2015-05-01
12348|4444444 |2015-05-02 |2015-06-01
12349|4444444 |2015-06-02 |2015-07-01
12310|4444444 |2015-07-02 |2015-07-31

<< missing one record (ex: same MM Number no record for period from 2015-08-01 to 2015-08-31

12310|4444444|2015-09-01|2015-09-30



Expected Out put:
MM No | Missing Start Date | Missing To Date
4444444 |2015-03-03 |2015-04-02
4444444 |2015-08-01 |2015-08-31

I've a table with below data

create table icr_tmp
(
ID VARCHAR2(15),
NMI VARCHAR2(50),
INVOICE_START_DATE DATE,
INVOICE_END_DATE DATE);

insert into icr_tmp
values('12345','4444444','03/01/2015','02/02/2015');

insert into icr_tmp
values('12346','4444444','03/02/2015','02/03/2015');

insert into icr_tmp
values('12347','4444444','03/04/2015','01/05/2015');

insert into icr_tmp
values('12348','4444444','02/05/2015','01/06/2015');

insert into icr_tmp
values('12349','4444444','02/06/2015','01/07/2015');

insert into icr_tmp
values('12310','4444444','02/07/2015','31/07/2015');

insert into icr_tmp
values('12310','4444444','01/09/2015','30/09/2015');

Below SQL working in oracle perfectly but not in Netezza.

select * from (
  select nmi,
    max(invoice_end_date) over(partition by nmi order by invoice_start_date) + 1 start_gap,
    lead(invoice_start_date) over(partition by nmi order by invoice_start_date) - 1 end_gap
  from icr_tmp3
)
where start_gap <= end_gap;

NMI      START_GAP   END_GAP
4444444  2015-03-03  2015-04-02
4444444  2015-08-01  2015-08-31
Comment
Watch Question

Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Most Valuable Expert 2014
Top Expert 2009

Commented:
please check out my article which I wrote to identify gaps in ranges:
https://www.experts-exchange.com/articles/3952/ranges-gaps-overlaps-for-number-and-date-ranges.html
Mike EghtebasDatabase and Application Developer

Commented:
Hi Nandika,

I have prepared CTE_GenDate which produces a dataset with one column named [DayYear] lists days from 2015-1-1 to 2015-12-31. So, now using  something like:

Select * From CTE_GenDate ... EXCEPT ... Select * From icr_tmp

I have requested some experts to take a look at it and tie this CTE with your data to produce the result you need. Meanwhile I will continoue working on it as well.

;WITH CTE_GenDate
AS (
	SELECT DATEFROMPARTS(2015,1,1) DayYear
	UNION ALL
	SELECT DATEADD(day,1,DayYear) 
	FROM CTE_GenDate
	WHERE DayYear < DATEFROMPARTS(2015,12,31) 
	)
SELECT *
FROM CTE_GenDate
OPTION (MAXRECURSION 366)

Open in new window

Author

Commented:
Hi Mike,

Any luck with solution for above issue.

Regards,
Nandika
Mike EghtebasDatabase and Application Developer

Commented:
Hi Nandika,
Below, I have:
A. Make a table varible to store your data as shown in part B below:
declare @Table1 table
    (id int, StartDate datetime, EndDate datetime)

INSERT INTO @table1
    (id, StartDate, EndDate)
VALUES
    (1, '2017-01-01 00:00:00', '2017-01-04 00:00:00'),
    (2, '2017-01-19 00:00:00', '2017-01-22 00:00:00'),
    (3, '2017-02-01 00:00:00', '2017-02-04 00:00:00')

select * from @table1;

Open in new window

Note: Via an update query, it might be necessary to change the data format to look in YYYY-MM-DD style. This will be simple task for later when it becomes necessary.

B. Sample data similar to your data:
id   StartDate                  EndDate
1	2017-01-01 00:00:00.000	2017-01-04 00:00:00.000
2	2017-01-19 00:00:00.000	2017-01-22 00:00:00.000
3	2017-02-01 00:00:00.000	2017-02-04 00:00:00.000

Open in new window


C. Below, the above data ranges are inserted to a single column named theDate, see part D below.
;with cteMinMax as (
    select
          min(ca.allDates) as minDate
        , datediff(day, min(ca.allDates),max(ca.allDates)) as numDays     
    from @table1
    cross apply (
        values
           (StartDate),
           (EndDate)
        ) ca (allDates)
),
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS num
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)

select distinct
 --     r.minDate
 --   , n.num
     convert(varchar(10),dateadd(day, n.num, minDate),110) theDate
--    , t.id  source_id
--    , t.StartDate
--    , t.EndDate
from cteMinMax r
inner join cteTally n on n.num <= r.numDays
inner join @table1 t on dateadd(day, n.num, minDate) between t.StartDate and t.EndDate
order by theDate

Open in new window


D. So, this dataset (the range of dates used) can be used with our cte result above using either right join or except operator to produce the result you want.
theDate
01-01-2017
01-02-2017
01-03-2017
01-04-2017
01-19-2017
01-20-2017
01-21-2017
01-22-2017
02-01-2017
02-02-2017
02-03-2017
02-04-2017

Open in new window


I need to do a little bit more work on this if someone else doesn't pickup where I am now.

You can test it by adding part A and C together and run it.
Mike EghtebasDatabase and Application Developer

Commented:
Ok, I have it now:
SELECT DaysFullRange 
FROM #FullRange      --table containing date range from min. start data through max. end date in a single column. 
EXCEPT
SELECT UsedDays     
FROM #RangeUsed  --table containing dates used in a single column.
 

Open in new window

I am using two temp tables (#FullRange  and #RangeUsed). I have tested and it works.

Vitor Montalv√£o helped me with #FullRange and PortletPaul helped me with #RangeUsed.

Code:
use ee

create table #Data1 -- table containing your data (start - end range)
    (id int, StartDate datetime, EndDate datetime)

create table #RangeUsed  -- date ranges used in #Data1 in a single column (named UsedDays)
    (UsedDays datetime)

create table #FullRange -- table containing days in full min/max range from #Data1.
    (DaysFullRange datetime)

INSERT INTO #Data1  -- populate table with your data data (start - end range)
    (id, StartDate, EndDate)
VALUES
    (1, '2017-01-01 00:00:00', '2017-01-04 00:00:00'),
    (2, '2017-01-19 00:00:00', '2017-01-22 00:00:00'),
    (3, '2017-02-01 00:00:00', '2017-02-04 00:00:00')

--drop table #Data1;  select * from #Data1
--drop table #RangeUsed;  select * from #RangeUsed
--drop table #FullRange;  select * from #FullRange

----------------
;with cteMinMax as (
    select
          min(ca.allDates) as minDate
        , datediff(day, min(ca.allDates),max(ca.allDates)) as numDays     
    from #Data1
    cross apply (
        values
           (StartDate),
           (EndDate)
        ) ca (allDates)
),
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS num
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)
Insert Into #RangeUsed(UsedDays)
select distinct
 --     r.minDate
 --   , n.num
     convert(varchar(10),dateadd(day, n.num, minDate),110) theDate
--    , t.id  source_id
--    , t.StartDate
--    , t.EndDate
from cteMinMax r
inner join cteTally n on n.num <= r.numDays
inner join #Data1 t on dateadd(day, n.num, minDate) between t.StartDate and t.EndDate
order by theDate
--------------------------------------------


;WITH CTE_GenDate
AS (
	SELECT DATEFROMPARTS(2017,1,1) DayYear
	UNION ALL
	SELECT DATEADD(day,1,DayYear) 
	FROM CTE_GenDate
	WHERE DayYear < DATEFROMPARTS(2017,12,31) 
	)
Insert Into #FullRange   
SELECT *
FROM CTE_GenDate
Where DayYear >='01-01-2016' and DayYear <= '04-02-2017' -- table containing days in full min/max range from #Data1.
OPTION (MAXRECURSION 366)

Open in new window


The solutions from Guy Hengel are impressive. I worked on this solution to basically challenge myself and along the way, I posted questions and used the other experts time to solve this.

Thanks,

Mike

Commented:
Hello.

I'm wondering if Netezza will allow you to use some relatively recent T-SQL constructs. For example, the code below runs fine on SQL Server 2012...
if object_id('dbo.icr_tmp') is not null drop table dbo.icr_tmp

-- 
create table dbo.icr_tmp
	(
		ID varchar(15),
		NMI varchar(50),
		INVOICE_START_DATE date,
		INVOICE_END_DATE date
	); 

-- 
insert into dbo.icr_tmp (ID, NMI, INVOICE_START_DATE, INVOICE_END_DATE)
values 
('12345', '4444444', '01/03/2015', '02/02/2015'),
('12346', '4444444', '02/03/2015', '03/02/2015'),
('12347', '4444444', '04/03/2015', '05/01/2015'),
('12348', '4444444', '05/02/2015', '06/01/2015'),
('12349', '4444444', '06/02/2015', '07/01/2015'),
('12310', '4444444', '07/02/2015', '07/31/2015'),
('12310', '4444444', '09/01/2015', '09/30/2015')

-- 
;with cte_Gaps as (
	select i.NMI, dateadd(day, 1, max(i.INVOICE_END_DATE) over (partition by i.NMI order by i.INVOICE_START_DATE)) start_gap,
		dateadd(day, -1, lead(i.INVOICE_START_DATE) over (partition by i.NMI order by i.INVOICE_START_DATE)) end_gap
		from dbo.icr_tmp i
	)

select *
	from cte_Gaps g
	where g.start_gap <= g.end_gap;

Open in new window

...and, if I understand your question correctly, I think the desired output is what you want...

NMI         start_gap        end_gap
4444444	2015-03-03	2015-04-02
4444444	2015-08-01	2015-08-31

Open in new window

Mike EghtebasDatabase and Application Developer

Commented:
Hi Nandika, This is the solution without temp tables all in one shot: (use lines 11 through 58 only. ignore the rest). But, if you are not logged to your own database, you will use the code in its entirety.
create table #Data1 -- table containing your data (start - end range)
    (id int, StartDate datetime, EndDate datetime)

INSERT INTO #Data1  -- populate table with your data data (start - end range)
    (id, StartDate, EndDate)
VALUES
    (1, '2017-01-01 00:00:00', '2017-01-04 00:00:00'),
    (2, '2017-01-19 00:00:00', '2017-01-22 00:00:00'),
    (3, '2017-02-01 00:00:00', '2017-02-04 00:00:00')

;with cteMinMax as (
    select
          min(ca.allDates) as minDate
        , datediff(day, min(ca.allDates),max(ca.allDates)) as numDays     
    from #Data1
    cross apply (
        values
           (StartDate),
           (EndDate)
        ) ca (allDates)
),
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS num
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
),
cteRangeUsed AS
(
	select distinct
		 convert(varchar(10),dateadd(day, n.num, minDate),110) UsedDays
	from cteMinMax r
		inner join cteTally n on n.num <= r.numDays
		inner join #Data1 t on dateadd(day, n.num, minDate) between t.StartDate and t.EndDate
),
CTE_GenDate AS (
	SELECT DATEFROMPARTS(2017,1,1) DayYear
	UNION ALL
	SELECT DATEADD(day,1,DayYear) 
	FROM CTE_GenDate
	WHERE DayYear < DATEFROMPARTS(2017,12,31) 
),
cteFullRange AS (
	SELECT DayYear DaysFullRange
	FROM CTE_GenDate
	Where DayYear >='01-01-2016' and DayYear <= '04-02-2017' -- table containing days in full min/max range from #Data1.
)
SELECT DaysFullRange 
FROM cteFullRange  
EXCEPT
SELECT UsedDays     
FROM cteRangeUsed  --table containing dates used in a single column.
OPTION (MAXRECURSION 366)
 
drop table #Data1

Open in new window


This produces 80 rows (80 days from min start date through max end date in our original data shown below:
create table #Data1 -- table containing your data (start - end range)
    (id int, StartDate datetime, EndDate datetime)

INSERT INTO #Data1  -- populate table with your data data (start - end range)
    (id, StartDate, EndDate)
VALUES
    (1, '2017-01-01 00:00:00', '2017-01-04 00:00:00'),
    (2, '2017-01-19 00:00:00', '2017-01-22 00:00:00'),
    (3, '2017-02-01 00:00:00', '2017-02-04 00:00:00')

Select * From #Data1;
id		StartDate		EndDate
1	2017-01-01 00:00:00.000	2017-01-04 00:00:00.000
2	2017-01-19 00:00:00.000	2017-01-22 00:00:00.000
3	2017-02-01 00:00:00.000	2017-02-04 00:00:00.000

Open in new window



The results you want (80 days not used from min start date through max end date in our original table):
2017-01-05
2017-01-06
2017-01-07
2017-01-08
2017-01-09
2017-01-10
2017-01-11
2017-01-12
2017-01-13
2017-01-14
2017-01-15
2017-01-16
2017-01-17
2017-01-18
2017-01-23
2017-01-24
2017-01-25
2017-01-26
2017-01-27
2017-01-28
2017-01-29
2017-01-30
2017-01-31
2017-02-05
2017-02-06
2017-02-07
2017-02-08
2017-02-09
2017-02-10
2017-02-11
2017-02-12
2017-02-13
2017-02-14
2017-02-15
2017-02-16
2017-02-17
2017-02-18
2017-02-19
2017-02-20
2017-02-21
2017-02-22
2017-02-23
2017-02-24
2017-02-25
2017-02-26
2017-02-27
2017-02-28
2017-03-01
2017-03-02
2017-03-03
2017-03-04
2017-03-05
2017-03-06
2017-03-07
2017-03-08
2017-03-09
2017-03-10
2017-03-11
2017-03-12
2017-03-13
2017-03-14
2017-03-15
2017-03-16
2017-03-17
2017-03-18
2017-03-19
2017-03-20
2017-03-21
2017-03-22
2017-03-23
2017-03-24
2017-03-25
2017-03-26
2017-03-27
2017-03-28
2017-03-29
2017-03-30
2017-03-31
2017-04-01
2017-04-02

Open in new window


If you want this to be displayed in two columns as start and end date, please let me know to do that.
 
Good luck with your project.

Mike

Note: Because you already have your version of data table, replace #Data1 at line 15 of the code with your table name and make sure you have columns:
 (id int, StartDate datetime, EndDate datetime) in it with dates entered in it are like:  2017-01-01 00:00:00. If not, we can use an update query to match as needed.

Commented:
One more thought on this question. I'm still assuming that the output you are looking for looks like this:

NMI      START_GAP     END_GAP
4444444  2015-03-03  2015-04-02
4444444  2015-08-01  2015-08-31

Open in new window


If that is correct, then here's some SQL to try that should be compatible with older versions of T-SQL. Let us know if it works in Netezza.

if object_id('dbo.icr_tmp') is not null drop table dbo.icr_tmp

-- 
create table dbo.icr_tmp (
		ID varchar(15),
		NMI varchar(50),
		INVOICE_START_DATE date,
		INVOICE_END_DATE date
	); 

-- 
insert into icr_tmp values('12345','4444444','2015-01-03', '2015-02-02'); 
insert into icr_tmp values('12346','4444444','2015-02-03', '2015-03-02'); 
insert into icr_tmp values('12347','4444444','2015-04-03', '2015-05-01'); 
insert into icr_tmp values('12348','4444444','2015-05-02', '2015-06-01'); 
insert into icr_tmp values('12349','4444444','2015-06-02', '2015-07-01'); 
insert into icr_tmp values('12310','4444444','2015-07-02', '2015-07-31'); 
insert into icr_tmp values('12310','4444444','2015-09-01', '2015-09-30'); 

-- 
;with cte_Gaps2 as (
	select i.NMI, dateadd(day, 1, (select min(i2.INVOICE_END_DATE) from dbo.icr_tmp as i2 
			where i2.NMI = i.NMI
				and i2.INVOICE_START_DATE >= i.INVOICE_START_DATE
			group by i2.NMI
			)) as StartGap,
		dateadd(day, -1, (select min(i2.INVOICE_START_DATE) from dbo.icr_tmp as i2 
			where i2.NMI = i.NMI
				and i2.INVOICE_START_DATE > i.INVOICE_START_DATE
			group by i2.NMI
			)) as EndGap
		from dbo.icr_tmp i
	)

-- 
select *
	from cte_Gaps2 g
	where g.StartGap <= g.EndGap;

Open in new window

Commented:
I just updated the post above to use ISO standard format rather than US format.

Commented:
Any update on this one? Have you had a chance to try the solutions suggested?

Author

Commented:
Didn't get a chance yet to try the solutions yet, will update soon with the outcome.

Much appreciated with all for the support

Commented:
Thanks for the update. Can you clarify what you're looking for as far as a result set? Is this the output you want?

NMI      START_GAP     END_GAP
4444444  2015-03-03  2015-04-02
4444444  2015-08-01  2015-08-31

Open in new window

Author

Commented:
Yes, that's the result set I'm looking for

Commented:
Hello. I'm curious to know if you have had the chance to look at the suggested code, yet?

Commented:
Hello. Any opportunity to test the solutions above?
Get access with a 7-day free trial.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.