Link to home
Start Free TrialLog in
Avatar of PHIL Sawyer
PHIL SawyerFlag for United Kingdom of Great Britain and Northern Ireland

asked on

FILL IN MISSING DATES

Hello
I have created a table of dates (daily) and then joined to my data where it returns employees and their start date in a given department and End Date of Employment. Note: An employee could move departments and then have a new start date
eg
#DATES                Department         Start Date           End Date         Emp_ID
10/01/2013            A                        10/01/2013          13/01/2013      1234
11/01/2013           NULL                      NULL                                           NULL  
12/01/2013           NULL                      NULL                                           NULL  
13/01/2013           NULL                      NULL                                           NULL  
14/01/2013           NULL                      NULL                                           NULL  
etc etc
What I would like is ..
#DATES               Department         Start Date           End Date         Emp_ID
10/01/2013            A                        10/01/2013          13/01/2013      1234
11/01/2013            A                        10/01/2013          13/01/2013      1234
12/01/2013            A                        10/01/2013          13/01/2013      1234
13/01/2013            A                        10/01/2013          13/01/2013      1234
14/01/2013           NULL                      NULL                                           NULL  

My sql so far would be like ..
create table #dates
(
CustomMonth char(6),
lookupdate DATETIME
)
declare @date_increment DATETIME
--change start date here
set @date_increment = '2000-12-31'
--change end date here
while @date_increment < '2013-12-31'
begin
set @date_increment = DATEADD(day,1,@date_increment)
insert #dates (CustomMonth, lookupdate)
select convert(char(4),(datepart(year,@date_increment)))    
+ RIGHT('0' + CONVERT(VARCHAR(2), DATEPART(MM, @date_increment)), 2),
@date_increment
end

select
T1.lookupdate ,
MYDATA.DEPARTMENT,
MYDATA.START_DATE,
MYDATA.EMP_ID

FROM #dates T1
LEFT JOIN
(
SELECT
EMP_ID,
DEPARTMENT,
START_DATE,
END_DATE
FROM MYTABLE
)SD ON T1.lookupdate = SD.START_DATE

ORDER BY 1,2,3

drop table #dates

Regards
SOLUTION
Avatar of David Todd
David Todd
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Well, there is a numbers table already built into SQL Server it is spt_values

There is a group of integer numbers that goes from 0 up to 2048

of course a dedicated numbers table can provide optimisation by indexing, and there is the CTE too where dates are computed using the date functions. You could also make use of a Calendar Table which can include or exclude or highlight significant days (like public holidays or weekends and so forth).

but just to give more options for you...

First up, lets assume a table of employees with their start date, department etc for the purposes of test, I have created a temp table #ee_employee_table as below

create table #ee_employee_table (empID int, Department varchar(10), StartDate datetime, EndDate datetime)

insert #ee_employee_table values (1234,'A','20130110','20130113')
insert #ee_employee_table values (1234,'B','20130113','20130115')

Open in new window


Now to get all the start and end dates (assuming 2048 days is enough) we can use a straight query

select n.number, dateadd(day,n.number,e.StartDate) as Date, e.department, e.startdate, e.enddate, e.empID
from #ee_employee_table E
inner join master..spt_values N on n.type = 'P' and e.startdate + n.number < e.enddate

Open in new window


But it would appear that you still have NULLs in your desired output which implies reporting on a calendar or periodic basis.

While we can easily modify the above query to work from a predefined period, this is when a calendar table can prove very handy...

So, let's create a calendar table...
-- this is a once off task, once created use (or abuse) as needed

create table Calendar ([Date] datetime primary key, [Day] int, [Month] int, [Year] int, [WorkDay] Char(1))

declare @d datetime = '20120101'
While @d < '20140101'
begin

   insert calendar
   select @d, day(@d), month(@d), year(@d), case when left(datename(dw,@d),1) <> 'S' then 'Y' else 'N' end 
   set @d = @d +1

end

-- and add any indexes as needed

create index idx_calendar_YM on calendar (Year, Month)

Open in new window


Now the above calendar is very simple, and could add week, or holidays etc....
jimhorn wrote an interesting article : https://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_12267-Date-Fun-Part-One-Build-your-own-SQL-calendar-table-to-perform-complex-date-expressions.html

So, now with our calendar table, selecting for a period (in this case month = 01 or could use dates, years whatever...) we now have the query :

select c.date, e.empid, e.department, e.startdate, e.enddate
from calendar c
left join #ee_employee_table e on c.date >= e.startdate and c.date < e.enddate
where c.year = 2013 and c.month = 01

Open in new window


now, the CTE approach is very fast and use it all the time. But there is one significant difference, and that's when you need to take into account things like public holidays, working days and other business oriented date factors... Easy to then add in "workday = 'y'" into the above query.

And there are no date functions or computations in sight...

With regard spt_values, I have written an Article that helps explain : https://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_1221-Fun-with-MS-SQL-spt-values-for-delimited-strings-and-virtual-calendars.html
An inline CTE will very likely provide the best performance, and it's straight-forward (once you get familiar with the technique), so I'll use it.  You could also use your own tally/numbers table if you prefer.  But, under no circumstances should you ever use master.dbo.spt_values, because MS could drop or alter that table in any future release, service pack, or hotfix with no notification to you.

The main query I suggest is this:
SELECT
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #MYTABLE mt ON
    NOT (mt.START_DATE > Dates.Date OR mt.END_DATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date


Here it is with the entire CTE, test data build, etc.:

IF OBJECT_ID('tempdb..#MYTABLE') IS NOT NULL
    DROP TABLE #MYTABLE
CREATE TABLE #MYTABLE (
    EMP_ID int,
    DEPARTMENT varchar(30),
    START_DATE datetime,
    END_DATE datetime
)
INSERT INTO #MYTABLE VALUES(1234, 'A', '20130110', '20130113')
INSERT INTO #MYTABLE VALUES(1234, 'B', '20130114', '20130121')
--SELECT * FROM #MYTABLE

DECLARE @start_date datetime
DECLARE @end_date datetime
SET @start_date = '20121201'
SET @end_date = '20140131'

;WITH
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS tally
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)
SELECT
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #MYTABLE mt ON
    NOT (mt.START_DATE > Dates.Date OR mt.END_DATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date
Give it a break Scott...

You seem to be stalking me. First the derogatory responses in a different thread, and now spt_values. Yes, it could be deprecated, but at the moment, there is NO discussion along those lines and it is heavily embedded in the bowels of SQL Server.

I also said the CTE is very fast, and Sharath has already posted a CTE solution. So, thought there was no real point posting it again...

But you did, and looking at it, why the heck would you stuff around with a tally table when all you have to do is add a few days together ?

Lets compare CTE performance then...  Building a tally table dynamically, or using mine by simply using dates as Sharath had already posted (and no real difference to his, just a bit easier to read).

/*
-- modified below to use my #ee_employee_table which now looks like

1234	A	2013-01-10 00:00:00.000	2013-01-13 00:00:00.000
1234	B	2013-01-13 00:00:00.000	2013-01-15 00:00:00.000
2238	B	2013-01-01 00:00:00.000	2013-01-15 00:00:00.000
2238	C	2013-01-15 00:00:00.000	2013-01-16 00:00:00.000  -- how do we show 1 day ?   
2238	D	2013-01-16 00:00:00.000	2013-01-20 00:00:00.000

-- note end dates for changing departments - maybe wrong 
-- but exemplifies a single day scenario and what the code has to do...

-- could be inclusive ie    between @start_date and @end_date
-- could be exclusive ie    >= @start_date and < @end_date

*/

-- common code setting date parameters as boundaries for reporting 

DECLARE @start_date datetime
DECLARE @end_date datetime
SET @start_date = '20130101'
SET @end_date = '20140211'

-- my CTE much the same a sharath's 

;with CTE_all_dates as 

( select @start_date Dates
  union all
  select Dates + 1
  from CTE_all_dates 
  where Dates < @end_date)

select D.Dates, E.*
from CTE_all_dates D
left join #ee_employee_table E on D.Dates between E.StartDate and E.EndDate 
option (maxrecursion 0)

-- Scott's CTE

;WITH
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS tally
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)
SELECT 
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #ee_employee_table mt ON
    NOT (mt.STARTDATE > Dates.Date OR mt.ENDDATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date

Open in new window


Actual Execution plan - mine 1%, Scotts 99%

And I have to say, your code with embedded / cascading CTE's does look very complex. And not really the best intro in how to use a CTE (but that is simply my opinion)

And the real reason for Calendar Table is so you can take into account business oriented demands such as working days, public holidays etc...

So, if I add in to the mix of the above SQL code
select c.date, e.empid, e.department, e.startdate, e.enddate
from calendar c
left join #ee_employee_table e on c.date between e.startdate and e.enddate
where c.Date between @start_date and @end_date and c.WorkDay = 'Y'

Open in new window


Then it gets 2%, my CTE gets 1% and the remainder 97% with the added advantage that it only shows working days.

Also worth noting is the treatment of END DATE whether that should be included or excluded depending on how a single day is to be shown... ie is end date the day no longer working or the last working day within the department. Maybe need to consider TIME as well for greater accuracy and less confusion, and easy to incorporate into Calendar table by adding in start time and end time (i.e. reflecting 9am-5pm)
I know the %s are not always accurate (I've seen extreme examples of that), and I believe that is true in this case also.  The recursive CTE shows two lazy spools in the plan, which are typically relatively very expensive operations.  [I believe it was Itzik Ben-Gan who initially pointed out the performance gains from using CROSS JOINs to generate numbers over other methods.]

Running the above code with:
SET STATISTICS IO ON
SET STATISTICS TIME ON
and removing references to the #ee_employee_table seems to demonstrate that.


As to spt_values, if you're saying there's any justification for using it in any user code, we 100% disagree, regardless of what MS does/doesn't do.  The facts are: (1) MS could remove/adjust it any time and (2) it's limited to values 0-2047 (and only if you remember to limit it to "type = 'P'"): that could easily be overlooked later and cause serious result error(s) later.

Given all that, I think it's poor, reckless advice to suggest using spt_values in any user code or to use it in sample code to be copied.
Hi Guys,

Given most askers can't differentiate between excel, access, and SQL, I've found its wise to ask or get the asker to confirm which version of SQL they are on.

Maybe I live in the dim dark past, but the generic solution I suggested will work on most SQL's still in existence in the wild. I know, scary that SQL 7 and SQL 2000 are still in production. That is, a CTE mightn't be a possible solution for this asker. I find that I'm still coding against the lowest common system I support, hence the comment about the dim dark past. That is, generally, most of my code will run against SQL 2000.

There are endless ways to produce a numbers table, and even if I do mine the hard way with temp tables, it still tends to outperform most of the while and cursor loops it replaces.

And back to my comment that the generic solution I outlined will work and satisfy the askers requirements, have you really added anything much to the conversation?

Regards
  David
dtodd::

One: I don't believe the simple WHERE condition:
"
where
    dateadd( day, n.n, @StartDate ) <= @FinishDate
"
will satisfy the requirements.

Two: So you also recommend that spt_values be used as a numbers table?  Isn't not objecting an implicit recommendation, given that you suggested using a numbers table but didn't provide code to create one?!
Hi David,

Thought the version was a reasonably safe assumption because it was posted in the SQL 2008 TA, but you are correct, we should verify, it wouldn't be the first time...

Scott,

The thrust of my response was to use a Calendar Table.
The Article does mention some hazards of using spt_values.
Yes, I am very familiar with the dynamic tally table and all the various implications.

Yes, I disagree with some of your comments and some of your T-SQL work, but I don't use words like "poor" and "reckless" to call you out...

But if your desire for points is so very great, that you copy the essence of Sharath's code without acknowledging the prior experts participation by introducing the CTE originally, and add what ever disparaging comment to mine, along with having to have the last word on David's above, then go for it sunshine. But get your facts right first.
Why are you personalizing this?  I spoke about the code only.  I do believe it is poor and reckless coding to use and recommend unreliable elements in coding when simpler, faster and 100% safe methods are available.

Please explain to me/us:
Why encourage people to use spt_values?  Is there any genuine benefit from using it?

I don't need the points.  I answer what I'm interested in and to provide better information, aka "best practices", when I can.  

I clearly did not copy Sharath's code, as he used a recursive method whereas I used CROSS JOINs.  Moreover, the CROSS JOIN method is well known and thus doesn't need copied from any recent code.

I don't believe a calendar table is necessary for this and most other date-related calcs.  It is thus unnecessary overhead.  We disagree on that, as you clearly believe it simplifies tasks and therefore the overhead is worth it.  The askers can ultimately decide, but they should see other available options so they can make an informed choice.  The performance hit of calendar table joins/lookups on 100M+ row tables can get very large indeed, particularly if the code is less than optimal.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of PHIL Sawyer

ASKER

Thanks everybody