Solved

FILL IN MISSING DATES

Posted on 2013-11-27
13
315 Views
Last Modified: 2013-12-02
Hello
I have created a table of dates (daily) and then joined to my data where it returns employees and their start date in a given department and End Date of Employment. Note: An employee could move departments and then have a new start date
eg
#DATES                Department         Start Date           End Date         Emp_ID
10/01/2013            A                        10/01/2013          13/01/2013      1234
11/01/2013           NULL                      NULL                                           NULL  
12/01/2013           NULL                      NULL                                           NULL  
13/01/2013           NULL                      NULL                                           NULL  
14/01/2013           NULL                      NULL                                           NULL  
etc etc
What I would like is ..
#DATES               Department         Start Date           End Date         Emp_ID
10/01/2013            A                        10/01/2013          13/01/2013      1234
11/01/2013            A                        10/01/2013          13/01/2013      1234
12/01/2013            A                        10/01/2013          13/01/2013      1234
13/01/2013            A                        10/01/2013          13/01/2013      1234
14/01/2013           NULL                      NULL                                           NULL  

My sql so far would be like ..
create table #dates
(
CustomMonth char(6),
lookupdate DATETIME
)
declare @date_increment DATETIME
--change start date here
set @date_increment = '2000-12-31'
--change end date here
while @date_increment < '2013-12-31'
begin
set @date_increment = DATEADD(day,1,@date_increment)
insert #dates (CustomMonth, lookupdate)
select convert(char(4),(datepart(year,@date_increment)))    
+ RIGHT('0' + CONVERT(VARCHAR(2), DATEPART(MM, @date_increment)), 2),
@date_increment
end

select
T1.lookupdate ,
MYDATA.DEPARTMENT,
MYDATA.START_DATE,
MYDATA.EMP_ID

FROM #dates T1
LEFT JOIN
(
SELECT
EMP_ID,
DEPARTMENT,
START_DATE,
END_DATE
FROM MYTABLE
)SD ON T1.lookupdate = SD.START_DATE

ORDER BY 1,2,3

drop table #dates

Regards
0
Comment
Question by:PHIL Sawyer
  • 5
  • 4
  • 2
  • +2
13 Comments
 
LVL 35

Assisted Solution

by:David Todd
David Todd earned 50 total points
Comment Utility
Hi,

Its fairly easy of it is just dates and not datetimes, although that is merely a matter of scale.

You need to create a numbers table or a tally table, which has the integers from 0 to a large number. This is a common design pattern or solution with a number of good examples on the web so I'm not going into much detail here.

Then with the startdate and enddate in variables the dates involved become

select
    dateadd( day, n.n, @StartDate ) as SomeDate
from dbo.Numbers n
where
    dateadd( day, n.n, @StartDate ) <= @FinishDate
;

Maybe then wrap the above as a derived table, and then left join the above to it, and wrap a coalesce or isnull around the date ie isnull( otherstuff.date, NumbersTableDate.SomeDate )

HTH
  David

PS Numbers tables are commonly used to avoid cursors and loops as you are doing above.
0
 
LVL 40

Assisted Solution

by:Sharath
Sharath earned 50 total points
Comment Utility
You can try this with recursive CTE instead of WHILE loop.
;with CTE as (
select CONVERT(DATE,'2013-01-01') StartDate,
       CONVERT(DATE,'2013-01-31') EndDate union all
select dateadd(d,1,StartDate),EndDate from CTE where dateadd(d,1,StartDate) <= EndDate)
select t1.StartDate Dates,
       t2.*
  from CTE t1
  left join Test t2
    on t1.StartDate between t2.Start_Date and t2.End_Date

Open in new window


You can change the Start and End Dates in the above CTE as per your requirement.

http://sqlfiddle.com/#!3/bd2a0/3
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Well, there is a numbers table already built into SQL Server it is spt_values

There is a group of integer numbers that goes from 0 up to 2048

of course a dedicated numbers table can provide optimisation by indexing, and there is the CTE too where dates are computed using the date functions. You could also make use of a Calendar Table which can include or exclude or highlight significant days (like public holidays or weekends and so forth).

but just to give more options for you...

First up, lets assume a table of employees with their start date, department etc for the purposes of test, I have created a temp table #ee_employee_table as below

create table #ee_employee_table (empID int, Department varchar(10), StartDate datetime, EndDate datetime)

insert #ee_employee_table values (1234,'A','20130110','20130113')
insert #ee_employee_table values (1234,'B','20130113','20130115')

Open in new window


Now to get all the start and end dates (assuming 2048 days is enough) we can use a straight query

select n.number, dateadd(day,n.number,e.StartDate) as Date, e.department, e.startdate, e.enddate, e.empID
from #ee_employee_table E
inner join master..spt_values N on n.type = 'P' and e.startdate + n.number < e.enddate

Open in new window


But it would appear that you still have NULLs in your desired output which implies reporting on a calendar or periodic basis.

While we can easily modify the above query to work from a predefined period, this is when a calendar table can prove very handy...

So, let's create a calendar table...
-- this is a once off task, once created use (or abuse) as needed

create table Calendar ([Date] datetime primary key, [Day] int, [Month] int, [Year] int, [WorkDay] Char(1))

declare @d datetime = '20120101'
While @d < '20140101'
begin

   insert calendar
   select @d, day(@d), month(@d), year(@d), case when left(datename(dw,@d),1) <> 'S' then 'Y' else 'N' end 
   set @d = @d +1

end

-- and add any indexes as needed

create index idx_calendar_YM on calendar (Year, Month)

Open in new window


Now the above calendar is very simple, and could add week, or holidays etc....
jimhorn wrote an interesting article : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_12267-Date-Fun-Part-One-Build-your-own-SQL-calendar-table-to-perform-complex-date-expressions.html

So, now with our calendar table, selecting for a period (in this case month = 01 or could use dates, years whatever...) we now have the query :

select c.date, e.empid, e.department, e.startdate, e.enddate
from calendar c
left join #ee_employee_table e on c.date >= e.startdate and c.date < e.enddate
where c.year = 2013 and c.month = 01

Open in new window


now, the CTE approach is very fast and use it all the time. But there is one significant difference, and that's when you need to take into account things like public holidays, working days and other business oriented date factors... Easy to then add in "workday = 'y'" into the above query.

And there are no date functions or computations in sight...

With regard spt_values, I have written an Article that helps explain : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_1221-Fun-with-MS-SQL-spt-values-for-delimited-strings-and-virtual-calendars.html
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
An inline CTE will very likely provide the best performance, and it's straight-forward (once you get familiar with the technique), so I'll use it.  You could also use your own tally/numbers table if you prefer.  But, under no circumstances should you ever use master.dbo.spt_values, because MS could drop or alter that table in any future release, service pack, or hotfix with no notification to you.

The main query I suggest is this:
SELECT
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #MYTABLE mt ON
    NOT (mt.START_DATE > Dates.Date OR mt.END_DATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date


Here it is with the entire CTE, test data build, etc.:

IF OBJECT_ID('tempdb..#MYTABLE') IS NOT NULL
    DROP TABLE #MYTABLE
CREATE TABLE #MYTABLE (
    EMP_ID int,
    DEPARTMENT varchar(30),
    START_DATE datetime,
    END_DATE datetime
)
INSERT INTO #MYTABLE VALUES(1234, 'A', '20130110', '20130113')
INSERT INTO #MYTABLE VALUES(1234, 'B', '20130114', '20130121')
--SELECT * FROM #MYTABLE

DECLARE @start_date datetime
DECLARE @end_date datetime
SET @start_date = '20121201'
SET @end_date = '20140131'

;WITH
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS tally
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)
SELECT
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #MYTABLE mt ON
    NOT (mt.START_DATE > Dates.Date OR mt.END_DATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Give it a break Scott...

You seem to be stalking me. First the derogatory responses in a different thread, and now spt_values. Yes, it could be deprecated, but at the moment, there is NO discussion along those lines and it is heavily embedded in the bowels of SQL Server.

I also said the CTE is very fast, and Sharath has already posted a CTE solution. So, thought there was no real point posting it again...

But you did, and looking at it, why the heck would you stuff around with a tally table when all you have to do is add a few days together ?

Lets compare CTE performance then...  Building a tally table dynamically, or using mine by simply using dates as Sharath had already posted (and no real difference to his, just a bit easier to read).

/*
-- modified below to use my #ee_employee_table which now looks like

1234	A	2013-01-10 00:00:00.000	2013-01-13 00:00:00.000
1234	B	2013-01-13 00:00:00.000	2013-01-15 00:00:00.000
2238	B	2013-01-01 00:00:00.000	2013-01-15 00:00:00.000
2238	C	2013-01-15 00:00:00.000	2013-01-16 00:00:00.000  -- how do we show 1 day ?   
2238	D	2013-01-16 00:00:00.000	2013-01-20 00:00:00.000

-- note end dates for changing departments - maybe wrong 
-- but exemplifies a single day scenario and what the code has to do...

-- could be inclusive ie    between @start_date and @end_date
-- could be exclusive ie    >= @start_date and < @end_date

*/

-- common code setting date parameters as boundaries for reporting 

DECLARE @start_date datetime
DECLARE @end_date datetime
SET @start_date = '20130101'
SET @end_date = '20140211'

-- my CTE much the same a sharath's 

;with CTE_all_dates as 

( select @start_date Dates
  union all
  select Dates + 1
  from CTE_all_dates 
  where Dates < @end_date)

select D.Dates, E.*
from CTE_all_dates D
left join #ee_employee_table E on D.Dates between E.StartDate and E.EndDate 
option (maxrecursion 0)

-- Scott's CTE

;WITH
cteDigits AS (
    SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
    SELECT [1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS tally
    FROM cteDigits [1s]
    CROSS JOIN cteDigits [10s]
    CROSS JOIN cteDigits [100s]
    CROSS JOIN cteDigits [1000s]
)
SELECT 
    Dates.Date,
    mt.*
FROM cteTally t
CROSS APPLY (
    SELECT DATEADD(DAY, t.tally, @start_date) AS Date
) AS Dates
LEFT OUTER JOIN #ee_employee_table mt ON
    NOT (mt.STARTDATE > Dates.Date OR mt.ENDDATE < Dates.Date)
WHERE
    Dates.Date <= @end_date
ORDER BY
    Dates.Date

Open in new window


Actual Execution plan - mine 1%, Scotts 99%

And I have to say, your code with embedded / cascading CTE's does look very complex. And not really the best intro in how to use a CTE (but that is simply my opinion)

And the real reason for Calendar Table is so you can take into account business oriented demands such as working days, public holidays etc...

So, if I add in to the mix of the above SQL code
select c.date, e.empid, e.department, e.startdate, e.enddate
from calendar c
left join #ee_employee_table e on c.date between e.startdate and e.enddate
where c.Date between @start_date and @end_date and c.WorkDay = 'Y'

Open in new window


Then it gets 2%, my CTE gets 1% and the remainder 97% with the added advantage that it only shows working days.

Also worth noting is the treatment of END DATE whether that should be included or excluded depending on how a single day is to be shown... ie is end date the day no longer working or the last working day within the department. Maybe need to consider TIME as well for greater accuracy and less confusion, and easy to incorporate into Calendar table by adding in start time and end time (i.e. reflecting 9am-5pm)
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
I know the %s are not always accurate (I've seen extreme examples of that), and I believe that is true in this case also.  The recursive CTE shows two lazy spools in the plan, which are typically relatively very expensive operations.  [I believe it was Itzik Ben-Gan who initially pointed out the performance gains from using CROSS JOINs to generate numbers over other methods.]

Running the above code with:
SET STATISTICS IO ON
SET STATISTICS TIME ON
and removing references to the #ee_employee_table seems to demonstrate that.


As to spt_values, if you're saying there's any justification for using it in any user code, we 100% disagree, regardless of what MS does/doesn't do.  The facts are: (1) MS could remove/adjust it any time and (2) it's limited to values 0-2047 (and only if you remember to limit it to "type = 'P'"): that could easily be overlooked later and cause serious result error(s) later.

Given all that, I think it's poor, reckless advice to suggest using spt_values in any user code or to use it in sample code to be copied.
0
Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

 
LVL 35

Expert Comment

by:David Todd
Comment Utility
Hi Guys,

Given most askers can't differentiate between excel, access, and SQL, I've found its wise to ask or get the asker to confirm which version of SQL they are on.

Maybe I live in the dim dark past, but the generic solution I suggested will work on most SQL's still in existence in the wild. I know, scary that SQL 7 and SQL 2000 are still in production. That is, a CTE mightn't be a possible solution for this asker. I find that I'm still coding against the lowest common system I support, hence the comment about the dim dark past. That is, generally, most of my code will run against SQL 2000.

There are endless ways to produce a numbers table, and even if I do mine the hard way with temp tables, it still tends to outperform most of the while and cursor loops it replaces.

And back to my comment that the generic solution I outlined will work and satisfy the askers requirements, have you really added anything much to the conversation?

Regards
  David
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
dtodd::

One: I don't believe the simple WHERE condition:
"
where
    dateadd( day, n.n, @StartDate ) <= @FinishDate
"
will satisfy the requirements.

Two: So you also recommend that spt_values be used as a numbers table?  Isn't not objecting an implicit recommendation, given that you suggested using a numbers table but didn't provide code to create one?!
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Hi David,

Thought the version was a reasonably safe assumption because it was posted in the SQL 2008 TA, but you are correct, we should verify, it wouldn't be the first time...

Scott,

The thrust of my response was to use a Calendar Table.
The Article does mention some hazards of using spt_values.
Yes, I am very familiar with the dynamic tally table and all the various implications.

Yes, I disagree with some of your comments and some of your T-SQL work, but I don't use words like "poor" and "reckless" to call you out...

But if your desire for points is so very great, that you copy the essence of Sharath's code without acknowledging the prior experts participation by introducing the CTE originally, and add what ever disparaging comment to mine, along with having to have the last word on David's above, then go for it sunshine. But get your facts right first.
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Why are you personalizing this?  I spoke about the code only.  I do believe it is poor and reckless coding to use and recommend unreliable elements in coding when simpler, faster and 100% safe methods are available.

Please explain to me/us:
Why encourage people to use spt_values?  Is there any genuine benefit from using it?

I don't need the points.  I answer what I'm interested in and to provide better information, aka "best practices", when I can.  

I clearly did not copy Sharath's code, as he used a recursive method whereas I used CROSS JOINs.  Moreover, the CROSS JOIN method is well known and thus doesn't need copied from any recent code.

I don't believe a calendar table is necessary for this and most other date-related calcs.  It is thus unnecessary overhead.  We disagree on that, as you clearly believe it simplifies tasks and therefore the overhead is worth it.  The askers can ultimately decide, but they should see other available options so they can make an informed choice.  The performance hit of calendar table joins/lookups on 100M+ row tables can get very large indeed, particularly if the code is less than optimal.
0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 200 total points
Comment Utility
REWRITE of my original code, to remove the NOT from the WHERE:

LEFT OUTER JOIN #MYTABLE mt ON
    mt.START_DATE <= Dates.Date AND mt.END_DATE >= Dates.Date
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 200 total points
Comment Utility
@philipsivyer

Excuse the slightly off topic rhetoric, we have your best interests at heart.

The only real problem with getting all the rows within your existing query is in the JOIN

-- You have :  
) SD ON T1.lookupdate = SD.Start_Date

-- and it needs to be :
) SD ON T1.lookupdate between SD.Start_Date and sd.End_Date

Open in new window

And also assume there is a date period to select from in a "where" clause somewhere, which might give reasons as to the different table aliases in use. Oh, and there are some who would say beware of "between".

Now to explain some of the rhetoric...

1) your #Dates table

#Dates could become a permanent table and when there is a business requirement for Dates (as opposed to a simple list of contiguous calendar days), then most definitely make it permanent. Given the current query, I would avoid creating every time.

2) your current query does have an example of a potential CTE

The subquery "SD" could also be expressed as a CTE. While it might not be the best use of a CTE, it might help understand what a CTE is. Simplistically the CTE is a named subquery. It really means Common Table Expression where you can name a result set and use that name elsewhere in your query. It starts with a "WITH" and finishes with a "SELECT" (or update, delete, insert, merge). The "WITH" has to be the first command within a batch so it is often prefixed with a semi-colon.
-- your current query from the question header

select 
T1.lookupdate ,
DEPARTMENT,
START_DATE,
EMP_ID

FROM #dates T1
LEFT JOIN
(
SELECT 
EMP_ID,
DEPARTMENT,
START_DATE,
END_DATE
FROM #ee_employee_table
)SD ON T1.lookupdate = SD.START_DATE

ORDER BY 1,2,3

-- now that same query expressed as a CTE

;with SD as
(
  SELECT 
  EMP_ID,
  DEPARTMENT,
  START_DATE,
  END_DATE
  FROM #ee_employee_table
)
select 
  T1.lookupdate ,
  DEPARTMENT,
  START_DATE,
  EMP_ID

FROM #dates T1
LEFT JOIN SD ON T1.lookupdate = SD.START_DATE
ORDER BY 1,2,3

Open in new window


Probably not the best example for creating a CTE, but once you start using CTE's you tend to think of them before you think of subqueries. You can read up on them : http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx and follow the links...

3) Using a Permanent Calendar Table

If you have the business requirement to somehow express or derive dates in terms of a business view (such as Working Days, Public Holidays, Custom Periods, 5-4-4 quarters, Weeks, 9:00am - 5:00pm working day, Month definitions etc) then Create that table and use it. It should provide a single source for all your date permutations rather than having a myriad of converts or date handling methods and procedures/functions throughout multiple sources.

4) Using a Numbers Table

Using a Tally table or Numbers table as Dtodd first suggested can be an extraordinary handy thing to have access to. I do recommend you read my spt_values article, it shows a couple of examples where numbers can be handy. It does express some of the traps that ScottPletcher mentioned above, but think he reacted to spt_values without first reading the Article.

5) Using a CTE

Which is the way a lot of us slightly more experienced  T-SQL coders now think and first presented by Sharath_123 above. Being able to name a result set, then use that is amazingly powerful. One of the more obvious uses is when you have a derived or computed value and subsequently need to use that value by name without having to reconstruct it. But, once you start to explore, you will find a lot of uses including recursion where you can use the CTE within its definition. As Dtodd rightly points out, it was introduced in 2005, and if you are not accustomed to its use, it might be a little confronting at first. The example that ScottPletcher put forward shows how to build a tally "table" then use that to give you the results.

6) Your Site, Your Code

As Dtodd said earlier, sometimes we need to compromise our approaches. At the end of the day, it is your site and your code and while there might be a few different options, it must fit into what you do, are comfortable with maintaining and ultimately give you the results and performance you and your company are happy to live with given the constraints of having to deliver a working solution.    

 

7) Observations

7.1) The reason your current query doesn't give the results you want is because of the JOIN.
7.2) I would make #Dates a permanent table rather than recreate every time
7.3) It currently has procedural statements so it is being "performed" (exec, job, etc)
7.4) Given the code in the Question header, you don't really need a subquery.
7.5) Re the above (join + avoid subquery), they are exemplified in my Calendar Table query.
7.6) You could convert to a CTE as Sharath_123 and ScottPletcher have demonstrated
0
 

Author Closing Comment

by:PHIL Sawyer
Comment Utility
Thanks everybody
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

     When we have to pass multiple rows of data to SQL Server, the developers either have to send one row at a time or come up with other workarounds to meet requirements like using XML to pass data, which is complex and tedious to use. There is a …
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now