Paula DiTallo
asked on
t-sql: Keeping track of cumulative counts
Techies--
Here is a distilled version of the source data:
dept empNum dateHired dateTerm
10 100 3/1/2008
10 120 4/17/2008
10 122 4/18/2008 6/2/2011
10 138 5/7/2008
10 200 6/2/2011
10 201 6/2/2011
200 195 6/1/2011
200 196 6/1/2011
200 197 6/1/2011
200 207 6/3/2011
Here is what I am trying to achieve:
dept Date HowManyHired HowManyTerm ActiveEmpls
10 6/1/2011 0 0 4
10 6/2/2011 2 1 5
10 6/3/2011 0 0 5
20 6/1/2011 3 0 3
20 6/2/2011 0 0 3
20 6/3/2011 1 0 4
Basically, I need to keep a cumulative tally of the active employees for a given department--taking into account hirings and terminations.
How do I accomplish this?
Also, I'm looking for some reference material on writing SQL for reports--or in general, understanding the concepts behind more complex SQL. If you have any references in mind, I am a willing student!:-)
Here is a distilled version of the source data:
dept empNum dateHired dateTerm
10 100 3/1/2008
10 120 4/17/2008
10 122 4/18/2008 6/2/2011
10 138 5/7/2008
10 200 6/2/2011
10 201 6/2/2011
200 195 6/1/2011
200 196 6/1/2011
200 197 6/1/2011
200 207 6/3/2011
Here is what I am trying to achieve:
dept Date HowManyHired HowManyTerm ActiveEmpls
10 6/1/2011 0 0 4
10 6/2/2011 2 1 5
10 6/3/2011 0 0 5
20 6/1/2011 3 0 3
20 6/2/2011 0 0 3
20 6/3/2011 1 0 4
Basically, I need to keep a cumulative tally of the active employees for a given department--taking into account hirings and terminations.
How do I accomplish this?
Also, I'm looking for some reference material on writing SQL for reports--or in general, understanding the concepts behind more complex SQL. If you have any references in mind, I am a willing student!:-)
ASKER
Thanks Jogos-- the determining factor for active or non-active employees is the null value in the dateTerm. So if dateTerm is null, then that is an active employee if dateTerm is not null then that is an inactive employee. How would I work that into a case statement with a sum() function?
I see, it's including the other records
select x.dept, x.dateHired, sum(HowManyHired) as HowManyHired, sum(howmanyTerm) as howmanyTerm , sum(asActiveEmpls) as asActiveEmpls
from
select t1.dept , t1.dateHired ,count(*) as HowManyHired , 0 as HowManyTerm
, (select count(*)
from dbo.tableA as t2
where t2.dept = t1.dept
and t2.dateHired <= t1.dateHired
and (t2.dateTerm is null or t2.dateTerm >= t1.dateHired)
) as asActiveEmpls
from dbo.tableA as t1
group by t1.dept, t1.dateHired
union all
select dept, dateTerm, 0 as HowManyHired , count (*) as HowManyTerm, 0 asActiveEmpls
From dbo.tableA
where dateterm is not null
group by dept, dateHired
) as x
order by x.dept, x.dateHired
GROUP by x.dept, x.dateHire
Here is my take:
The first part just builds a temp table to test with (replace any references to it with your table).
Since we want to be able to (potentially) check status for all dates, not just dates with activities, we need to build a table to contain data to query off of, so we first build a numbers or tally table.
We use this to calculate a list of every date for every department (the CROSS JOIN). This gives us a table of every dept and every date in the range. Then we run through this table and check each date/dept pair for count, term, and additions. We could test for null for term date, or just use a date that is greater than the time frame of our query.
It is real handy to have a numbers table and a calendar table in your database for queries such as this.
DECLARE @T TABLE (DepID INT, EmpID INT, HireDate DATE, TermDate DATE)
INSERT INTO @T SELECT 10, 100, '3/1/2008',NULL
INSERT INTO @T SELECT 10, 120, '4/17/2008',NULL
INSERT INTO @T SELECT 10, 122, '4/18/2008', '6/2/2011'
INSERT INTO @T SELECT 10, 138, '5/7/2008',NULL
INSERT INTO @T SELECT 10 ,200 ,'6/2/2011',NULL
INSERT INTO @T SELECT 10, 201, '6/2/2011',NULL
INSERT INTO @T SELECT 200 ,195 ,'6/1/2011',NULL
INSERT INTO @T SELECT 200 ,196 ,'6/1/2011',NULL
INSERT INTO @T SELECT 200 ,197 ,'6/1/2011',NULL
INSERT INTO @T SELECT 200 ,207 ,'6/3/2011',NULL
DECLARE @StartDate DATE = '2011-06-01'
DECLARE @EndDate DATE = '2011-06-05'
;WITH --Generate a tally table from 0 to 9999
N1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
N2(N) AS (SELECT 1 FROM N1 a, N1 b),
N3(N) AS (SELECT 1 FROM N2 a, N2 b),
Numbers(N) AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 FROM N3),
DL(CalDate, DeptID) AS (SELECT dateadd(day,Numbers.n,@StartDate), D.DepID
FROM Numbers
CROSS JOIN (SELECT DISTINCT DepID FROM @T) D
WHERE dateadd(day,Numbers.n,@StartDate) <= @EndDate
)
SELECT DL.DeptID, DL.CalDate, (SELECT count(*) FROM @T T WHERE T.HireDate= DL.CalDate AND T.DepID = DL.DeptID) AS Hired,
(SELECT count(*) FROM @T T WHERE T.TermDate= DL.CalDate) AS Termed,
(SELECT count(*) FROM @T T WHERE T.HireDate< DL.CalDate) AS Active
FROM dl
The first part just builds a temp table to test with (replace any references to it with your table).
Since we want to be able to (potentially) check status for all dates, not just dates with activities, we need to build a table to contain data to query off of, so we first build a numbers or tally table.
We use this to calculate a list of every date for every department (the CROSS JOIN). This gives us a table of every dept and every date in the range. Then we run through this table and check each date/dept pair for count, term, and additions. We could test for null for term date, or just use a date that is greater than the time frame of our query.
It is real handy to have a numbers table and a calendar table in your database for queries such as this.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Minor correction, changing one "<=" to "<":
[Won't even affect the results now, but it's technically more accurate, and in case you change the query later, it might be significant]
....
INSERT INTO @empl_chgs ( dept, date, base_active, how_many_hired, how_many_term )
SELECT
dept AS dept,
CASE WHEN action = 'Hire' THEN CASE WHEN dateHired < @startDate THEN '19000101' ELSE dateHired END --**** changed "<=" to "<" ****
ELSE dateTerm END AS date,
SUM(CASE WHEN dateHired <= @startDate THEN 1 ELSE 0 END) AS base_active,
SUM(CASE WHEN action = 'Hire' AND dateHired BETWEEN @startDate AND @endDate THEN 1 ELSE 0 END) AS HowManyHired,
SUM(CASE WHEN action = 'Term' AND dateTerm BETWEEN @startDate AND @endDate THEN 1 ELSE 0 END) AS HowManyTerm
FROM @employee_status
CROSS JOIN (
SELECT 'Hire' AS action UNION ALL SELECT 'Term'
) AS Actions
WHERE
dateHired <= @endDate AND
(action = 'Hire' OR (action = 'Term' AND dateTerm IS NOT NULL))
GROUP BY
dept,
CASE WHEN action = 'Hire' THEN CASE WHEN dateHired < @startDate THEN '19000101' ELSE dateHired END
ELSE dateTerm END
ORDER BY
dept, date
...
[Won't even affect the results now, but it's technically more accurate, and in case you change the query later, it might be significant]
....
INSERT INTO @empl_chgs ( dept, date, base_active, how_many_hired, how_many_term )
SELECT
dept AS dept,
CASE WHEN action = 'Hire' THEN CASE WHEN dateHired < @startDate THEN '19000101' ELSE dateHired END --**** changed "<=" to "<" ****
ELSE dateTerm END AS date,
SUM(CASE WHEN dateHired <= @startDate THEN 1 ELSE 0 END) AS base_active,
SUM(CASE WHEN action = 'Hire' AND dateHired BETWEEN @startDate AND @endDate THEN 1 ELSE 0 END) AS HowManyHired,
SUM(CASE WHEN action = 'Term' AND dateTerm BETWEEN @startDate AND @endDate THEN 1 ELSE 0 END) AS HowManyTerm
FROM @employee_status
CROSS JOIN (
SELECT 'Hire' AS action UNION ALL SELECT 'Term'
) AS Actions
WHERE
dateHired <= @endDate AND
(action = 'Hire' OR (action = 'Term' AND dateTerm IS NOT NULL))
GROUP BY
dept,
CASE WHEN action = 'Hire' THEN CASE WHEN dateHired < @startDate THEN '19000101' ELSE dateHired END
ELSE dateTerm END
ORDER BY
dept, date
...
Open in new window