D B
asked on
Get earliest non-breaking date in query
I need to get the earliest non-breaking effective date by user from a list of effective/term dates. The data has a user id, effective date, term date and department, group and active indicator. A new row is created each year and typically has an effective date of the start of that year and a term date of the last day of the year, but not always.
The dates are contiguous in that, if there is more than one row for a user, the subsequent row's EffDate is the previous row's TermDate + 1.
If there is a 'break' in employment, there will be a single row with ActiveInd = 'I' with a date range from the day after their term date until the day before they are re-employed.
A 'break' is considered either ActiveInd going from A to I or a change in Department or Group. In the example I've attached, the Effective Date I want to return is 2015-01-01 because the earliest EffDt for Department MEH1 (without a break) is 2015-01-01. Until 2014-12-31 the Department was MEP1.
If the last row has a value of I for ActiveInd, and there are no subsequent A rows, then treat the I as an A, and look for the earliest EffDt for that department and group. In the example, if there were only the first 6 rows, the EffDt I'd want is 2008-01-01, since they termed on 201-05-12 in Department MEH1 and Group F.
Also, the last row for any user always has an open-ended TermDt value of 2199-12-31, regardless of whether they are active or not.
One caveat, if the user has a future row, but are currently active, then only take into account their current department and group. For example, if the last row in the data below had a TermDate = 2020-02-29 and another row with EffDate = '2020-03-01', Department = MCD1, Group = 'C', DO NOT consider the last row as it is in the future. However, if the last row below had an ActiveInd value of 'I' and there was another A row in the future following it, then use the future date.
Basically, the precedence is:
1. Use earliest EffDt of current Department and Group.
2. If currently inactive and there is a future row, use the EffDt of the future row.
3. If currently inactive and no future row, use the earliest EffDt of the last Department and Group.
Current row is the row where the date this is executed is between EffDt and TermDt.
The dates are contiguous in that, if there is more than one row for a user, the subsequent row's EffDate is the previous row's TermDate + 1.
If there is a 'break' in employment, there will be a single row with ActiveInd = 'I' with a date range from the day after their term date until the day before they are re-employed.
A 'break' is considered either ActiveInd going from A to I or a change in Department or Group. In the example I've attached, the Effective Date I want to return is 2015-01-01 because the earliest EffDt for Department MEH1 (without a break) is 2015-01-01. Until 2014-12-31 the Department was MEP1.
If the last row has a value of I for ActiveInd, and there are no subsequent A rows, then treat the I as an A, and look for the earliest EffDt for that department and group. In the example, if there were only the first 6 rows, the EffDt I'd want is 2008-01-01, since they termed on 201-05-12 in Department MEH1 and Group F.
Also, the last row for any user always has an open-ended TermDt value of 2199-12-31, regardless of whether they are active or not.
One caveat, if the user has a future row, but are currently active, then only take into account their current department and group. For example, if the last row in the data below had a TermDate = 2020-02-29 and another row with EffDate = '2020-03-01', Department = MCD1, Group = 'C', DO NOT consider the last row as it is in the future. However, if the last row below had an ActiveInd value of 'I' and there was another A row in the future following it, then use the future date.
Basically, the precedence is:
1. Use earliest EffDt of current Department and Group.
2. If currently inactive and there is a future row, use the EffDt of the future row.
3. If currently inactive and no future row, use the earliest EffDt of the last Department and Group.
Current row is the row where the date this is executed is between EffDt and TermDt.
USER_ID EffDate TermDate Department Group ActiveInd
47834004 2006-01-01 2006-10-25 MEP1 D A
47834004 2006-10-26 2006-12-31 MEP1 D A
47834004 2007-01-01 2007-12-31 MEH1 D A
47834004 2008-01-01 2010-07-07 MEH1 F A
47834004 2010-07-08 2014-05-12 MEH1 F I
47834004 2014-05-13 2014-12-31 MEP1 E A
47834004 2015-01-01 2015-12-31 MEH1 D A
47834004 2016-01-01 2199-12-31 MEH1 D A
Your sample data only seems to provide an example of one case, precedence 1, although I think I can picture precedence 3 if the last three rows did not exist. Can you provide some data that includes all of the possibilities (plus the caveat) and the expected results? Also, what version of SQL Server are you using?
ASKER
SQL Server 2016.
004 - expect 1/1/2015 (earliest date of current department/group)
005 - expect 4/1/2020 (no current active so use future)
006 - expect 3/16/2020 (no current active even though there is history so use future)
007 - expect 1/1/2019 (not currently active and no future so use most recent department/group and get earliest EffDate for that combination). Even though this user had MEH1/C on 1/1/2017, they had MED1/A for 2018 then switched back to MEH1/C in 2019, so that is the earliest 'contiguous' effective date.
Hope this helps.
004 - expect 1/1/2015 (earliest date of current department/group)
005 - expect 4/1/2020 (no current active so use future)
006 - expect 3/16/2020 (no current active even though there is history so use future)
007 - expect 1/1/2019 (not currently active and no future so use most recent department/group and get earliest EffDate for that combination). Even though this user had MEH1/C on 1/1/2017, they had MED1/A for 2018 then switched back to MEH1/C in 2019, so that is the earliest 'contiguous' effective date.
Hope this helps.
USER_ID EffDate TermDate Dept Group ActiveInd
47834004 1/1/2006 10/25/2006 MEP1 D A
47834004 10/26/2006 12/31/2006 MEP1 D A
47834004 1/1/2007 12/31/2007 MEH1 D A
47834004 1/1/2008 7/7/2010 MEH1 F A
47834004 7/8/2010 5/12/2014 MEH1 F I
47834004 5/13/2014 12/31/2014 MEP1 E A
47834004 1/1/2015 12/31/2015 MEH1 D A
47834004 1/1/2016 12/31/2199 MEH1 D A
47834005 4/1/2020 12/31/2199 MED1 A A
47834006 1/1/2019 12/31/2019 MEB1 C A
47834006 1/1/2020 3/15/2020 MEH1 C I
47834006 3/16/2020 12/31/2199 MEH1 B A
47834007 1/1/2017 12/31/2017 MEH1 C A
47834007 1/1/2018 12/31/2018 MED1 A A
47834007 1/1/2019 12/31/2019 MEH1 C A
47834007 1/1/2020 1/27/2020 MEH1 C A
47834007 1/28/2020 12/31/2199 MEH1 C I
Hopefully you're using a version that supports the lead and lag functions.
select user_id, effdate from
(select user_id, effdate, termdate, dept, groupid, activeind
,row_number() over (partition by user_id, dept order by effdate desc) rn
,lag(dept) over (partition by user_id order by effdate desc) prevdept
,lead(dept) over (partition by user_id order by effdate desc) nextdept
from yourtbl
where activeind = 'A') x
where (dept = prevdept and dept != nextdept and termdate < sysdate and activeind = 'A' and rn in (1,2))
or (prevdept is null and nextdept is null)
or (dept != nextdept and prevdept is null and termdate > sysdate);
select user_id, effdate from
(select user_id, effdate, termdate, dept, groupid, activeind
,row_number() over (partition by user_id, dept order by effdate desc) rn
,lag(dept) over (partition by user_id order by effdate desc) prevdept
,lead(dept) over (partition by user_id order by effdate desc) nextdept
from yourtbl
where activeind = 'A') x
where (dept = prevdept and dept != nextdept and termdate < sysdate and activeind = 'A' and rn in (1,2))
or (prevdept is null and nextdept is null)
or (dept != nextdept and prevdept is null and termdate > sysdate);
Sorry, I was thinking Oracle. Substitute Oracle sysdate with SQL Server getdate()
ASKER
So, in this example, nothing is being returned.
73050 01/01/2001 01/12/2005 MFH1 A N
73050 01/01/2006 08/08/2006 MFH1 A Y
73050 08/09/2006 12/31/2199 MFH1 A N
It should return 1/1/2006 (the earliest effDt of the last dept/group if latest row is not active.
73050 01/01/2001 01/12/2005 MFH1 A N
73050 01/01/2006 08/08/2006 MFH1 A Y
73050 08/09/2006 12/31/2199 MFH1 A N
It should return 1/1/2006 (the earliest effDt of the last dept/group if latest row is not active.
In my query, I show where activeind = 'A' assuming it stood for active. Your latest example makes it appear that the choices are Y[es] or N[o]. So the subquery should have the filter "where activeind = 'Y', then those rows with activeinds of "N" are already eliminated.
ASKER
Sorry, A and I are correct. Replace in my comment above, Y with A and N with I.
So, with the exception of changing sysdate to getdate(), my earlier solution should work.
ASKER
Again, in this example, nothing is being returned.
USER_ID EffDate TermDate Dept Group ActiveInd
73050 01/01/2001 12/31/2005 MFH1 A I
73050 01/01/2006 08/08/2006 MFH1 A A
73050 08/09/2006 12/31/2199 MFH1 A I
It should return 1/1/2006 (the earliest effDt of the last dept/group if latest row is not active.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
awking00: Sorry I haven't gotten back to you for some time. I got pulled off this, then with the COVID-19 stuff and shuffling around, trying to set up a home-based office, etc., Hopefully you can understand.
I have marked your solution as the answer, even though it did not consistently return the expected results. If I get pulled back on this task, hopefully you have gotten me far enough along that I can tweak it to give consistent results. There appears to be an issue or two still unresolved.
I have marked your solution as the answer, even though it did not consistently return the expected results. If I get pulled back on this task, hopefully you have gotten me far enough along that I can tweak it to give consistent results. There appears to be an issue or two still unresolved.