SQL Recursive Query: Define Overall Start & Due Dates of a Group by its Children

Hello Experts,

I have a table (def & data attached), called Linked_Tasks, which has tasks with start and due dates (Start_Date, Due_Date), and those tasks belong to groups (defined by Task_Parent_ID). A group has a field Is_Group set to 'Y'. I need to write a query which sets the start date and due date of each group by the min start date and max due date, respectively, from the tasks in that group. I have a recursive query started, attached, but could use help in the best way to write that query to set the overall start/due dates of groups of tasks.

Any ideas?

Thank you!
Steve
USE [WSAProject]
GO
/****** Object:  Table [dbo].[Linked_Task]    Script Date: 10/12/2018 3:20:13 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Linked_Task](
	[Task_ID] [int] NOT NULL,
	[Task_Parent_ID] [int] NULL,
	[Linked_Task_ID] [int] NULL,
	[Linked_Task_Type] [char](2) NULL,
	[Is_Group] [char](1) NULL,
	[Task_Name] [varchar](200) NULL,
	[Start_Date] [date] NULL,
	[Due_Date] [date] NULL,
 CONSTRAINT [PK_Linked_Task] PRIMARY KEY CLUSTERED 
(
	[Task_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (1, 8, NULL, NULL, N'N', N'Task A', CAST(N'2018-10-01' AS Date), CAST(N'2018-10-10' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (2, 8, 1, N'SS', N'N', N'Task B', CAST(N'2018-10-05' AS Date), CAST(N'2018-10-16' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (3, 8, 2, N'FS', N'N', N'Task C', CAST(N'2018-10-16' AS Date), CAST(N'2018-10-19' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (4, 9, NULL, NULL, N'N', N'Task D', CAST(N'2018-10-15' AS Date), CAST(N'2018-10-18' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (5, 9, 4, N'FS', N'N', N'Task E', CAST(N'2018-10-18' AS Date), CAST(N'2018-10-25' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (6, 9, 5, N'SS', N'N', N'Task F', CAST(N'2018-10-18' AS Date), CAST(N'2018-10-25' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (7, 9, 6, N'FS', N'N', N'Task G', CAST(N'2018-10-25' AS Date), CAST(N'2018-11-01' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (8, NULL, NULL, NULL, N'Y', N'GROUP 1', NULL, NULL)
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (9, NULL, NULL, NULL, N'Y', N'GROUP 2', NULL, NULL)

Open in new window

USE [WSAProject]
GO
/****** Object:  StoredProcedure [dbo].[wsa_ProjectTask_s]    Script Date: 10/12/2018 3:20:35 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO



WITH Task_CTE AS (

	SELECT  
	CAST(right('000' + cast(row_number()over(partition by Task.Task_Parent_ID order by Task.Order_By, Task.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by t.Task_Parent_ID order by t.Order_By, t.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) 

Open in new window

Linked_Task.sql
Linked_Task_Query.sql
tablaFreakAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PortletPaulEE Topic AdvisorCommented:
Is it a 2 tier hierarchy? If so you don't need recursion

select
*
from Linked_Task gt
inner join (
    select Task_Parent_ID, min(Start_Date) grp_Start_Date, max(Due_Date) grp_Due_Date
    from Linked_Task 
    where Is_Group = 'N'
    group by Task_Parent_ID
    ) t on gt.Task_ID = t.Task_Parent_ID
;

  Task_ID   Task_Parent_ID   Linked_Task_ID   Linked_Task_Type   Is_Group   Task_Name   Start_Date   Due_Date   Task_Parent_ID     grp_Start_Date         grp_Due_Date      
 --------- ---------------- ---------------- ------------------ ---------- ----------- ------------ ---------- ---------------- --------------------- --------------------- 
        8   NULL             NULL             NULL               Y          GROUP 1     NULL         NULL                    8   01.10.2018 00:00:00   19.10.2018 00:00:00  
        9   NULL             NULL             NULL               Y          GROUP 2     NULL         NULL                    9   15.10.2018 00:00:00   01.11.2018 00:00:00  

Open in new window

You could establish this as a view, and then join it to any set of tasks. e.g.

create view grp_task_v as
    select Task_Parent_ID, min(Start_Date) grp_Start_Date, max(Due_Date) grp_Due_Date
    from Linked_Task 
    where Is_Group = 'N'
    group by Task_Parent_ID
;

select
      v.grp_Start_Date
    , v.grp_Due_Date
    , t.*
from linked_task t
left join grp_task_v v on t.Task_ID = v.Task_Parent_ID or t.Task_Parent_ID  = v.Task_Parent_ID
;

Open in new window

which would look like this:
        grp_Start_Date         grp_Due_Date       Task_ID   Task_Parent_ID   Linked_Task_ID   Linked_Task_Type   Is_Group   Task_Name       Start_Date             Due_Date        
 --- --------------------- --------------------- --------- ---------------- ---------------- ------------------ ---------- ----------- --------------------- --------------------- 
  1   01.10.2018 00:00:00   19.10.2018 00:00:00         1   8                NULL             NULL               N          Task A      01.10.2018 00:00:00   10.10.2018 00:00:00  
  2   01.10.2018 00:00:00   19.10.2018 00:00:00         2   8                1                SS                 N          Task B      05.10.2018 00:00:00   16.10.2018 00:00:00  
  3   01.10.2018 00:00:00   19.10.2018 00:00:00         3   8                2                FS                 N          Task C      16.10.2018 00:00:00   19.10.2018 00:00:00  
  4   15.10.2018 00:00:00   01.11.2018 00:00:00         4   9                NULL             NULL               N          Task D      15.10.2018 00:00:00   18.10.2018 00:00:00  
  5   15.10.2018 00:00:00   01.11.2018 00:00:00         5   9                4                FS                 N          Task E      18.10.2018 00:00:00   25.10.2018 00:00:00  
  6   15.10.2018 00:00:00   01.11.2018 00:00:00         6   9                5                SS                 N          Task F      18.10.2018 00:00:00   25.10.2018 00:00:00  
  7   15.10.2018 00:00:00   01.11.2018 00:00:00         7   9                6                FS                 N          Task G      25.10.2018 00:00:00   01.11.2018 00:00:00  
  8   01.10.2018 00:00:00   19.10.2018 00:00:00         8   NULL             NULL             NULL               Y          GROUP 1     NULL                  NULL                 
  9   15.10.2018 00:00:00   01.11.2018 00:00:00         9   NULL             NULL             NULL               Y          GROUP 2     NULL                  NULL                 

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mark WillsTopic AdvisorCommented:
OK, while Paul is quite correct about it being a regular join given the hierarchical relationship, Bit more challenging if quite a few levels deep.... So, let's address your concerns about recursive cte...

Your query was very nearly finished. Just a few tweaks, an unknown column name, and it worked.

;WITH Task_CTE AS 
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

    UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) select * 
  from task_cte
  order by sort
--option (maxrecursion 0)

Open in new window

So your basic approach in making a recursive CTE was good. The common mistake is getting the root node correct. And you had that down pact.

However, Recursive CTE's can be very inefficient and resource hungry. They can end up looping the entire collection in the CTE every time it appends another row to the collection.

So, if ever possible, if there is another way, then you are better off avoiding recursive CTE.

Here are 3 queries to compare the costs using the same basic approach of the same CTE (well, starts the same):
;WITH Task_CTE AS  -- Query 1 
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

    UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) select * 
  from task_cte
  order by sort
--option (maxrecursion 0)

--Query 1: Query cost (relative to the batch): 57%


;WITH Task_CTE AS   -- Query 2
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID in (select distinct task_parent_id from Linked_Task))
)
	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
    order by 1

--Query 2: Query cost (relative to the batch): 18%


;WITH Task_CTE AS    -- Query 3
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID in (select distinct task_parent_id from Linked_Task))
)   select * from Task_CTE as t_cte union
	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
    order by 1

--Query 3: Query cost (relative to the batch): 25%

Open in new window

As you can see, Query 1 was more (cost) than the other two combined.

Query 2, you would have to add those 'root' nodes as addional columns. Much the same way Paul did above

Query 3, simply does a union to get the 'root' node appearing in the same columns...

And, there is very little difference in how the three CTE's were created - main difference is where you end the initial query.

Of course, if you do have more than a few levels deep, then the recursive CTE (albeit expensive) is a pretty good way of traversing the complete hierarchy.

Query 1 will still work with the addition of :

INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (13, 3, 2, N'FS', N'N', N'Task C 13', CAST(N'2018-10-16' AS Date), CAST(N'2018-10-19' AS Date))

Where as Query 2 and 3 will probably fail to fully report all levels correctly, and/or, in the correct and logical sequence.
tablaFreakAuthor Commented:
Thanks gentlemen, this was most helpful!
Best,
Steve
Mark WillsTopic AdvisorCommented:
Our pleasure :)
PortletPaulEE Topic AdvisorCommented:
Indeed, a pleasure. Cheers, Paul.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.