Link to home
Start Free TrialLog in
Avatar of tablaFreak
tablaFreakFlag for United States of America

asked on

SQL Recursive Query: Define Overall Start & Due Dates of a Group by its Children

Hello Experts,

I have a table (def & data attached), called Linked_Tasks, which has tasks with start and due dates (Start_Date, Due_Date), and those tasks belong to groups (defined by Task_Parent_ID). A group has a field Is_Group set to 'Y'. I need to write a query which sets the start date and due date of each group by the min start date and max due date, respectively, from the tasks in that group. I have a recursive query started, attached, but could use help in the best way to write that query to set the overall start/due dates of groups of tasks.

Any ideas?

Thank you!
Steve
USE [WSAProject]
GO
/****** Object:  Table [dbo].[Linked_Task]    Script Date: 10/12/2018 3:20:13 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Linked_Task](
	[Task_ID] [int] NOT NULL,
	[Task_Parent_ID] [int] NULL,
	[Linked_Task_ID] [int] NULL,
	[Linked_Task_Type] [char](2) NULL,
	[Is_Group] [char](1) NULL,
	[Task_Name] [varchar](200) NULL,
	[Start_Date] [date] NULL,
	[Due_Date] [date] NULL,
 CONSTRAINT [PK_Linked_Task] PRIMARY KEY CLUSTERED 
(
	[Task_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (1, 8, NULL, NULL, N'N', N'Task A', CAST(N'2018-10-01' AS Date), CAST(N'2018-10-10' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (2, 8, 1, N'SS', N'N', N'Task B', CAST(N'2018-10-05' AS Date), CAST(N'2018-10-16' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (3, 8, 2, N'FS', N'N', N'Task C', CAST(N'2018-10-16' AS Date), CAST(N'2018-10-19' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (4, 9, NULL, NULL, N'N', N'Task D', CAST(N'2018-10-15' AS Date), CAST(N'2018-10-18' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (5, 9, 4, N'FS', N'N', N'Task E', CAST(N'2018-10-18' AS Date), CAST(N'2018-10-25' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (6, 9, 5, N'SS', N'N', N'Task F', CAST(N'2018-10-18' AS Date), CAST(N'2018-10-25' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (7, 9, 6, N'FS', N'N', N'Task G', CAST(N'2018-10-25' AS Date), CAST(N'2018-11-01' AS Date))
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (8, NULL, NULL, NULL, N'Y', N'GROUP 1', NULL, NULL)
INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (9, NULL, NULL, NULL, N'Y', N'GROUP 2', NULL, NULL)

Open in new window

USE [WSAProject]
GO
/****** Object:  StoredProcedure [dbo].[wsa_ProjectTask_s]    Script Date: 10/12/2018 3:20:35 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO



WITH Task_CTE AS (

	SELECT  
	CAST(right('000' + cast(row_number()over(partition by Task.Task_Parent_ID order by Task.Order_By, Task.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by t.Task_Parent_ID order by t.Order_By, t.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) 

Open in new window

Linked_Task.sql
Linked_Task_Query.sql
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
OK, while Paul is quite correct about it being a regular join given the hierarchical relationship, Bit more challenging if quite a few levels deep.... So, let's address your concerns about recursive cte...

Your query was very nearly finished. Just a few tweaks, an unknown column name, and it worked.

;WITH Task_CTE AS 
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

    UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) select * 
  from task_cte
  order by sort
--option (maxrecursion 0)

Open in new window

So your basic approach in making a recursive CTE was good. The common mistake is getting the root node correct. And you had that down pact.

However, Recursive CTE's can be very inefficient and resource hungry. They can end up looping the entire collection in the CTE every time it appends another row to the collection.

So, if ever possible, if there is another way, then you are better off avoiding recursive CTE.

Here are 3 queries to compare the costs using the same basic approach of the same CTE (well, starts the same):
;WITH Task_CTE AS  -- Query 1 
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID IS NULL Or lt.Task_Parent_ID = 0)

    UNION ALL

	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
) select * 
  from task_cte
  order by sort
--option (maxrecursion 0)

--Query 1: Query cost (relative to the batch): 57%


;WITH Task_CTE AS   -- Query 2
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID in (select distinct task_parent_id from Linked_Task))
)
	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
    order by 1

--Query 2: Query cost (relative to the batch): 18%


;WITH Task_CTE AS    -- Query 3
(
	SELECT  
	CAST(right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*lt.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	1 as TaskLevel,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
	From Linked_Task lt
	WHERE  (lt.Task_Parent_ID in (select distinct task_parent_id from Linked_Task))
)   select * from Task_CTE as t_cte union
	SELECT 
	CAST(Sort +  + '/' + right('000' + cast(row_number()over(partition by lt.Task_Parent_ID order by /*t.Order_By,*/ lt.Due_Date) as varchar(250)),3) as varchar(250)) 
	as Sort, 
	t_cte.TaskLevel + 1,
	lt.Task_ID,
	lt.Task_Parent_ID,
	lt.Task_Name,
	lt.Linked_Task_ID,
	lt.Linked_Task_Type,
	lt.Is_Group,
	lt.[Start_Date],
	lt.Due_Date
    FROM Linked_Task lt
	INNER JOIN Task_CTE as t_cte ON lt.Task_Parent_ID = t_cte.Task_ID
    order by 1

--Query 3: Query cost (relative to the batch): 25%

Open in new window

As you can see, Query 1 was more (cost) than the other two combined.

Query 2, you would have to add those 'root' nodes as addional columns. Much the same way Paul did above

Query 3, simply does a union to get the 'root' node appearing in the same columns...

And, there is very little difference in how the three CTE's were created - main difference is where you end the initial query.

Of course, if you do have more than a few levels deep, then the recursive CTE (albeit expensive) is a pretty good way of traversing the complete hierarchy.

Query 1 will still work with the addition of :

INSERT [dbo].[Linked_Task] ([Task_ID], [Task_Parent_ID], [Linked_Task_ID], [Linked_Task_Type], [Is_Group], [Task_Name], [Start_Date], [Due_Date]) VALUES (13, 3, 2, N'FS', N'N', N'Task C 13', CAST(N'2018-10-16' AS Date), CAST(N'2018-10-19' AS Date))

Where as Query 2 and 3 will probably fail to fully report all levels correctly, and/or, in the correct and logical sequence.
Avatar of tablaFreak

ASKER

Thanks gentlemen, this was most helpful!
Best,
Steve
Our pleasure :)
Indeed, a pleasure. Cheers, Paul.