<

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x

A Custom Recursive LINQ Extension

Published on
16,691 Points
10,491 Views
2 Endorsements
Last Modified:
Approved
What does this article hope to achieve?

   I’ve been writing software for quite a long time now and I try to identify any repetitive task and determine if it is code that I can isolate and write once and reuse later. Recursion is a pattern that surfaces semi-frequently and always requires a moment of thought and likely a few iterations through the debugger to ensure whatever tree-like structure I’m dealing with is being traversed properly.

  Recursion has always sat in the back of my mind as something that should fall into a reusable pattern library that I can carry in my toolbox but I’ve just never gotten around to it largely because I felt each implementation was significantly unique enough not to lend itself well to a generic pattern. I was flat wrong.

   I recently read about writing a simple custom LINQ extension on Sascha Barber’s website in this article. Although this wasn’t a recent article it inspired me to conquer my long standing procrastination concerning a generic recursion pattern. The benefit of a LINQ enabled custom extension that provided generic recursion was obvious. One could chain in the LINQ recursion extension anywhere one was writing a LINQ statement or as a stand-alone LINQ statement itself.

   I wanted to write something that would recurse any structure that lent itself to recursion. I came up with the following common test scenarios in my mind that would enable me to prove the extension as sufficiently generic.

•      XML
•      TreeView nodes on a windows form
•      Windows File System

   Obviously, XML content can be tree like in nature. XML is a group of ever nesting elements that can be easily traversed using recursion. In a name, a TreeView says it all. A TreeView can contain TreeNode visual objects that in turn contain TreeNode child objects to the nth level. The Windows File System should require no explaining. We’re all familiar that directories can contain other directories and/or files and so forth.

   So, as a proof of concept, I want my Recurse LINQ extension method to handle all of these scenarios hopefully in an elegant manner but certainly as code that I know “just works” that I don’t have to write from scratch every time a situation for recursion presents itself.

Without any further ado let’s take a look at what I came up with as far as the Recurse LINQ extension and explain the code in further detail.

using System;
using System.Collections.Generic;
using System.Linq;

namespace Linq.Extensions
{
	public static class LinqExtensions
	{
		public static IEnumerable<T> Recurse<T> 
			(
				this T root, 
				Func<T, IEnumerable<T>> findChildren
			) 
			where T : class
		{
			yield return root;

			foreach (var child in 
				findChildren(root)
					.SelectMany(node => Recurse(node, findChildren))
					.TakeWhile(child => child != null))
			{
				yield return child;
			}

			// SAME CODE AS ABOVE IN A MORE VERBOSE FASHION
			//foreach (var node in findChildren(root))
			//{
			//    foreach (var child in Recurse(node, findChildren))
			//    {
			//        if (child == null)
			//            yield break;
					
			//        yield return child;
			//    }
			//}
		}
	}
}

Open in new window


  I wanted a least common denominator approach to allow for the greatest flexibility in the Recurse method. The bare minimum information we need to provide a generic recursion method is: a starting point of recursion, or the “root”, as well as how each item determines its child objects. While this methodology allows for the greatest flexibility it requires a little forethought into what exactly you want to recurse and how exactly to fetch the children of what you’re recursing.

What makes this method a LINQ extension?

   Paraphrasing Sasha’s article, all we need to conform to a LINQ extension is to provide…

1) An IEnumerable<T> return
2) Conform to any extension method signature which takes a first parameter of “this”. In our case we will be using the T generic type of our return. The “this” parameter is what denotes this method as an “extension method”. There are countless well written articles concerning the writing of extension methods that you can peruse should you be unfamiliar with the concept.
3) Be a static method in a static class. The latter is not required it’s just a good practice of organization. I plan on adding new LINQ extensions in future articles so for now we just have the single static class and method.

That’s all we have to do for our method to be available to us as a LINQ extension which resolves the implied problem in the first part of the article title. If you’re interested in writing an extension that does something other than recursion just follow those basic guidelines and away you go.

   I do recommend implementing the “yield return” methodology when writing methods that return an IEnumerable<T> in C# code where possible. There are many articles out there describing the inherent benefits of yield return that you should check out if this approach isn’t clear to you. In fact, in the case of recursion, it’s imperative as I’ll explain in more detail later.

What makes this method implement generic recursion?

   The recursive nature of the method call lies in the Recurse method calling itself, passing on each child found in the provided “findChildren” method as the new root for recursion. This is the beauty and elegance found in the nature of recursion. We keep calling our method, nesting inside of ourselves until we discover and return all children to the nth level. This should be immediately obvious to anyone whom has written any type of recursive function in their programming career.

   A common problem with recursion is that you can build a huge stack on the memory heap as you recurse further and further into the structure tree. This problem is neatly avoided with the C# yield return methodology. The very nature of the yield return keeps us from pushing an unknown amount of data onto the memory stack but rather take a single item, as needed, while we recursively traverse our generic tree structure.

   That being said, I don’t recommend porting this to VB.Net. While possible, the aforementioned problem cannot be averted to my knowledge but I’m certainly open to a VB.Net solution that could handle this situation effectively.

   Please note the commented code which is provided for an easier read. The implemented and commented code is semantically the same and the latter is only provided for clarity. Referencing the code that is commented out we can see this is a very simple method indeed. We simply iterate each child of our root object by calling the provided findChildren method. Our findChildren method takes in a parameter of type T (our generic type) and provides an IEnumerable<T> return. See the pattern here? To effectively recurse our generic tree we have to be able to plug back in results that match the results of our Recurse method itself. On each iteration of the root node we Recurse our child node itself providing the exact same method to find children.

   Please note I could take 1000 images of debugging steps to try and clarify what will make immediate sense to the reader actually stepping though the code in debug mode themselves. If the nature of the above code does not make immediate sense please take the time to step through one of the unit tests at least once to see how things are working not only with the code but with the nature of the C# compiler and the yield return statements themselves. One thing that will jump out at you is when the code is actually called which is when you start to actually iterate the results of the Recurse method. Try stepping into the Recurse method itself and you will see what I mean. The compiler delves into the Recurse method when the first result is actually needed (as in the case of the first element in a foreach loop being required).

Testing our generic recursive method

   I have included an example unit test written to test our three scenarios in three simple methods. Let’s take a look at the two easiest scenarios first.

The TestXmlRecursion test method loads an XmlDocument into memory with sample XML content I’ve saved into the resource of the unit test project itself. Nothing special going on there we just have some very basic XML as follows...

<Root>
	<ChildA>
		<SubChildA></SubChildA>
		<SubChildB></SubChildB>
		<SubChildC></SubChildC>
	</ChildA>
	<ChildB>
		<SubChildA></SubChildA>
		<SubChildB></SubChildB>
	</ChildB>
	<ChildC>
		<SubChildA></SubChildA>
	</ChildC>
	<ChildD>
	</ChildD>
</Root>

Open in new window


   Ideally we want to iterate the XML tree as intuitively as possible and output the XML almost as you read it. Without implementing indentation that's exactly what our output will look like as I've written the test.

   Our unit test method and supporting methods are as follows...

[TestMethod]
public void TestXmlRecursion()
{
	var xmlDocument = new XmlDocument();
	xmlDocument.LoadXml(Resources.SampleXmlData);

	foreach (var node in xmlDocument.FirstChild.Recurse(EnumerateXmlElements))
	{
		WriteXmlNode(node);
	}
}

private static void WriteXmlNode(XmlNode node)
{
	Debug.WriteLine(node.Name);
}

public IEnumerable<XmlNode> EnumerateXmlElements(XmlNode node)
{
	return node.ChildNodes.Cast<XmlNode>();
}

Open in new window


   Our code couldn't be more simple. After we do the footwork of loading up the XmlDocument we grab the first child and call our LINQ Recurse method providing our custom-written EnumerateXmlElements method to provide our Recurse() LINQ  method with the children it needs for proper processing.

   Our EnumerateXmlElements method is very simple just passing back the ChildNodes property of the XmlNode provided. Please note the Cast<XmlNode> call which is a great way to provide a proper IEnumerable<T> of any framework collections that are implemented as a special collection such as an XmlNodeCollection or TreeNodeCollection, etc. I use this all over the place to facilitate LINQ queries built on top of framework or custom collections where appropriate.

   For our simple example we take in an XmlNode object and write out the node.Name property which is the text value of the XML element itself. Although I don't go into attributes or 1000 other XML specific examples here you can easily see that once we have a method taking in an XmlNode parameter returned from our recursive call we have a lot of versatility here on what operations we can perform with our XML elements inside the XmlDocument.

  So what does our ouput look like? Exactly, I think, as our intuitive minds would imagine, as a list of XML node element tags with no indentation...

Root
ChildA
SubChildA
SubChildB
SubChildC
ChildB
SubChildA
SubChildB
ChildC
SubChildA
ChildD

   for a bit of fun and to demonstrate proper LINQ extensibility I've included a reversed XML method which simply calls the .Reverse() LINQ extension after our .Recurse() call and, as expected, this is our result...

ChildD
SubChildA
ChildC
SubChildB
SubChildA
ChildB
SubChildC
SubChildB
SubChildA
ChildA
Root

  For brevity, I'm omitting a full explanation of the TreeView unit test methodology because it is almost identical in nature to our XML example. One glance at the code provided should prove that evident.

  The file system example is a little different not so much in the call to our LINQ Recurse example but more in the area of "where to start" and "what to provide". Let's take a look...

[TestMethod]
public void TestFileSystemRecursion()
{
	const string rootPath = @"C:\inetpub\";

	foreach (var path in rootPath.Recurse(EnumerateFileSystemEntries))
	{
		WritePath(path);
	}
}

public IEnumerable<string> EnumerateFileSystemEntries(string root)
{
	if (Directory.Exists(root))
	{
		foreach (var fileSystemEntry in Directory.EnumerateFileSystemEntries(root))
		{
			yield return fileSystemEntry;
		}
	}

	yield return null;
}

public static void WritePath(string path)
{
	Debug.WriteLine(path);
}

Open in new window


  So what's different here? Well, here we're taking a string path and enumerating the string via our EnumerateFileSystemEntries method which returns a list of strings. Huh? Really what we're recursing is a string tree in essence which just happens to be a nested list of paths. In our WritePath method we simply output the path of where we are in the file system tree.

   You may be thinking something's fishy about this approach and you'd be right. In fact there's nothing wrong with this approach as long as all we're interested in is the path of the folder or file down the tree. What if we wanted to output the creation time of each item though? We would effectively have to read the FileSystemInfo object twice, once inside our recursion and again inside our WritePath method to retrieve a FileSystemInfo object based on the fetched path. This obviously isn't the most streamlined approach to solve our problem in this case. I have included a TestFileSystemRecursionImprovedPerformance unit test method to simulate a work-around for this issue and how to best use the Recurse() extension to our advantage. I'm leaving this out of the article explanation, again for brevity, but it's a good example of how you can do things different ways with such a generic solution.

Included below are the the full code samples you can copy and paste into your own solutions.

Custom LINQ Extension Recurse itself
using System;
using System.Collections.Generic;
using System.Linq;

namespace Linq.Extensions
{
	public static class LinqExtensions
	{
		public static IEnumerable<T> Recurse<T> 
			(
				this T root, 
				Func<T, IEnumerable<T>> findChildren
			) 
			where T : class
		{
			yield return root;

			foreach (var child in 
				findChildren(root)
					.SelectMany(node => Recurse(node, findChildren))
					.TakeWhile(child => child != null))
			{
				yield return child;
			}

			// SAME CODE AS ABOVE IN A MORE VERBOSE FASHION
			//foreach (var node in findChildren(root))
			//{
			//    foreach (var child in Recurse(node, findChildren))
			//    {
			//        if (child == null)
			//            yield break;
					
			//        yield return child;
			//    }
			//}
		}
	}
}

Open in new window


Unit Test Class
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using System.Xml;
using Linq.Extensions.UnitTests.Properties;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace Linq.Extensions.UnitTests
{
	[TestClass]
	public class RecurseUnitTest
	{
		[TestMethod]
		public void TestFileSystemRecursion()
		{
			const string rootPath = @"C:\inetpub\";

			foreach (var path in rootPath.Recurse(EnumerateFileSystemEntries))
			{
				WritePath(path);
			}
		}

		[TestMethod]
		public void TestFileSystemRecursionImprovedPerformance()
		{
			FileSystemInfo info = new DirectoryInfo(@"C:\inetpub\");

			foreach (var path in info.Recurse(EnumerateFileSystemEntriesImprovedPerformance))
			{
				WritePath(path);
			}
		}

		[TestMethod]
		public void TestXmlRecursion()
		{
			var xmlDocument = new XmlDocument();
			xmlDocument.LoadXml(Resources.SampleXmlData);

			foreach (var node in xmlDocument.FirstChild.Recurse(EnumerateXmlElements))
			{
				WriteXmlNode(node);
			}
		}

		[TestMethod]
		public void TestXmlRecursionReverse()
		{
			var xmlDocument = new XmlDocument();
			xmlDocument.LoadXml(Resources.SampleXmlData);

			foreach (var node in xmlDocument.FirstChild.Recurse(EnumerateXmlElements).Reverse())
			{
				WriteXmlNode(node);
			}
		}

		[TestMethod]
		public void TestWindowsFormsTreeRecursion()
		{
			var form = new FormTreeView();

			foreach (var node in form.treeView.Nodes[0].Recurse(EnumerateTreeViewNodes))
			{
				WriteTreeViewNode(node);
			}
		}

		private static void WriteTreeViewNode(TreeNode node)
		{
			Debug.WriteLine(node.Text);
		}

		private static void WriteXmlNode(XmlNode node)
		{
			Debug.WriteLine(node.Name);
		}

		public IEnumerable<TreeNode> EnumerateTreeViewNodes(TreeNode node)
		{
			return node.Nodes.Cast<TreeNode>();
		}

		public IEnumerable<XmlNode> EnumerateXmlElements(XmlNode node)
		{
			return node.ChildNodes.Cast<XmlNode>();
		}

		public IEnumerable<string> EnumerateFileSystemEntries(string root)
		{
			if (Directory.Exists(root))
			{
				foreach (var fileSystemEntry in Directory.EnumerateFileSystemEntries(root))
				{
					yield return fileSystemEntry;
				}
			}
		}

		public IEnumerable<FileSystemInfo> EnumerateFileSystemEntriesImprovedPerformance(FileSystemInfo info)
		{
			var rootDirectoryInfo = info as DirectoryInfo;

			if (rootDirectoryInfo == null) yield break;

			foreach (var directoryInfo in rootDirectoryInfo.GetDirectories())
			{
				yield return directoryInfo;
			}

			foreach (var fileInfo in rootDirectoryInfo.GetFiles())
			{
				yield return fileInfo;
			}
		}

		public static void WritePath(string path)
		{
			Debug.WriteLine(path);
		}
		
		public static void WritePath(FileSystemInfo info)
		{
			Debug.WriteLine(info.FullName);
		}
		
	}
}

Open in new window


Sample XML
Just attach as a file resource inside your Unit Test project named "SampleXmlData"
<Root>
	<ChildA>
		<SubChildA></SubChildA>
		<SubChildB></SubChildB>
		<SubChildC></SubChildC>
	</ChildA>
	<ChildB>
		<SubChildA></SubChildA>
		<SubChildB></SubChildB>
	</ChildB>
	<ChildC>
		<SubChildA></SubChildA>
	</ChildC>
	<ChildD>
	</ChildD>
</Root>

Open in new window


Sample Form with a populated TreeView control
namespace Linq.Extensions.UnitTests
{
	partial class FormTreeView
	{
		/// <summary>
		/// Required designer variable.
		/// </summary>
		private System.ComponentModel.IContainer components = null;

		/// <summary>
		/// Clean up any resources being used.
		/// </summary>
		/// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
		protected override void Dispose(bool disposing)
		{
			if (disposing && (components != null))
			{
				components.Dispose();
			}
			base.Dispose(disposing);
		}

		#region Windows Form Designer generated code

		/// <summary>
		/// Required method for Designer support - do not modify
		/// the contents of this method with the code editor.
		/// </summary>
		private void InitializeComponent()
		{
			System.Windows.Forms.TreeNode treeNode1 = new System.Windows.Forms.TreeNode("SubChildA");
			System.Windows.Forms.TreeNode treeNode2 = new System.Windows.Forms.TreeNode("SubChildB");
			System.Windows.Forms.TreeNode treeNode3 = new System.Windows.Forms.TreeNode("SubChildC");
			System.Windows.Forms.TreeNode treeNode4 = new System.Windows.Forms.TreeNode("ChildA", new System.Windows.Forms.TreeNode[] {
            treeNode1,
            treeNode2,
            treeNode3});
			System.Windows.Forms.TreeNode treeNode5 = new System.Windows.Forms.TreeNode("SubChildA");
			System.Windows.Forms.TreeNode treeNode6 = new System.Windows.Forms.TreeNode("SubChildB");
			System.Windows.Forms.TreeNode treeNode7 = new System.Windows.Forms.TreeNode("ChildB", new System.Windows.Forms.TreeNode[] {
            treeNode5,
            treeNode6});
			System.Windows.Forms.TreeNode treeNode8 = new System.Windows.Forms.TreeNode("SubChildA");
			System.Windows.Forms.TreeNode treeNode9 = new System.Windows.Forms.TreeNode("ChildC", new System.Windows.Forms.TreeNode[] {
            treeNode8});
			System.Windows.Forms.TreeNode treeNode10 = new System.Windows.Forms.TreeNode("ChildD");
			System.Windows.Forms.TreeNode treeNode11 = new System.Windows.Forms.TreeNode("RootA", new System.Windows.Forms.TreeNode[] {
            treeNode4,
            treeNode7,
            treeNode9,
            treeNode10});
			System.Windows.Forms.TreeNode treeNode12 = new System.Windows.Forms.TreeNode("RootB");
			this.treeView = new System.Windows.Forms.TreeView();
			this.SuspendLayout();
			// 
			// treeView
			// 
			this.treeView.Dock = System.Windows.Forms.DockStyle.Fill;
			this.treeView.Location = new System.Drawing.Point(0, 0);
			this.treeView.Name = "treeView";
			treeNode1.Name = "Node7";
			treeNode1.Text = "SubChildA";
			treeNode2.Name = "Node8";
			treeNode2.Text = "SubChildB";
			treeNode3.Name = "Node9";
			treeNode3.Text = "SubChildC";
			treeNode4.Name = "Node3";
			treeNode4.Text = "ChildA";
			treeNode5.Name = "Node10";
			treeNode5.Text = "SubChildA";
			treeNode6.Name = "Node11";
			treeNode6.Text = "SubChildB";
			treeNode7.Name = "Node4";
			treeNode7.Text = "ChildB";
			treeNode8.Name = "Node12";
			treeNode8.Text = "SubChildA";
			treeNode9.Name = "Node5";
			treeNode9.Text = "ChildC";
			treeNode10.Name = "Node6";
			treeNode10.Text = "ChildD";
			treeNode11.Name = "Node0";
			treeNode11.Text = "RootA";
			treeNode12.Name = "Node2";
			treeNode12.Text = "RootB";
			this.treeView.Nodes.AddRange(new System.Windows.Forms.TreeNode[] {
            treeNode11,
            treeNode12});
			this.treeView.Size = new System.Drawing.Size(284, 262);
			this.treeView.TabIndex = 0;
			// 
			// FormTreeView
			// 
			this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
			this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
			this.ClientSize = new System.Drawing.Size(284, 262);
			this.Controls.Add(this.treeView);
			this.Name = "FormTreeView";
			this.Text = "FormTreeView";
			this.ResumeLayout(false);

		}

		#endregion

		public System.Windows.Forms.TreeView treeView;

	}
}

Open in new window

LinqRecurseExtension.zip
LinqRecurseExtension2008.zip
2
0 Comments

Featured Post

Learn SQL Server Core 2016

This course will introduce you to SQL Server Core 2016, as well as teach you about SSMS, data tools, installation, server configuration, using Management Studio, and writing and executing queries.

A query can call a function, and a function can call Excel, even though we are in Access. This is Part 2, and steps you through the VBA that "wraps" Excel functionality so we can use its worksheet functions in Access. The declaration statement de…
I've published three five-minute Experts Exchange video Micro Tutorials that describe terrific features in an excellent, free PDF product called PDF-XChange Editor: How to rotate pages in a PDF with free software (https://www.experts-exchange.com…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month