Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Recursive file and folder audit... T-SQL

Posted on 2014-11-29
Last Modified: 2014-12-09
Starting with the following files in drive C:

             c.xls  <-- this file is in Folder_1

Note: There could be many levels of subfolders. I am trying to read file names and folder names to enter them in a table like:

ID_PK           Name               ID_FK            FileYN
1                C:                      null
     2          Folder_1                  1    <-- meaning it is in C:
     3          c.xls                         2                    Yes
     4          FolderA                   2
     5          a.xls                        4                    Yes
     6          b.xls                        4                    Yes
     7         Folder_2                  1
     8          d.xls                        7                    Yes
     9          e.xls                        7                    Yes

Question: Could this be done recursively to include all .xls files, folders, and subfolders (regardless it has .xls or not)?

Order of entries is not important. I will, at some point, expand this to include .doc, .png, etc.

There is an undocumented PROC which does something similar. I wonder if it could be modified to list select files and folders, .xls for example which we can pass as a parameter, in this table as shown above. This proc is discussed at: http://www.kodyaz.com/articles/file-folder-list-xp_dirtree-sql-server-stored-procedure-recursive-cte.aspx
Question by:Mike Eghtebas
  • 2
  • 2
  • 2
  • +2
LVL 24

Expert Comment

by:Phillip Burton
ID: 40472433
Yes, you can.

You can use a recursive CTE to implement this.

However, I'm not quite clear - are you saying that the second example is what you have, and the first is what you want?

Now I'm looking at your link, the last example does use a recursive CTE to solve this.

The only thing which is missing is the spacing, and REPLICATE('. ', depth-1) would achieve that.
LVL 77

Expert Comment

ID: 40472515
Must it b e tsql, running from sql, you run into the possibility that the credentials with which mssql runs does not have rights on the path. using vbscript, powershell, c# might be simpler to navigate through the filesystem while building/updating the data in the sql.
LVL 34

Author Comment

by:Mike Eghtebas
ID: 40472533
Must it b e tsql

run into ... credentials
Personal computer, windows authenticated.

using vbscript, powershell
do not know much about either.

c# might be simpler
c# solution will be much better to get started.
Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

LVL 77

Assisted Solution

arnold earned 150 total points
ID: 40472704
MSDN has the tutorial on enumerating drives in a system, directories within a drive, files within a directory.

Includes references to how to read input data from a file.

C# interaction with a ms sql database is also covered within MSDN
USing the System.Data.SqlClient;
Double check which .Net framework is installed on your (you could test for the installed, and adjust accordingly)
the locating of drives in a system, traversing and locating directories, files is part of the attributes. When you find a file, you can then check its extension ....
You could use recurisve function that will initially receive the Drive letter, then will enumerate the contents, when a directory is located, it will call itself with the new path

Process_recursice (path, ID)
i.e. if you have three partition C (1), D (2), E (3), and you start (0)
Fk suggests that you have multiple tables, i.e. drive table, directory table, and file table?
directory table will have a FK from the drive table and a one to many relationship to the file table.
You would want to break them up such that you can query the data using a join versus running a sequential query on the same table. once a file is located, to locate the directories above it, and then the drive
file a is in directory A which is in directory 1, which is within directory,
The directory table will be queried multiple times to collect the chain.

c: (1)
   folder 1
              folder I
                          folder i

You should consider what it is you are using it for in an attempt to optimize the retrieval of data.
LVL 16

Expert Comment

ID: 40473036
Just a thought: you can use the command line dir command to generate a file recursively through the file system, and parse that...
LVL 34

Author Comment

by:Mike Eghtebas
ID: 40473041

Do you have a sample?
LVL 69

Accepted Solution

Scott Pletcher earned 250 total points
ID: 40474100
You can use MS's undocumented stored proc:

It has three parameters:
1.  directory name
       -- such as 'C:\'
2.  depth
        --how many subfolder levels below the starting directory to display.  0 = list all subfolders
3.  flag to list only folders or (files and folders)
        --0=list only folders;1=list files and folders.

So, for your request:

IF OBJECT_ID('tempdb.dbo.#xp_dirtree') IS NOT NULL
    DROP TABLE dbo.#xp_dirtree
CREATE TABLE #xp_dirtree (
    ident int identity(1, 1) not null,
    subdirectory nvarchar(255) null,
    depth smallint null,
    is_file bit null

INSERT INTO #xp_dirtree ( subdirectory, depth, is_file )
EXEC master.sys.xp_dirtree 'C:', 0, 1

FROM #xp_dirtree
--WHERE is_file = 1 --list only files, not directories
--ORDER BY ...
LVL 16

Assisted Solution

DcpKing earned 100 total points
ID: 40475111
As an example of using the dir command, open a command prompt and you'll be at something like c:\users\eghtebas.
Enter "CD .." (without the quotes!) and press enter. Do this again and you should be at the root, C:\.
On my system I then entered
dir /?
to get some help. Once I had determined what I wanted I entered the following command (/S means "recurse through all subdirectories"; /B means "use bare format", so it skips all introduction and summary lines)

C:\>dir /S /B *.xls > results.txt

Open in new window

Then, when it was finished, I opened the new file called results.txt and found this:
C:\SSISMaster\Class Instructor Demo Schedule.xlsx
C:\SSISMaster\Data Files\Excel Example.xlsx
C:\SSISMaster\Data Files\USCustomers.xls
C:\Users\mai\Desktop\SSISMaster\Class Instructor Demo Schedule.xlsx
C:\Users\mai\Desktop\SSISMaster\Data Files\Excel Example.xlsx
C:\Users\mai\Desktop\SSISMaster\Data Files\USCustomers.xls
C:\Users\mai\Documents\Visual Studio 2010\Projects\2007Fint.xls
C:\Users\mai\Dropbox\Tola\patterns\Scarves and Shawls\PIShawl.xls

Open in new window

I can read these in to a SQL Server staging table very easily and then create whatever I want for data from them.
So, how to get this done.

You're using SQL Server 2005 or greater, so create a batch file to do your work. The file might read:
[embed=file 885772]del results.txt
dir /S /B c:\*.xls > results.txt

Open in new window

Then go to SSIS (it comes with your installation) and create a new project. Use an ExecuteProcess Task to run the batch file and then a flat file import to get the contents of the resulting file into the records of a table (see attached screengrab).



Featured Post

Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question