Cleanup Premature Application GoLive

I need to clean up the results of a premature move to production of a small application I built.  Due to a few minor logical errors and an overly aggressive timeline, the errors didn't become apparent until the application was put fully into production.
The application parses medical transcription files for insertion of records into a SQL Server DB and storage on a File Server.  
I need to accomplish the following tasks.
1.  A number of records have been inserted into the database without the corresponding files being moved into the storage directory.  When accessed in the final product, they appear normal until loaded and then error out.  The files are stored and referenced as GUID.doc or GUID.rtf.  
I need to be able to search the file server and return the database records that are orphaned from files.
2.  A number of files have been moved into the storage directory, without the requisite records being inserted into the DB, or with erroneous records inserted.
I have cleaned up the erroneous records, and now need to address the orphaned files in the storage directory.  The files need to be archived to a backup directory for possible manual review.
With any luck in the end the DB and Storage will be in sync and all will be happy.
I am proficient in SQL inside of the DB but have never addressed files at the storage level via SQL and have next to no idea how to proceed.

Any and all help is appreciated.  Thanks in advance for the energy expended.

I have not read the full Posting Rules, and presume it is inappropriate to offer payment for services.  On the chance I am wrong, I have no issues with paying someone for their time.
Mark Holbert Asked:
Kevin Cross, Chief Technology Officer, Commented:
No problem. I found it interesting. It reminded me also to keep it simple. As a programmer, when I saw your question my mind immediately jumped to VB code to script the FileSystemObject and recursively get file names and have to further mess with IO to write out a file. It was nice to think of different approaches that did not involve much programming outside of T-SQL code.

All the code is tested by the way, but do keep in mind I have full permissions, though I gather you have full dbo access.

Anyway, good luck to you! Just let me know how it turns out.

Best regards and happy coding,

Kevin Cross, Chief Technology Officer, Commented:

What programming (scripting) languages are you comfortable with? First thought is to use a quick script of the file system to generate a list of ALL files with .doc and .rtf extension that are in the specified directory. Whether or not this is recursive, i.e., you have subdirectories, will be good to note. Additionally, you will want to check if there are any files that do not belong in the directory or don't meet the standard format.

If all is well, you can have the output of the above be a CSV or XLS file that has one column of filenames.

This can be imported into SQL (or using OPENROWSET for example), then you can do a standard SQL INSERT or DELETE based on whether filename EXISTS in table or not.
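Once the file list is in a staging table, the comparison can run in both directions. A minimal sketch, assuming hypothetical names (dbo.Transcriptions and its DocGUID column are placeholders for your actual table and GUID column; #FileNames is the staging table holding the imported file names):

```sql
-- Sketch only: dbo.Transcriptions / DocGUID are assumed names;
-- substitute your real table and column. #FileNames holds the
-- imported file list, one full file name per row.

-- Task 1: DB records whose file is missing from storage
SELECT t.*
FROM dbo.Transcriptions AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM #FileNames AS f
    WHERE f.[FileName] LIKE '%' + CONVERT(CHAR(36), t.DocGUID) + '.%'
);

-- Task 2: files in storage with no matching DB record
SELECT f.[FileName]
FROM #FileNames AS f
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Transcriptions AS t
    WHERE f.[FileName] LIKE '%' + CONVERT(CHAR(36), t.DocGUID) + '.%'
);
```

The second result set is the list of orphaned files to feed into the archive move.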
Mark Holbert (Author) Commented:
I'm not particularly versed in any of the scripting languages, but am most familiar with command line stuff and .bat files.

I can collect the records at the database level but don't script well enough to collect the storage structure information, and also don't have experience with SQL interacting with external stuff.

If I can get the list of files in the storage structure into a table, I should be able to take it from there mostly.  The only interaction with the storage structure beyond that would be the final MOVE statement to move the orphaned files to the archive directory.

Hope this helps.  MH

Kevin Cross, Chief Technology Officer, Commented:
Okay, here is a simple example using DIR command line.

Get files of a specific extension in a directory and all its subdirectories and put output in a new file.
dir c:\temp\*.doc /s/b > c:\temp\docfiles.csv


Example contents of the CSV (dir /s/b writes one full path per line; the GUIDs below are illustrative):

c:\temp\1b4e28ba-2fa1-11d2-883f-0016d3cca427.doc
c:\temp\archive\6fa459ea-ee8a-3ca4-894e-db77e160355e.doc

Pinal Dave, SQL Server MVP, has a blog entry on using BULK import CSV:
(I like it because it shows Excel import using OPENROWSET also if you scroll through comments)
CREATE TABLE #FileNames ([FileName] VARCHAR(255));

BULK INSERT #FileNames
FROM 'c:\temp\docfiles.csv'
WITH (ROWTERMINATOR = '\n');

SELECT REPLACE(
           RIGHT([FileName],
                 CHARINDEX('\', REVERSE([FileName])) - 1),
           '.doc', '') AS [GUIDS]
FROM #FileNames;

DROP TABLE #FileNames;


And really, you do not even have to script that. You can use DIR to create both a docfiles.csv and a rtffiles.csv and then right-click on your database and select Tasks > Import Data... and follow the SQL Server Import and Export Wizard for Flat File Source.

The T-SQL I showed above gives you an example of how you can strip the file path and document extension to get just the GUID, although you may not need to if you do your comparison with database as [FileName] LIKE '%\' + CONVERT(CHAR(36), [YourGUIDColumn]) + '.%'. You should be good from there other than exporting back to a CSV.

The SQL command line utilities such as bcp may come in handy for that part. Once you have your list of files imported, you can run a query that checks for files that exist on the file system but not in the database. You can use the tools from SQL to get this output into a CSV.

Using this CSV, you can probably use VBScript or PowerShell to iterate over the CSV contents and move files accordingly. I have not tried or at least do not remember doing this with a DOS batch file, but it probably has some capabilities to do this also.
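For the final move step, a plain batch file can in fact do it: FOR /F reads the CSV line by line and MOVE archives each file. A minimal sketch (both paths are illustrative; adjust to your storage and archive directories):

```bat
@echo off
rem Read each full path from the exported CSV and move the orphaned
rem file into the archive directory for manual review.
rem "delims=" keeps the whole line intact even if paths contain spaces.
for /f "usebackq delims=" %%F in ("c:\temp\files2arch.csv") do (
    move "%%F" "c:\archive\"
)
```

Run it once against a test directory first to confirm the CSV paths resolve as expected.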

Hope that helps!

Kevin CrossChief Technology OfficerCommented:
You may find this interesting also. If you are going to use bcp and other utilities from T-SQL, you have to enable xp_cmdshell.

With xp_cmdshell enabled, you can actually do this in one procedure.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- regular #temp_tables do not work; needs a physical table/view
-- ##global_temp_tables can be used or TempDB..TempTable
CREATE TABLE ##FileNames ([FileName] VARCHAR(255));

DECLARE @dircmd VARCHAR(100);

-- use same DIR command to get list of files
SET @dircmd = 'dir c:\temp\*.doc /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

SET @dircmd = 'dir c:\temp\*.rtf /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

-- do your processing: simple select shown here
SELECT [FileName]
FROM ##FileNames 
-- if dir does not find any files, result is 'File Not Found'
WHERE [FileName] <> 'File Not Found'; 

-- export some results, using bcp
EXEC xp_cmdshell 'bcp "SELECT [FileName] FROM ##FileNames WHERE [FileName] <> ''File Not Found'';" queryout c:\temp\files2arch.csv -T -c';

DROP TABLE ##FileNames;

-- disable for security reasons when done (optional)
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;


Mark Holbert (Author) Commented:
Awesome, I will play through it and let you know.  I very much appreciate the help.
Mark Holbert (Author) Commented:
I am cleaning everything up and going on the thought that I can at some point write the files that need deleting at the folder level into a file and script the deletion.

I ended up importing the data via the wizard, and I guess I can do the same on the way out.

I appreciate the help.

Question has a verified solution.
