Cleanup Premature Application GoLive

Posted on 2011-10-16
Medium Priority
Last Modified: 2012-05-12
I need to clean up the results of a premature move to production of a small application I built.  Due to a few minor logical errors and an overly aggressive timeline, the errors didn't become apparent until the application was fully in production.
The application parses medical transcription files for insertion of records into a SQL Server DB and storage on a File Server.  
I need to accomplish the following tasks.
1.  A number of records have been inserted into the database without the corresponding files being moved into the storage directory.  When accessed in the final product, they appear normal until loaded and then error out.  The files are stored and referenced as GUID.doc or GUID.rtf.  
I need to be able to search the file server and return the database records that are orphaned from files.
2.  A number of files have been moved into the storage directory, without the requisite records being inserted into the DB, or with erroneous records inserted.
I have cleaned up the erroneous records, and now need to address the orphaned files in the storage directory.  The files need to be archived to a backup directory for possible manual review.
With any luck in the end the DB and Storage will be in sync and all will be happy.
I am proficient in SQL inside of the DB but have never addressed files at the storage level via SQL and have next to no idea how to proceed.

Any and all help is appreciated.  Thanks in advance for the energy expended.

I have not read the full Posting Rules, and presume it is inappropriate to offer payment for services.  On the chance I am wrong, I have no issues with paying someone for their time.
Question by:mholbert

Expert Comment

by:Kevin Cross
ID: 36976142

What programming (scripting) languages are you comfortable with? My first thought is to use a quick script against the file system to generate a list of ALL files with a .doc or .rtf extension in the specified directory. Whether or not this needs to be recursive, i.e., whether you have subdirectories, will be good to note. Additionally, you will want to check whether there are any files that do not belong in the directory or do not meet the standard format.

If all is well, you can have the output of the above be a CSV or XLS file that has one column of filenames.

This can be imported into SQL Server (via BULK INSERT, or using OPENROWSET, for example); then you can do a standard SQL INSERT or DELETE based on whether the filename EXISTS in the table or not.
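As a sketch of that comparison, once the filenames are loaded into a staging table, something along these lines would flag the records orphaned from files (the table and column names here are hypothetical; substitute your actual schema):

```sql
-- Assumed (hypothetical) schema:
--   dbo.Transcriptions ([DocGUID] UNIQUEIDENTIFIER, ...)  -- production records
--   dbo.ImportedFiles  ([FileName] VARCHAR(255))          -- list loaded from the DIR output

-- Task 1: database records with no matching file on disk
SELECT t.[DocGUID]
FROM dbo.Transcriptions AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.ImportedFiles AS f
    WHERE f.[FileName] LIKE '%\' + CONVERT(CHAR(36), t.[DocGUID]) + '.%'
);
```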

Author Comment

ID: 36976190
I'm not particularly versed in any of the scripting languages, but am most familiar with command-line work and .bat files.

I can collect the records at the database level, but I don't script well enough to collect the storage structure information, and I also don't have experience with SQL interacting with external resources.

If I can get the list of files in the storage structure into a table, I should mostly be able to take it from there.  The only interaction with the storage structure beyond that would be the final MOVE statement to move the orphaned files to the archive directory.

Hope this helps.  MH

Expert Comment

by:Kevin Cross
ID: 36976281
Okay, here is a simple example using the DIR command line.

Get files of a specific extension in a directory and all its subdirectories, and put the output in a new file:
dir c:\temp\*.doc /s/b > c:\temp\docfiles.csv


Example contents of the CSV (illustrative paths; your GUID filenames will differ):

c:\temp\0f8fad5b-d9cb-469f-a165-70867728950e.doc
c:\temp\sub\7c9e6679-7425-40de-944b-e07fc1f90ae7.doc

Pinal Dave, SQL Server MVP, has a blog entry on using BULK INSERT to import a CSV.
(I like it because it also shows Excel import using OPENROWSET if you scroll through the comments.)

CREATE TABLE #FileNames ([FileName] VARCHAR(255));

BULK INSERT #FileNames
FROM 'c:\temp\docfiles.csv'
WITH (ROWTERMINATOR = '\n');

SELECT REPLACE(RIGHT([FileName], 
                     CHARINDEX('\', REVERSE([FileName]))-1), 
               '.doc', '') AS [GUIDS]
FROM #FileNames;

DROP TABLE #FileNames;

And really, you do not even have to script that. You can use DIR to create both a docfiles.csv and an rtffiles.csv, then right-click on your database, select Tasks > Import Data..., and follow the SQL Server Import and Export Wizard with the Flat File Source.

The T-SQL I showed above gives you an example of how you can strip the file path and document extension to get just the GUID, although you may not need to if you do your comparison with database as [FileName] LIKE '%\' + CONVERT(CHAR(36), [YourGUIDColumn]) + '.%'. You should be good from there other than exporting back to a CSV.
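Turned around, that same LIKE comparison would list the files with no matching database record, i.e., the candidates for archiving (again assuming a hypothetical dbo.Transcriptions table with a [DocGUID] column):

```sql
-- Task 2: files on disk orphaned from database records
SELECT f.[FileName]
FROM #FileNames AS f
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Transcriptions AS t   -- hypothetical table/column; use your schema
    WHERE f.[FileName] LIKE '%\' + CONVERT(CHAR(36), t.[DocGUID]) + '.%'
);
```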

The SQL command-line utilities such as bcp may come in handy for that part. Once you have your list of files imported, you can run a query that checks for files that exist on the file system but not in the database. You can use the tools from SQL to get this output into a CSV.

Using this CSV, you can probably use VBScript or PowerShell to iterate over the CSV contents and move files accordingly. I have not tried or at least do not remember doing this with a DOS batch file, but it probably has some capabilities to do this also.
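If you would rather stay in T-SQL for the move as well, a cursor over the file list combined with xp_cmdshell could issue the MOVE commands directly. A rough sketch, assuming xp_cmdshell is enabled, the archive directory already exists, and a ##FilesToArchive table holds the full paths (all names here are assumptions; adjust to your setup):

```sql
-- Move each orphaned file to the archive directory via xp_cmdshell.
DECLARE @file VARCHAR(255), @cmd VARCHAR(512);

DECLARE file_cursor CURSOR FOR
    SELECT [FileName] FROM ##FilesToArchive;  -- assumed list of full paths

OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @file;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- c:\archive is a placeholder for your backup directory
    SET @cmd = 'move "' + @file + '" "c:\archive\"';
    EXEC master..xp_cmdshell @cmd, no_output;
    FETCH NEXT FROM file_cursor INTO @file;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;
```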

Hope that helps!



Expert Comment

by:Kevin Cross
ID: 36976369
You may find this interesting also. If you are going to use bcp and other command-line utilities from T-SQL, you have to enable xp_cmdshell: http://msdn.microsoft.com/en-us/library/ms175046(v=sql.90).aspx

With xp_cmdshell enabled, you can actually do this in one procedure.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- regular #temp_tables do not work; needs a physical table/view
-- ##global_temp_tables can be used or TempDB..TempTable
CREATE TABLE ##FileNames ([FileName] VARCHAR(255));

DECLARE @dircmd VARCHAR(100);

-- use same DIR command to get list of files
SET @dircmd = 'dir c:\temp\*.doc /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

SET @dircmd = 'dir c:\temp\*.rtf /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

-- do your processing: simple select shown here
SELECT [FileName]
FROM ##FileNames 
-- if dir does not find any files, result is 'File Not Found'
WHERE [FileName] <> 'File Not Found'; 

-- export some results, using bcp
EXEC xp_cmdshell 'bcp "SELECT [FileName] FROM ##FileNames WHERE [FileName] <> ''File Not Found'';" queryout c:\temp\files2arch.csv -T -c';

DROP TABLE ##FileNames;

-- disable for security reasons when done (optional)
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;



Author Comment

ID: 36976408
Awesome, I will play through it and let you know.  I very much appreciate the help.

Accepted Solution

Kevin Cross earned 2000 total points
ID: 36976438
No problem. I found it interesting. It reminded me also to keep it simple. As a programmer, when I saw your question my mind immediately jumped to VB code to script the FileSystemObject and recursively get file names and have to further mess with IO to write out a file. It was nice to think of different approaches that did not involve much programming outside of T-SQL code.

All the code is tested, by the way, but do keep in mind that I ran it with full permissions; I gather you have full dbo access.

Anyway, good luck to you! Just let me know how it turns out.

Best regards and happy coding,


Author Comment

ID: 37001880
I am cleaning everything up, working on the assumption that at some point I can write the files that need deleting at the folder level into a file and script the deletion.

I ended up importing the data via the wizard, and I guess I can do the same on the way out.

I appreciate the help.

