Mark Holbert

asked on

Cleanup Premature Application GoLive

I need to clean up the results of a premature move to production of a small application I built.  Due to a few minor logical errors and an overly aggressive timeline, the errors didn't become apparent until the application was fully in production.
The application parses medical transcription files for insertion of records into a SQL Server DB and storage on a File Server.  
I need to accomplish the following tasks.
1.  A number of records have been inserted into the database without the corresponding files being moved into the storage directory.  When accessed in the final product, they appear normal until loaded and then error out.  The files are stored and referenced as GUID.doc or GUID.rtf.  
I need to be able to search the file server and return the database records that are orphaned from files.
2.  A number of files have been moved into the storage directory, without the requisite records being inserted into the DB, or with erroneous records inserted.
I have cleaned up the erroneous records, and now need to address the orphaned files in the storage directory.  The files need to be archived to a backup directory for possible manual review.
With any luck in the end the DB and Storage will be in sync and all will be happy.
I am proficient in SQL inside of the DB but have never addressed files at the storage level via SQL and have next to no idea how to proceed.

Any and all help is appreciated.  Thanks in advance for the energy expended.

I have not read the full Posting Rules, and presume it is inappropriate to offer payment for services.  On the chance I am wrong, I have no issue with paying someone for their time.
Kevin Cross

Hi.

What programming (scripting) languages are you comfortable with? My first thought is to use a quick script against the file system to generate a list of ALL files with a .doc or .rtf extension in the specified directory. Whether or not this needs to be recursive (i.e., whether you have subdirectories) will be good to note. Additionally, you will want to check whether there are any files that do not belong in the directory or don't follow the standard GUID naming format.

If all is well, you can have the output of the above be a CSV or XLS file that has one column of filenames.

This can be imported into SQL Server (using BULK INSERT or OPENROWSET, for example); then you can do a standard SQL INSERT or DELETE based on whether the filename EXISTS in the table or not.
Mark Holbert

ASKER

I'm not particularly versed in any of the scripting languages, but am most familiar with command line stuff and .bat files.

I can collect the records at the database level, but I don't script well enough to collect the storage structure information, and I also don't have experience with SQL interacting with external files.

If I can get the list of files in the storage structure into a table, I should mostly be able to take it from there.  The only interaction with the storage structure beyond that would be the final MOVE statement to move the orphaned files to the archive directory.

Hope this helps.  MH
Okay, here is a simple example using the DIR command.

Get files of a specific extension in a directory and all its subdirectories and put output in a new file.
dir c:\temp\*.doc /s/b > c:\temp\docfiles.csv



Example contents of CSV:
c:\temp\962B4D49-D734-4B81-B08A-AA35C97FDBE2.doc
c:\temp\99EB03CB-3842-49A1-87F2-F6E6D11A72E3.doc



Pinal Dave, SQL Server MVP, has a blog entry on importing a CSV using BULK INSERT:
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
(I like it because it shows Excel import using OPENROWSET also if you scroll through comments)
CREATE TABLE #FileNames ([FileName] VARCHAR(255));

BULK
INSERT #FileNames
FROM 'c:\temp\docfiles.csv'
WITH
(
ROWTERMINATOR = '\n'
);
GO

/*
GUIDS
-------------------------------------
962B4D49-D734-4B81-B08A-AA35C97FDBE2
99EB03CB-3842-49A1-87F2-F6E6D11A72E3
*/
SELECT REPLACE(RIGHT([FileName], 
                     CHARINDEX('\', REVERSE([FileName]))-1), 
               '.doc', '') AS [GUIDS]
FROM #FileNames;
GO

DROP TABLE #FileNames;
GO



And really, you do not even have to script that. You can use DIR to create both a docfiles.csv and a rtffiles.csv and then right-click on your database and select Tasks > Import Data... and follow the SQL Server Import and Export Wizard for Flat File Source.

The T-SQL I showed above gives you an example of how you can strip the file path and document extension to get just the GUID, although you may not need to if you do your comparison against the database as [FileName] LIKE '%\' + CONVERT(CHAR(36), [YourGUIDColumn]) + '.%'. You should be good from there other than exporting back to a CSV.
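As a minimal sketch of the comparison in both directions, assuming a hypothetical dbo.Transcriptions table with a uniqueidentifier column named DocumentGUID (substitute your actual table and column names):

-- Hypothetical table and column names; adjust to your schema.
-- 1. Database records whose file never made it to storage (orphaned records).
SELECT t.DocumentGUID
FROM dbo.Transcriptions t
WHERE NOT EXISTS (SELECT 1
                  FROM #FileNames f
                  WHERE f.[FileName] LIKE '%\' + CONVERT(CHAR(36), t.DocumentGUID) + '.%');

-- 2. Files on disk with no matching database record (orphaned files).
SELECT f.[FileName]
FROM #FileNames f
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Transcriptions t
                  WHERE f.[FileName] LIKE '%\' + CONVERT(CHAR(36), t.DocumentGUID) + '.%');

With the default case-insensitive collation, the LIKE match works regardless of whether DIR returned the GUIDs in upper or lower case.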

The SQL command line utilities such as bcp may come in handy for that part. Once you have your list of files imported, you can run a query that checks for files that exist on the file system but not in the database, then use the SQL tools to get this output into a CSV.
http://msdn.microsoft.com/en-us/library/ms162802.aspx

Using this CSV, you can probably use VBScript or PowerShell to iterate over its contents and move the files accordingly. I have not tried this with a DOS batch file, or at least do not remember doing so, but it probably has the capability as well.
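Since you mentioned being most comfortable with .bat files, here is a sketch of the move step as a batch file. The paths and the files2arch.csv name are just placeholders carried over from the earlier examples; adjust them to your storage and archive directories.

@echo off
rem Move each file listed in files2arch.csv (one full path per line,
rem as produced by the bcp queryout above) into an archive directory
rem for manual review. All paths here are placeholders.
if not exist "c:\temp\archive" mkdir "c:\temp\archive"
for /f "usebackq delims=" %%F in ("c:\temp\files2arch.csv") do (
    if exist "%%F" move "%%F" "c:\temp\archive\"
)

Note the doubled %%F, which is required inside a .bat file; if you test the FOR loop directly at the command prompt, use a single %F instead.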

Hope that helps!

Kevin
You may find this interesting also. If you are going to run bcp and other utilities from T-SQL, you have to enable xp_cmdshell: http://msdn.microsoft.com/en-us/library/ms175046(v=sql.90).aspx

With xp_cmdshell enabled, you can actually do this in one procedure.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
GO

-- regular #temp_tables do not work; needs a physical table/view
-- ##global_temp_tables can be used or TempDB..TempTable
CREATE TABLE ##FileNames ([FileName] VARCHAR(255));
GO

DECLARE @dircmd VARCHAR(100);

-- use same DIR command to get list of files
SET @dircmd = 'dir c:\temp\*.doc /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

SET @dircmd = 'dir c:\temp\*.rtf /s/b';
INSERT INTO ##FileNames 
EXEC xp_cmdshell @dircmd;

-- do your processing: simple select shown here
SELECT [FileName]
FROM ##FileNames 
-- if dir does not find any files, result is 'File Not Found'
WHERE [FileName] <> 'File Not Found'; 
GO

-- export some results, using bcp
EXEC xp_cmdshell 'bcp "SELECT [FileName] FROM ##FileNames WHERE [FileName] <> ''File Not Found'';" queryout c:\temp\files2arch.csv -T -c';
GO 

DROP TABLE ##FileNames;
GO

-- disable for security reasons when done (optional)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;


Awesome, I will play through it and let you know.  I very much appreciate the help.
ASKER CERTIFIED SOLUTION
Kevin Cross
I am cleaning everything up, going on the thought that at some point I can write the files that need deleting at the folder level into a file and script the deletion.

I ended up importing the data via the wizard, and I guess I can do the same on the way out.

I appreciate the help.

Mark