Identifying orphan files in a folder
Posted on 2009-04-02
Suppose you have a folder with, say, 60,000 files in it. Each file is supposed to relate to a row in a database table, but the table only has 50,000 rows, i.e. there are 10,000 orphan files.
What you want to do is identify and probably delete those 10,000 orphans. What would be the best way of doing it? Ideally something that's reasonably quick; I've tried a few things that work, but they take a long time. I usually kill them after several minutes, before they've got more than a few hundred files in. I've tried the following ideas:
1. Do a <cfdirectory> and a database query to get all the filenames. Then do a Query-of-Query to get every row from the cfdirectory result WHERE Name NOT IN (#QuotedValueList(dbQuery.filename)#).
2. Do a <cfdirectory> and a database query to get all the filenames. Loop round the cfdirectory, using ListFindNoCase(ValueList(dbQuery.filename), cfdirectory.name) to identify files that aren't in the query.
3. Do a <cfdirectory>, then loop round that, doing a query for each file to see if it's in the database.
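All three ideas above compare the same two lists of names. The cost probably comes from scanning a 50,000-item list once per file (ideas 1 and 2) or from running 60,000 separate queries (idea 3). For comparison, here is a minimal sketch of the same check using a hash set, written in Python rather than ColdFusion. The function name `find_orphans` is hypothetical, and the sketch assumes the database filenames have already been fetched into a plain list by a single query.

```python
import os

def find_orphans(folder, db_filenames):
    """Return the names of files in `folder` that are not in `db_filenames`."""
    # Windows filenames are case-insensitive, so compare lower-cased names
    # (this mirrors the NoCase comparison in idea 2).
    known = {name.lower() for name in db_filenames}
    orphans = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # Each membership test against a set is a single hash lookup,
            # not a scan of the whole list.
            if entry.is_file() and entry.name.lower() not in known:
                orphans.append(entry.name)
    return sorted(orphans)
```

With a set, each file costs one lookup, so the whole pass is roughly one directory listing plus one query.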
Any other suggestions?
At this stage I'd probably want to do it in two steps:
1. Identify the files
2. Do the process again, but this time delete the files. I'd expect step 1 to produce a list of names (written to a file or db table) that I could then loop through to delete.
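The two-step plan could be sketched roughly as follows, again in Python rather than ColdFusion. The function names `write_orphan_list` and `delete_listed_files`, the one-name-per-line text file, and the `dry_run` flag are all assumptions made for illustration; the orphan names are assumed to have been identified already.

```python
import os

def write_orphan_list(list_path, orphan_names):
    """Step 1: save the orphan names, one per line, so they can be reviewed."""
    with open(list_path, "w", encoding="utf-8") as f:
        for name in orphan_names:
            f.write(name + "\n")

def delete_listed_files(folder, list_path, dry_run=True):
    """Step 2: delete the files named in the list; return the names handled.

    With dry_run=True nothing is removed; the return value shows what
    would be deleted.
    """
    handled = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            name = line.rstrip("\n")
            # Skip blank lines and anything that is not a bare filename.
            if not name or name != os.path.basename(name):
                continue
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                if not dry_run:
                    os.remove(path)
                handled.append(name)
    return handled
```

Running step 2 with `dry_run=True` first gives a chance to check the list before anything is removed.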
Database is SQL Server. Using CF 7 on Windows. Willing to consider non-ColdFusion solutions.