Kent W (United States of America) asked:

Safely deleting millions of files in Windows Server 2003

I have one shot at this, so I wanted some expert opinions.

We have a handful of Windows 2003 Servers that we added PHP FastCGI to a few months back.
The individual doing this forgot to point the PHP temp file location at a known directory covered by a cleanup utility that deletes session and other temp files, and left it at the default C:\windows\temp.

Because of our large traffic volume and the many monitors hitting our servers, there were literally millions of files created and not cleaned up over the last couple of months.  One of our upper execs got a little crazy and did a "del *.*" from within the directory.  This churned for a while, and ultimately caused corruption.  Upon reboot, the machine was forced into a chkdsk, which, even given a month to run, never completed.  Mainly, it was updating the allocation table index entries.

Problem: I need a way to safely delete these millions of files without corrupting the volume, one that will also update the allocation tables.  We have successfully booted to a rescue CD and forced a deletion of the folder, but since C: was just a mounted non-system volume in the rescue environment, that doesn't update the allocation tables, still resulting in a very long chkdsk run.
Ideas are welcome, and there is another question I can't seem to find the answer to: we ultimately used an Ubuntu rescue CD to not-so-successfully fix one server (ended up re-imaging, which was a pain).  Does anyone know, if we boot into the Windows 2003 rescue environment and delete the directory, whether that will update the actual Windows 2003 install's allocation tables?  That may be the solution, but I'm unsure if it will update the allocation tables.
Neil Russell (United Kingdom):

Any form of recovery disk that mounts an NTFS partition and allows you to delete files from it will update the allocation table. I am unsure as to what you are actually saying.
Of course, as you have a disk in an unknown, unstable state, it still needs a full chkdsk or similar before it is a safe-to-use system, regardless of what else you delete.
Steve Knight:
Are they all in one dir, or subdirs too?
What's the naming like, i.e. random hex digits, always starting the same, same extension, etc.?

We can soon do a loop which will delete a subset at a time, or all of them but one by one.

Having said that, del *.* shouldn't have broken it!

E.g. you could do this (shown in batch-file form with %%; use a single % at the cmd prompt):

for /f "delims=" %%a in ('dir /b /a-d a*.*') do @echo del "%%~a"

Press Ctrl-C or Ctrl-Break to stop... That would look for all files starting with 'a' and delete them one by one.  Remove the word echo to actually delete them rather than just show what would be deleted.

Steve
Kent W (ASKER):

@dragon-it, they are all in one directory, no subdirs.  They are mostly session files, so a lot of randomly named .tmp files.  Most start with QM*, though, via a naming standard the programmers enacted.

I'll try your idea out. I have one server we can easily image (exact hardware of the repaired server); our others are very dissimilar hardware though, so not so easy if I mess up. :)  Thanks!
SOLUTION from Steve Knight (United Kingdom) [member-only content not shown]
Please wait on this one before going any further!
Why :-)
Just writing up my next comment... Please give me a moment longer...
or keep distracting you with "new mail" request !!
Wow! This is a monster of a task....

I did the write-up which I will post in a moment, then I wrote some code (to demonstrate); however, when looking at the code, some fail-safety considerations will be evident.

which is why I just suggested identifying the files first....

E.g. del ??1*.tmp
del ??2*.tmp
etc. Depending upon what the names are.
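
As a sketch of that slicing idea (the character set, path and QM-style mask here are assumptions; adjust them to whatever the real names turn out to be):

@echo off
rem Hypothetical sketch: delete in slices keyed on the third character of the
rem filename, so each del call only has to walk part of the folder.
for %%c in (0 1 2 3 4 5 6 7 8 9 a b c d e f) do (
   echo Deleting ??%%c*.tmp ...
   del /q "C:\Windows\Temp\??%%c*.tmp"
)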

Interested in what you come up with anyway Paul!

Steve
Steve (dragon-it)

Please tread carefully with this one... An entire system's integrity is at stake.

Personally this is what I would advise:

(1) Whether you like it or not you MUST first verify the integrity of the hard drive before continuing. Surely, you must see the sense in doing this. It may take time to complete but it could save a lot of heartache later on.

(2) All the files are contained in a single folder: C:\Windows\temp\ - Great! We know what we're dealing with here.

The asker says there are millions of files.... okay, it's a big one!

(3) Personally, I would prefer to approach this from the command line so I agree with your first train-of-thought.

The files start with 'QM' and have a '.TMP' extension. I would have wanted to know more about the filenames... A small sample of say 3 or 4 randomly chosen filenames might have sufficed.

(4) Grouping the files by name is okay; however, you could just as well group them in 1000s.

Grouping by name
If you have say 2 Million files (I'm being conservative here) evenly distributed alphabetically, then you end up with about 77,000 files per group... This is still way too large for my liking.

Grouping by 1000s
A better method would be to just delete 1000 files at a time and pause if necessary (see below) for a few seconds to allow DOS to catch up with its caching and Windows to adjust to DOS. For this reason, it would be better to close all other apps - especially Explorer.

Suppose there are 2 million files. That's 2000 lots of 1000 files. Let's say it takes 10 seconds to delete just 1000 files; with a 3 second interval after every 1000 files, the total time taken to complete the process would be: 2000 x (10 + 3) secs = 26,000 secs = 433 mins = 7.2 hrs.

So, even just 10 seconds to delete 1000 files, with a 3 second interval after every 1000 files, would take over 7 hours to delete 2,000,000 files!

Each second's delay adds about half an hour's processing time.
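
For what it's worth, those back-of-envelope figures can be checked with cmd's own integer arithmetic (a throwaway sketch, not part of the deletion script; the numbers are the same assumptions as above):

@echo off
rem Rough timing estimate: 2,000,000 files, 1000 per group,
rem 10 seconds per group plus a 3 second pause between groups.
set /a Groups=2000000/1000
set /a TotalSecs=Groups*(10+3)
set /a TotalMins=TotalSecs/60
set /a TotalHrs=TotalMins/60
echo %Groups% groups: %TotalSecs% secs = %TotalMins% mins = roughly %TotalHrs% hours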

So, care has to be taken to get this right the first time. See how I implement the delay in the code. This is a very important design consideration which worked well when tested against 100,000 files.

Notice how the delay only kicks in when it is required to do so...  I pride myself on this neat little technique.

(5) One final point. If it's going to take hours to process millions of files then a visual progress indicator is absolutely VITAL, otherwise the user will not know whether the process is still running or not. This can be as simple as displaying the number of files remaining.

(6) This is how I would perform the process of deleting MILLIONS of files:

(a) Firstly, find the number of files to delete.

   dir c:\windows\temp\qm*.tmp | find "File(s)" >files.txt

This will contain something along the lines of:

   1234567890 File(s)   123,123,123,123 bytes

We only want the number part so:

   for /f "tokens=1" %%a in (files.txt) do set NumberOfFiles=%%a


(b) Now we need to calculate how many groups of 1000 files there are:

   set /a GroupsOfFiles=%NumberOfFiles% / 1000


(c) We need a container loop. Something like this:

   :loop
      :
   if exist c:\windows\temp\qm*.tmp goto :loop


(d) Then we need a core loop again, something like this:

   set count=1000
   for %%a in (c:\windows\temp\qm*.tmp) do (
      if exist "%%~fa" (
         del /f "%%~fa"
         set /a count-=1
      )
      if !count! equ 0 goto :exit-for
   )
   :exit-for

(e) We can now add our visual progress indicator. The fastest and easiest way to do this would be to use TITLE. We need to display a countdown of the group number. ie:

   set /a NumberOfFiles-=1
   :
   title !NumberOfFiles!

For added measure, we could add the filename like this:

   title !NumberOfFiles! - %%~nxa

(7) A couple of points to note here. When I ran the code on 100,000 files, I was getting frequent "Cannot find... filename" error messages.

By including a 3 second delay (a ping for 3 seconds) after every 1000 files, I was able to eliminate these errors.

Another thought was to just redirect the error messages to NUL as clearly they did not affect the end result.

However, I decided to pretest for the error by including: if exist "%%~fa"... and this did the trick quite nicely. But this was inefficient. I found these error messages were sporadic, so I needed a way to include the delay only when an error actually occurred.

(8) Also, I felt a need for a failsafe method to abort the batch file. Simply relying on CTRL-C wasn't good enough and was at times hit-or-miss (mostly miss); therefore, I included a sentinel file: the batch file continues running only so long as the sentinel file is present.

So, if we have a runaway loop where CTRL-C fails to bring it to a stop, the user can simply open another command prompt (or use Windows Explorer) and delete the file SENTINEL, and this will force the batch file to stop immediately.

I hope you can see the sense in this and hopefully we'll see more of this in the future.

(9) Anyway, here's the whole thing in a nutshell. It's a work of art!...

@echo off
setlocal enabledelayedexpansion

del files.txt 2>nul
del sentinel 2>nul

if not exist C:\windows\temp\qm*.tmp (
   echo.
   echo There are no 'C:\Windows\Temp\QM*.TMP' files to delete.
   goto :eof
)

cls
set /p .=Finding number of files to delete: <nul

dir c:\windows\temp\qm*.tmp | find "File(s)" >files.txt
for /f "tokens=1" %%a in (files.txt) do set NumberOfFiles=%%a

echo %NumberOfFiles%
echo.

set /a GroupsOfFiles=%NumberOfFiles% / 1000
echo This batch process will delete files in groups of 1000.
echo There are %GroupsOfFiles% groups of 1000 files to delete.
echo.

echo It may take several hours to delete %NumberOfFiles% files.
echo.
echo Progress will be shown in the window's title bar above.
echo.
echo You may abort this process at any time by pressing CTRL-C.
echo.
echo If CTRL-C fails, using Explorer or another DOS prompt,
echo delete the file SENTINEL by entering the following command:
echo.
echo    DEL sentinel
echo.
echo and resume by running this batch file at a later date.
echo.

set yn=
set /p yn=Would you like to continue now [Y/N]: 
if /i not "%yn%"=="Y" (
   echo You have chosen not to continue at this moment.
   echo.
   goto :eof
)

echo.
set /p .=Deleting files...<nul

copy /y nul sentinel >nul

:loop
   set count=1000
	
   for %%a in (c:\windows\temp\qm*.tmp) do (
      title !NumberOfFiles! - %%~nxa
      if not exist sentinel goto :eof

      if exist "%%~fa" (
         del /f "%%~fa"
         set /a count-=1
         set /a NumberOfFiles-=1
      ) else (
         ping -n 3 -w 1000 127.0.0.1 >nul
      )

      if !count! equ 0 goto :exit-for
   )
   :exit-for
	
if exist c:\windows\temp\qm*.tmp goto :loop

title 0
echo Done!
echo.

del files.txt 2>nul
del sentinel 2>nul


Please let me know what you make of the code.
Kent W (ASKER):

Thanks everyone, I've fought one server already and have seen what can happen in the worst scenario. I had to attend to my wife, who is very pregnant and almost due, so I'll be testing some of this in the next couple of days.
I can't test because I've already repaired the other server, but I'm thinking it may also be possible to boot into the recovery console, rename the dir containing the files, re-create it, then go about deleting the full dir. I was able to do this instantly in a Linux rescue shell, but, of course, it didn't update the allocation tables, which means the next chkdsk is going to take an eternity.
Windows file handling leaves something to be desired.
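
A sketch of that rename/recreate idea from a full Windows cmd prompt (whether the Windows 2003 recovery environment, with its more limited command set, updates the MFT the same way is exactly the open question here):

rem Rename the bloated folder out of the way, recreate an empty one,
rem then remove the renamed copy. On a live system the rename may fail
rem if anything still holds files open in the temp folder.
ren C:\Windows\Temp Temp.old
md C:\Windows\Temp
rd /s /q C:\Windows\Temp.old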
You can run this code quite safely any time.... There's the option not to continue if you so desire... (The script is the same one posted above.)


Good one, Paul.  I think it is VERY conservative only expecting it to delete 1000 files at a time... hundreds of thousands really wouldn't be an issue, BUT who cares if it gets the job done!

Some quick test files (100,000 of them in one dir) took some 50 seconds to make.

rd dirname /s/q took about 20-25 seconds to remove them all
del *.* took about 18-20 secs.

Good luck anyway ... files and baby!

Steve
mugojava

I do not want to receive points for my contributions above. dragon-it was on the right tracks before I joined this thread. He also waited patiently for my reply when he could have contributed further himself.

Please bear this in mind when closing the question.


Steve

I agree with you on DEL *.*, so why didn't we just suggest that in the first place? Same as RD dirname. Especially as it took less than a minute for 100,000 files.

As programmers we tend to think programmatically and, in this case, overlook the obvious alternative choices.

I was dead beat when I posted my last two or so comments. The 1000-file thing is a remnant left over from an earlier attempt; I forgot to edit it (and all references to it) out. In fact, I'm pretty sure we could do without the container loop altogether, as originally the PING command sat between the core loop and this outer loop (extreme tiredness does this at times).

I think the asker tried DEL *.*:

   >> "...did a "del *.*" from within the directory.  This churned for a while, and ultimately caused corruption..."

Anyway, I was more concerned about the asker corrupting his hard drive than anything else. As for points, it's not what I'm after - so if you earn them, you're welcome to them. Toward the end, I guess I became obsessive.

Paul... don't be mad.... good solution of yours, just I think you could up the 1000 to, well, let's say 10,000 easily.

I wonder whether the del *.*, if left alone, would have just completed anyway, although taking maybe an hour or whatever, and stopping the machine in the middle caused the issues?

I don't see why it shouldn't have, so the likelihood of the drive already being corrupted may be an issue.

I suppose also 1 million decent-sized files might take longer to delete than the 100,000 3-byte files I created with:

for /l %a in (1,1,100000) do @ echo.>%a.txt

Steve
Steve

It's amazing what sleep can do to the body and mind. And of course I agree with everything you just said.

Increasing files from 1000 to 10,000 would not really be an issue because I'd do away with the outer loop altogether and just process the entire millions of files using the 'IF EXIST...' and 'delay-on-demand' technique built into the inner loop.


I think you are definitely right about the DEL *.* - aborting this, or possibly even rebooting a seemingly 'hung' PC, can have devastating consequences. And as you rightly point out, there might already have been corruption present.

At least your files had some 'meat' to them! Mine were:

   COPY NUL !random!-!random!-!random!.TMP > NUL

In other words - all zero sized!
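
For anyone wanting to reproduce that kind of test set, a minimal sketch (the folder name is hypothetical, and !random! needs delayed expansion, which is why the setlocal line matters; name collisions mean the final count may fall a little short):

@echo off
setlocal enabledelayedexpansion
rem Create a pile of zero-byte .TMP files with pseudo-random names, for testing only.
md C:\TestTemp 2>nul
cd /d C:\TestTemp
for /l %%i in (1,1,10000) do (
   copy /y nul "QM!random!-!random!-!random!.tmp" >nul
)
echo Done.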

Kent W (ASKER):

Thanks guys, playing some catch up and going to do some enacting / testing here. To answer some questions -
One of our programmers (the actual culprit...) did try a DEL *.* from within the /temp directory, which, after a couple of hours, popped up a corruption notice and stopped the DEL.  Upon reboot, we were forced into a chkdsk which, because apparently whatever DID get deleted didn't get updated in the MFT, was allowed to run for over 30 days. And since there is no progress indicator, we had no way of knowing if it was 2% done or 90% done.  The decision was made to yank it and try another method (we have another machine exactly matching this one that we took an image from).
In testing, we were able to boot into a Linux rescue disk and remove the /temp directory quite easily (rmdir).  The problem was, this does not update NTFS's master file table, so we were back to "removing indexes" in the subsequent check disk.
The decision was made to just re-image (from an old image they were originally pushed out from) and get it back in service, so I was not able to try booting into the Windows Server rescue console and trying the same thing... I have a feeling this may be able to update the MFT, which would actually achieve what we are trying to accomplish: remove the files, and update the MFT.
I'm going to bring up another server and do some testing, we do have the other "fixed" machine we can image and push out to one other server (since they match), but, the other two that need this repair also are NON-matching in hardware, so unfortunately pushing a new image out to all of them is not an option.  

I'm very hesitant to do any dir /temp > files.txt or anything that extracts file info from the directory for fear of it hanging / corrupting and causing us to reboot, thereby forcing a check disk, which would effectively remove it from our web app pool.  

Thanks, I'm going to do some testing!  Good info here. :)
I would suggest going to the dir and typing

dir /b

And see what you get.  You can stop at any time with Ctrl-C or Ctrl-Break and it should respond easily to that.

If it can do a dir for that long, try piping it into a file.  It isn't changing anything at that stage; write it to another disc if worried for any reason.  You can still stop a dir command that is piping to a file at any time, too, and inspect the number of lines in the text file, e.g.

find /c /v "#NO#" file.txt

I assume there are no anti-virus on-access scanners and the like getting in the way on the box... that could easily take 10-100 times longer!

If it pipes into a file alright then any of the for-loop suggestions will be fine... Paul's, for instance, effectively does a dir like this then works through the files 1000 at a time, deleting them one by one.  At any time you can stop it with a Ctrl-C etc.

If you have the option of a machine that can be imaged, tested and put back, I would be very tempted to do:

cd \temp
rd . /s/q

As that will delete all the files and dirs in one go, but of course only do this if you can on an easily put-back-able server!
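
Pulling those read-only inspection steps together, a minimal sketch (assuming D: is a separate, healthy drive; the listing filename is just an example):

rem Read-only inventory of the temp folder - nothing on C: gets modified.
cd /d C:\Windows\Temp
rem Write the listing to a different drive; Ctrl-C is safe at any point.
dir /b > D:\temp-listing.txt
rem Count how many filenames were captured so far.
find /c /v "#NO#" D:\temp-listing.txt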

Steve
ASKER CERTIFIED SOLUTION [member-only content not shown]
Sorry Steve, we seem to have crossed posts again...

THREE really great pieces of advice from Steve.

   (1) Perform 'DIR /B' and see if it completes...

   (2) Piping output to a file on a DIFFERENT drive (and using FIND).
        This is so that potentially harmful changes aren't made to the current hard drive.

   (3) Turning off any anti-virus (and other (especially disk) monitoring) software.
         Also, close all apps (and non-essential processes).

I have re-written the code (above) so that it no longer deletes files in groups of 1000 at a time. This is made possible by including a conditional test coupled with a 3 second delay to allow the system to update from its cache on the fly.
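
The listing posted earlier still shows the grouped version, so purely as a sketch of what the simplified single-pass loop described here might look like (sentinel abort, TITLE progress and the delay-on-demand check kept, the 1000-file grouping dropped):

@echo off

copy /y nul sentinel >nul

rem Single pass over every matching file: no 1000-file groups, no outer loop.
for %%a in (c:\windows\temp\qm*.tmp) do (
   rem Deleting the SENTINEL file from another prompt aborts the run.
   if not exist sentinel goto :done
   rem Show the current filename in the window title as a progress indicator.
   title %%~nxa
   if exist "%%~fa" (
      del /f "%%~fa"
   ) else (
      rem File already gone - give the file system a moment to catch up.
      ping -n 3 -w 1000 127.0.0.1 >nul
   )
)

:done
title Done
del sentinel 2>nul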

I eagerly wait to hear back from mugojava regarding the effectiveness of this technique...

(Also, please report back any problems)
Kent W (ASKER):

Testing these first thing in the work week, and will post results.
Kent W (ASKER):

Paul,
I'm about to try this out.  In the older revision, it looks like you had this set to delete in groups of 1000.  I'm not seeing that so blatantly in the last revision of the code.  Will it still group and delete in batches, or just run as far as it can until, if needed, it has to be killed, the sentinel file deleted (optionally) and re-run?

Thanks!
It should run without probs...

The deletes are no longer grouped into batches of 1000 files.

I have added a precondition check before each delete; if it detects the delete might fail, it introduces a small delay so that the file system can catch up with the execution of the code...

Let's hope it works!

REMEMBER: Run the batch file in a DOS box (not full-screen). This will enable you to see it actually working away as it displays each file's filename in the titlebar of the window.

Only if you see no activity in the titlebar for more than 5 or so seconds should you try to abort the batch file (by deleting the SENTINEL file as a last resort).

Good luck!
BTW, you can quit the batch file at any time you want... because, when you restart it again, it just carries on from where it left off... Nifty, eh?

Also, if you want to decrease the delay from 3 seconds to 2 seconds then you can edit line 59:

   ping -n 3 -w 1000 127.0.0.1 >nul

to:

   ping -n 2 -w 1000 127.0.0.1 >nul

Please report back with any progress....
Kent W (ASKER):

Simply awesome. Thanks Paul!  I'll be sure to post how it's going.  Appreciate your help. :)
Kent W (ASKER):

I got a chance to run and test, and the script worked fine.  However, I instantly got a corruption message from cmd.exe, so I don't think there was any way around it, given how hosed the files had become.
I tested the script by generating thousands of files and then deleting them, and it worked perfectly, so no fault at all there.  The FS was already a goner.
Thank you all for your help, I'll keep these around and put them to good use in the future.  Excellent for non-invasive cleanup procedures.
Thanks for the assist, we tried anyway!

Steve