• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 393

3.5 million files to delete with a specific substring


I have 3.5 million files to delete whose names contain the substring ".r13125.ovh.net,S="

How can I delete them?

Thank you
4 Solutions
Do you mean the string in the filenames?

To list the files:
find . -name '*.r13125.ovh.net,S=*'

To delete them:
find . -name '*.r13125.ovh.net,S=*' -delete

where . is the starting directory. You can use "/" if you want to start from root.
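Before actually deleting anything, it can be worth counting how many files match, as a dry run (a sketch using the pattern from the question):

```shell
# Dry run: list the matching files and count them before deleting.
find . -name '*.r13125.ovh.net,S=*' | wc -l
```

If the count looks right, re-run the command with -delete (or one of the rm variants below) instead of piping to wc.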
matthew016Author Commented:
r13125 ~ # find . -name '*.r13125.ovh.net,S=*' -delete
find: prédicat invalide `-delete'
r13125 ~ #

(in English: invalid predicate `-delete')
It seems your find utility is an old one that doesn't support the -delete option, which is relatively new. In that case:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} \;

matthew016Author Commented:
The file count went down, but only by about a thousand. Then this error message:

find: Ne peut faire un clonage (fork).: Ne peut allouer de la mémoire

(approximate English translation: find: cannot fork: cannot allocate memory)

I tried looping a thousand times, but after the first error message the file count doesn't go down anymore.
matthew016Author Commented:
I created a file in /home/LIST
with the command :  find . | fgrep 'r13125.ovh.net,S=' > /home/LIST

So I have the list of files to delete (but only the filenames, without the full path; the full path is /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new).
I heard it is possible to delete all the files listed in LIST with xargs. How can I achieve this exactly?
It seems you have one or more circular links.
Find the link and delete it manually.
You can use the -H option to prevent following symbolic links (if they are circular).
When using -exec with find with many files you should use the following syntax:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} +

Terminating the -exec command with + means that rm is called with as many files as arguments as the maximum command-line length allows.
With \;, a new rm command is invoked for every single file, which is much slower with many files.

To delete files from a file with xargs:
xargs -a /home/LIST rm
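For example (a sketch, assuming the list was generated by running "find ." from the Maildir/new directory named in the question, so its entries are relative to that directory):

```shell
# The entries in /home/LIST came from "find . | fgrep ...", so they are
# relative paths; change to the directory where find was run first.
# (Path taken from the question.)
cd /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new &&
    xargs -a /home/LIST rm
```

xargs batches the names, so rm is invoked only a handful of times instead of 3.5 million times.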

"-P     Never follow symbolic links." is the default. I guess his circular links are caused by hard links.
The large number of files will generate errors when using the {} \; grouping.

Using jhp333's find, but processing one file at a time:

find . -name '*.r13125.ovh.net,S=*' | while IFS= read -r a; do
    echo "Deleting $a"
    /bin/rm -f "$a"
done
Yes, that's right. Using -H is useless here.

@arnold's answer:
I strongly suggest using a solution that doesn't spawn one rm process per file. The author says there are 3.5 million files; this will take much longer with one process per file.
If

find . -name '*.r13125.ovh.net,S=*' -exec rm {} +

throws an error too, use:

find . -name '*.r13125.ovh.net,S=*' | xargs rm

xargs reads arguments from stdin and passes them to rm. It does not call rm for every single file; instead it builds a command line of the maximum possible length, which results in far fewer rm processes.

(Side note:
If your filenames can contain \n newlines, "find | xargs rm" will fail. In that case use:
find . -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm
With this command the arguments are delimited not by newlines but by a \0 character, which cannot appear in filenames.)
The problem with both {} \; and xargs is that they will try to pass 3.5 million entries to rm, which will generate the "too many items" error with {} \;; I suspect the same will happen with xargs.

Instead of rm, unlink can be used.

At any one time one rm process will be running per file.

Another option is to use -mtime as a filter, i.e.:

find . -name '*.r13125.ovh.net,S=*' -mtime +360 -exec rm {} \;

deletes files that match the pattern and are more than a year old. This may reduce the number of files to be deleted per batch.

You can use a for loop to go from 360 down to 90 in 5, 10, 20, 30 day steps.
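The loop suggested above might be sketched like this (the thresholds are illustrative; -mtime +N matches files last modified more than N days ago):

```shell
# Delete the oldest matches first, stepping the age threshold down
# so each pass handles a smaller batch.
for days in 360 180 90 30 10 5; do
    find . -name '*.r13125.ovh.net,S=*' -mtime +"$days" -exec rm {} +
done
```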

IMHO, since this many files have accumulated, the example I posted with the while loop, deleting one file at a time, is a good approach.

If you want to get complex, i.e. delete 10, 20, 30 files at a time, you could build the string that is passed to rm every X files.
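A concrete way to cap the batch size is xargs -n, which limits how many arguments each rm invocation receives (the value 1000 here is arbitrary):

```shell
# -print0 / -0 keep unusual filenames safe; -n 1000 caps each
# rm invocation at 1000 arguments.
find . -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 -n 1000 rm
```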

The error he gets is from find not rm.

find will not pass 3.5 million entries to rm because find can't. The argument length of a program is limited. This is why there are tools like xargs or terminating find -exec with +.
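The limit in question can be inspected directly (getconf is POSIX; --show-limits is specific to GNU xargs):

```shell
# Maximum combined length of arguments and environment, in bytes:
getconf ARG_MAX

# GNU xargs reports the command-line size it will actually use:
xargs --show-limits </dev/null
```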

Just run:
find / -exec echo {} \;
[you will see one file per line, which means echo is called for every single file]
find / -exec echo {} + | cut -c -80
[you will see many files on one line, and more than one line: echo is still called multiple times, of course only if find finds enough files]
to see the difference (the "| cut -c -80" truncates the very long lines to 80 characters).

I think this is a "just one time" task, so I don't think the time is a problem ...

From the shell prompt, type the following in the evening before going home:

while IFS= read -r fname; do
    rm "/home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new/$fname"
done < /home/LIST

and the next morning the directory is cleaned ;-)

Question has a verified solution.
