3.5 million files to delete with a specific substring

Hi,

I have 3.5 million files to delete containing the substring ".r13125.ovh.net,S="

How can I delete them?

Thank you
matthew016 Asked:

jhp333 Commented:
Do you mean the string in the filenames?

To list the files:
find . -name '*.r13125.ovh.net,S=*'

To delete them:
find . -name '*.r13125.ovh.net,S=*' -delete

where . is the starting directory. You can use "/" if you want to start from root.

matthew016 (Author) Commented:
r13125 ~ # find . -name '*.r13125.ovh.net,S=*' -delete
find: prédicat invalide `-delete'
r13125 ~ #

(in English: invalid predicate)
jhp333 Commented:
It seems your find utility is an old one that doesn't support the -delete option, which is relatively new. In that case:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} \;

matthew016 (Author) Commented:
The file count went down, but only by about a thousand.
Then this error message:

find: Ne peut faire un clonage (fork).: Ne peut allouer de la mémoire

(approximate translation from French to English: find: cannot fork: cannot allocate memory)

I tried running it in a loop a thousand times, but after the first error message the file count doesn't go down anymore.
matthew016 (Author) Commented:
I created a file /home/LIST
with the command: find . | fgrep 'r13125.ovh.net,S=' > /home/LIST

So I have the list of files to delete (but only the filenames, without the full path; the full path is /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new).
I heard it is possible to delete all the files listed in LIST with xargs. How can I achieve this exactly?
jhp333 Commented:
It seems you have one or more circular links.
Find the link and delete it manually.
pitt7 Commented:
You can use the -H option to prevent following symbolic links (if they are circular).
When using -exec with find with many files you should use the following syntax:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} +

Terminating the -exec command with + means that rm is called with as many files as arguments as the maximum command-line length allows.
With \; a new rm command is invoked for every file, which is much slower with many files.

To delete files from a file with xargs:
xargs -a /home/LIST rm
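
Since the list apparently holds bare filenames rather than full paths, one possible variant (a sketch only, assuming GNU xargs and the Maildir/new path quoted above) is to change into that directory first:

cd /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new && xargs -a /home/LIST rm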
jhp333 Commented:
@pitt7

"-P     Never follow symbolic links." is the default. I guess his circular links are by hard links.
arnold Commented:
The large number of files will generate errors when using the {}\; grouping.

Using jhp333's find, but processing one file at a time:
find . -name '*.r13125.ovh.net,S=*' | while read -r a; do
  echo "Deleting $a"
  /bin/rm -rf "$a"
done
pitt7 Commented:
@jhp333:
Yes, that's right. Using -H is useless here.

@arnold's answer:
I strongly suggest using a solution that doesn't spawn one rm process for each file. The author says there are 3.5 million files; that will take much longer with one process per file.
If
find . -name '*.r13125.ovh.net,S=*' -exec rm {} +
throws an error too, use
find . -name '*.r13125.ovh.net,S=*' | xargs rm

xargs reads arguments from stdin and passes them to rm. It does not call rm for every single file, but builds a command line of the maximum possible length. This results in far fewer rm processes.

(Side note:
If your filenames can contain \n newlines, "find | xargs rm" will fail. In that case use:
find . -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm
With this command the arguments are delimited not by a newline but by a \0 character, which can't appear in filenames.)
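
To see the batching behaviour in isolation, here is a tiny demonstration (the input strings and the -n 2 batch size are made up, and echo stands in for rm):

printf 'a\nb\nc\nd\ne\n' | xargs -n 2 echo
# prints:
# a b
# c d
# e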
arnold Commented:
The problem with both {}\; and xargs is that they will try to pass 3.5 million entries to rm, which will generate the "too many items" error for {}\;, and I suspect the same thing will happen with xargs.

Instead of rm, unlink can be used.
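
For example (a sketch only; unlink removes exactly one file per call, so it pairs with the per-file \; form):

find . -name '*.r13125.ovh.net,S=*' -exec unlink {} \;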

At any one time only one rm process will be running, one per file.

Another option is to use -mtime as a filter, e.g.:

find . -name '*.r13125.ovh.net,S=*' -mtime +360 -exec rm {} \;

This deletes files that match the pattern and are more than a year old, which may reduce the number of files deleted per batch.

You can use a for loop to step the threshold down from 360 to 90 days in 5, 10, 20, or 30 day steps.
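
A minimal sketch of that stepped idea (the thresholds below are only an example):

for days in 360 270 180 90; do
  find . -name '*.r13125.ovh.net,S=*' -mtime +"$days" -exec rm {} \;
done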

IMHO, since this many files have accumulated, the example I posted with the while loop, deleting one file at a time, is a good approach.

If you want to get more complex, i.e. delete 10, 20, or 30 files at a time, you could build up the string that is passed to rm every X files.
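
A rough bash sketch of that batching idea (the batch size of 1000 is arbitrary):

find . -name '*.r13125.ovh.net,S=*' | {
  batch=()
  while read -r f; do
    batch+=("$f")
    if [ "${#batch[@]}" -ge 1000 ]; then
      rm -f "${batch[@]}"   # one rm call per 1000 files
      batch=()
    fi
  done
  # remove whatever is left in the final, partial batch
  [ "${#batch[@]}" -gt 0 ] && rm -f "${batch[@]}"
}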

pitt7 Commented:
The error he gets is from find, not rm.

find will not pass 3.5 million entries to rm because find can't. The argument length of a program is limited. This is why there are tools like xargs or terminating find -exec with +.
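
For reference, getconf shows the kernel's limit on the total argument length, and GNU xargs can report the command-line size it will actually use (the values vary by system):

getconf ARG_MAX
xargs --show-limits < /dev/null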

Just run:
find / -exec echo {} \;
[you will see one file per line, which means echo is called for every single file]
and
find / -exec echo {} + | cut -c -80
[you will see very many files on one line, though not just one line; echo is still called multiple times, of course only if find finds enough files]
to see the difference (the "| cut -c -80" truncates the very long lines generated).
stetor Commented:
Hi

I think this is a one-time task, so I don't think the time is a problem ...

From the shell prompt, type the following in the evening before going home:
while read -r fname
do
  rm "/home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new/$fname"
done < /home/LIST

and the next morning the directory is cleaned ;-)

