Solved

3.5 million files to delete with a specific substring

Posted on 2010-08-13
Last Modified: 2012-05-10
Hi,

I have 3.5 million files to delete whose names contain the substring ".r13125.ovh.net,S="

How can I delete them?

Thank you
Question by:matthew016
LVL 7

Accepted Solution

by:
jhp333 earned 134 total points
ID: 33428002
Do you mean the string in the filenames?

To list the files:
find . -name '*.r13125.ovh.net,S=*'

To delete them:
find . -name '*.r13125.ovh.net,S=*' -delete

where . is the starting directory. You can use "/" if you want to start from root.
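
Since -delete acts immediately, it may be worth trying the expression in a scratch directory first. Here is a minimal sketch, assuming a find that supports -delete; the directory and file names are made up for illustration and are not the real mail files:

```shell
# Scratch directory with hypothetical file names, so nothing real is touched.
scratch=$(mktemp -d)
touch "$scratch/1281.a.r13125.ovh.net,S=100" \
      "$scratch/1282.b.r13125.ovh.net,S=200" \
      "$scratch/keep-me.txt"

# Dry run: count what the expression matches before deleting anything.
matched=$(find "$scratch" -type f -name '*.r13125.ovh.net,S=*' | wc -l | tr -d ' ')

# The same expression with -delete appended removes the matches.
find "$scratch" -type f -name '*.r13125.ovh.net,S=*' -delete

# Count what is left afterwards.
remaining=$(find "$scratch" -type f | wc -l | tr -d ' ')
echo "matched=$matched remaining=$remaining"
```

Running the listing form first and comparing the count with what you expect is a cheap safety check before the real run.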
0
 
LVL 9

Author Comment

by:matthew016
ID: 33428041
r13125 ~ # find . -name '*.r13125.ovh.net,S=*' -delete
find: prédicat invalide `-delete'
r13125 ~ #

(In English: invalid predicate)
0
 
LVL 7

Expert Comment

by:jhp333
ID: 33428054
It seems your find utility is an old version that does not support the -delete option, which is relatively new. In that case, use:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} \;
0
 
LVL 9

Author Comment

by:matthew016
ID: 33428149
The number of files went down, but only by about a thousand.
Then I got this error message:

find: Ne peut faire un clonage (fork).: Ne peut allouer de la mémoire

Approximate translation from French to English: find: cannot clone (fork): cannot allocate memory

I tried running it in a loop a thousand times, but after the first error message the number of files does not go down any more.
0
 
LVL 9

Author Comment

by:matthew016
ID: 33428184
I created the file /home/LIST
with the command:  find . | fgrep 'r13125.ovh.net,S=' > /home/LIST

So I have the list of files to delete (but only the file names, without the full path; the full path is /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new).
I heard it is possible to delete all the files listed in the LIST file with xargs. How can I achieve this exactly?
0
 
LVL 7

Expert Comment

by:jhp333
ID: 33428340
It seems you have one or more circular links.
Find the link and delete it manually.
0
 
LVL 3

Assisted Solution

by:pitt7
pitt7 earned 133 total points
ID: 33429248
You can use the -H option to prevent following symbolic links (if they are circular).
When using -exec with find with many files you should use the following syntax:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} +

Closing the -exec command with + means that rm is called with as many file names as the maximum command-line length allows.
With \;, a new rm process is started for every file, which is much slower when there are many files.

To delete files from a file with xargs:
xargs -a /home/LIST rm
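
Because the list holds names relative to the directory where find was run, rm has to be run from that same directory. A minimal sketch of that idea, assuming an xargs that supports -0 (GNU and BSD both do); the scratch directory and list file below are hypothetical stand-ins for the mail directory and /home/LIST:

```shell
# Hypothetical stand-ins for the mail directory and the list file.
dir=$(mktemp -d)
list=$(mktemp)
touch "$dir/1.r13125.ovh.net,S=10" "$dir/2.r13125.ovh.net,S=20" "$dir/other"

# Build the list the same way as in the question: names relative to $dir.
(cd "$dir" && find . -type f | grep -F 'r13125.ovh.net,S=') > "$list"

# Run rm from inside the directory so the relative names resolve.
# tr turns each newline into a NUL so xargs -0 takes every line literally.
(cd "$dir" && tr '\n' '\0' < "$list" | xargs -0 rm -f --)

left=$(ls "$dir")
echo "left: $left"
```

Feeding the list on stdin instead of using -a also works with xargs implementations that lack that GNU option.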
0
 
LVL 7

Expert Comment

by:jhp333
ID: 33431952
@pitt7

"-P     Never follow symbolic links." is the default. I guess his circular links are made with hard links.
0
 
LVL 78

Assisted Solution

by:arnold
arnold earned 133 total points
ID: 33432636
The large number of files will generate errors when using the {} \; form.

Use jhp333's find, but process one file at a time:
find . -name '*.r13125.ovh.net,S=*' | while IFS= read -r a; do
  # quote "$a" so names with spaces or glob characters are passed intact
  echo "Deleting $a"
  /bin/rm -rf "$a"
done
0
 
LVL 3

Expert Comment

by:pitt7
ID: 33432724
@jhp333:
Yes, that's right. Using -H is useless here.

@arnold's answer:
I strongly suggest using a solution that doesn't spawn one rm process for each file. The author says there are 3.5 million files, so one process per file will take much longer.
If
find . -name '*.r13125.ovh.net,S=*' -exec rm {} +
throws an error too, use
find . -name '*.r13125.ovh.net,S=*' | xargs rm

xargs reads arguments from stdin and passes them to rm. It does not call rm for every single file; instead it builds command lines of the maximum possible length, which results in far fewer rm processes.

(Side note:
If your filenames can contain \n newlines "find | xargs rm" will fail. In that case use:
find . -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm
With this command the arguments are not delimited by a newline but a \0 character which can't be used in filenames.)
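
Plain xargs also splits its input on spaces and treats quotes specially, so the NUL-delimited form is the safer default. A small sketch with made-up awkward file names in a scratch directory, assuming find and xargs support -print0 and -0:

```shell
tmp=$(mktemp -d)
# Hypothetical names: one contains a space, one contains a newline.
touch "$tmp/a b.r13125.ovh.net,S=1"
touch "$tmp/c
d.r13125.ovh.net,S=2"
touch "$tmp/unrelated"

# NUL-delimited hand-off: every name reaches rm intact.
find "$tmp" -type f -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm -f --

# Count entries without relying on newlines in names.
survivors=$(find "$tmp" -type f -exec sh -c 'echo x' sh {} \; | wc -l | tr -d ' ')
echo "files left: $survivors"
```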
0
 
LVL 78

Expert Comment

by:arnold
ID: 33433889
The problem with both {} \; and xargs is that they will try to pass 3.5 million entries to rm, which is what generates the too many items error with {} \;. I suspect the same thing will happen with xargs.

Instead of rm, unlink can be used.

At any one time, one rm process will be running per file.

Another option is to use -mtime as a filter, i.e.

find . -name '*.r13125.ovh.net,S=*' -mtime +360 -exec rm {} \;
This deletes files that match the pattern and are more than 360 days (about a year) old. This may reduce the number of files to be deleted per batch.

You can use a for loop to go from 360 down to 90 in steps of 5, 10, 20 or 30 days.

IMHO, since so many files have accumulated, the example I posted with the while loop, deleting one file at a time, is a good approach.

If you want to get more complex, i.e. delete 10, 20 or 30 files at a time, you could build the string that is passed to rm every X files.
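
One way to get fixed-size batches without building the string by hand is the -n option of xargs. This is only a sketch of that idea, not the loop described above; the 45 files and the batch size of 20 are arbitrary illustration values, and -print0/-0 support is assumed:

```shell
tmp=$(mktemp -d)
# Create 45 hypothetical files matching the pattern.
i=1
while [ "$i" -le 45 ]; do
  touch "$tmp/$i.r13125.ovh.net,S=$i"
  i=$((i + 1))
done

# -n 20 caps each invocation at 20 file names. The inner sh removes its
# batch and prints one line, so the line count equals the batch count.
batches=$(find "$tmp" -type f -name '*.r13125.ovh.net,S=*' -print0 |
  xargs -0 -n 20 sh -c 'rm -f -- "$@"; echo batch' sh | wc -l | tr -d ' ')

remaining=$(find "$tmp" -type f | wc -l | tr -d ' ')
echo "batches=$batches remaining=$remaining"
```

In a real run the inner command would simply be rm; sh -c is only there to make the batching visible.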

0
 
LVL 3

Expert Comment

by:pitt7
ID: 33434328
The error he gets is from find, not rm.

find will not pass 3.5 million entries to rm, because it can't: the argument length of a program is limited. This is why there are tools like xargs, and why find -exec can be terminated with +.

Just run:
find / -exec echo {} \;
[you will see one file per line, which means echo is called for every single file]
and
find / -exec echo {} + | cut -c 1-80
[you will see very many files on one line, but not just one line: echo is called multiple times, provided find finds enough files]
to see the difference (the "| cut -c 1-80" truncates the very long lines generated).
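
The same comparison can be made on a small scale by counting how many processes find starts. A minimal sketch in a scratch directory with five made-up files:

```shell
tmp=$(mktemp -d)
touch "$tmp/f1" "$tmp/f2" "$tmp/f3" "$tmp/f4" "$tmp/f5"

# Each run of the inner sh prints exactly one line, so the line count
# equals the number of processes find started.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "one per file: $per_file, batched: $batched"
```

With only five short names everything fits into a single command line, so the + form starts one process where the \; form starts five.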
0
 
LVL 3

Assisted Solution

by:stetor
stetor earned 100 total points
ID: 33464406
Hi

I think this is a one-time task, so I don't think the time is a problem ...

From the shell prompt, type the following in the evening before going home:
while
  read fname
do
  rm "/home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new/$fname"
done</home/LIST

and the next morning the directory is cleaned ;-)


0
