Solved

3.5 million files to delete with a specific substring

Posted on 2010-08-13
Last Modified: 2012-05-10
Hi,

I have 3.5 million files to delete that contain the substring ".r13125.ovh.net,S="

How can I delete them?

Thank you
Question by:matthew016
13 Comments
 
LVL 7

Accepted Solution

by:
jhp333 earned 134 total points
ID: 33428002
Do you mean the string in the filenames?

To list the files:
find . -name '*.r13125.ovh.net,S=*'

To delete them:
find . -name '*.r13125.ovh.net,S=*' -delete

where . is the starting directory. You can use "/" if you want to start from root.
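Before deleting anything, a dry run that only counts the matches can confirm that the pattern hits what you expect. A minimal sketch, assuming a POSIX find; the helper name `count_matches` is hypothetical, and the `-type f` test (regular files only) is an addition not in the answer above:

```shell
#!/bin/sh
# count_matches DIR (hypothetical helper): dry run only, nothing is deleted.
# Prints how many regular files under DIR have the substring in their name.
count_matches() {
    find "$1" -type f -name '*.r13125.ovh.net,S=*' | wc -l
}
```

For example, run `count_matches /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new` before switching to the `-delete` or `-exec rm` variant.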
 
LVL 9

Author Comment

by:matthew016
ID: 33428041
r13125 ~ # find . -name '*.r13125.ovh.net,S=*' -delete
find: prédicat invalide `-delete'
r13125 ~ #

(in English: invalid predicate)
 
LVL 7

Expert Comment

by:jhp333
ID: 33428054
It seems your find utility is an old version that doesn't support the -delete option, which is relatively new. In that case:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} \;
 
LVL 9

Author Comment

by:matthew016
ID: 33428149
The file count went down, but only by about a thousand.
Then I got this error message:

find: Ne peut faire un clonage (fork).: Ne peut allouer de la mémoire

(approximate translation from French: find: cannot clone (fork): cannot allocate memory)

I tried running it in a loop a thousand times, but after the first error message the file count doesn't go down anymore.
 
LVL 9

Author Comment

by:matthew016
ID: 33428184
I created the file /home/LIST with the command:  find . | fgrep 'r13125.ovh.net,S=' > /home/LIST

So I have the list of files to delete, but only the filenames without the full path (the full path is /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new).
I heard it is possible to delete all the files listed in the LIST file with xargs. How can I achieve this exactly?
 
LVL 7

Expert Comment

by:jhp333
ID: 33428340
It seems you have one or more circular links.
Find the link and delete it manually.
 
LVL 3

Assisted Solution

by:pitt7
pitt7 earned 133 total points
ID: 33429248
You can use the -H option to prevent following symbolic links (if they are circular).
When using find's -exec with many files, you should use the following syntax:

find . -name '*.r13125.ovh.net,S=*' -exec rm {} +

Closing the -exec command with + means that rm is called with as many file arguments as the maximum command-line length allows.
With \; a new rm process is invoked for every file, which is much slower when there are many files.
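The effect of `\;` versus `+` can be checked on a scratch directory instead of live mail. In this sketch (the helper name `count_invocations` is hypothetical), the command run by -exec is a tiny `sh -c` script that prints one line per invocation, so `wc -l` counts how many processes find started:

```shell
#!/bin/sh
# count_invocations DIR MODE (hypothetical helper): print how many times
# find runs the -exec command. MODE is ';' (one process per file) or
# '+' (as many files per process as the argument limit allows).
count_invocations() {
    # 'echo run' prints one line per invocation, however many files it received.
    find "$1" -type f -exec sh -c 'echo run' sh {} "$2" | wc -l
}
```

On a directory holding 100 files, the `';'` mode reports 100 and the `'+'` mode reports 1, since 100 short names easily fit into a single command line.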

To delete the files listed in a file with xargs:
xargs -a /home/LIST rm
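Note that `find .` writes paths relative to the directory it was started in (for example `./name`), so the xargs command has to run from that same directory. A sketch, assuming GNU xargs (`-a` and `-d` are GNU extensions), one path per line in the list, and an absolute path for the list file; `delete_from_list` is a hypothetical name:

```shell
#!/bin/sh
# delete_from_list LIST DIR (hypothetical helper): remove every path listed
# in LIST, one per line, resolved relative to DIR (where `find .` was run).
# LIST should be an absolute path, because the function changes directory.
delete_from_list() {
    # Subshell, so the caller's working directory is left unchanged.
    # -d '\n' splits on newlines only, so blanks and quotes in names survive.
    ( cd "$2" && xargs -a "$1" -d '\n' rm -f -- )
}
```

For example, if the list was built inside the Maildir/new directory the author mentions: `delete_from_list /home/LIST /home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new`.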
 
LVL 7

Expert Comment

by:jhp333
ID: 33431952
@pitt7

"-P     Never follow symbolic links." is the default. I guess his circular links are by hard links.
 
LVL 77

Assisted Solution

by:arnold
arnold earned 133 total points
ID: 33432636
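The large number of files will generate errors when using the {} \; form.

Using jhp333's find command, but processing one file at a time:
find . -name '*.r13125.ovh.net,S=*' | while IFS= read -r a; do
    # IFS= and -r keep leading blanks and backslashes in names intact
    echo "Deleting $a"
    # quoted, so names with spaces are not split; -r is not needed for plain files
    /bin/rm -f -- "$a"
done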
 
LVL 3

Expert Comment

by:pitt7
ID: 33432724
@jhp333:
Yes, that's right. Using -H is useless here.

@arnold's answer:
I strongly suggest using a solution that doesn't spawn one rm process per file. The author says there are 3.5 million files; with one process per file this will take much longer.
If
find . -name '*.r13125.ovh.net,S=*' -exec rm {} +
throws an error too, use
find . -name '*.r13125.ovh.net,S=*' | xargs rm

xargs reads arguments from stdin and passes them to rm. But it does not call rm for every single file: it builds command lines of the maximum possible length, which results in far fewer rm processes.
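To see how few processes xargs needs, this sketch (the helper name `xargs_invocations` is hypothetical) feeds it N dummy names and counts how many times it ran `echo`; each `echo` call prints exactly one line. The exact count depends on the system's argument-size limit, so no fixed number is claimed here:

```shell
#!/bin/sh
# xargs_invocations N (hypothetical helper): feed N dummy names to xargs and
# print how many times it started the command (one output line per echo run).
xargs_invocations() {
    seq 1 "$1" | xargs echo | wc -l
}
```

With 100,000 names this typically prints a single-digit number, compared with the 100,000 processes that one-call-per-file would need.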

(Side note:
If your filenames can contain spaces, quotes or newlines, "find | xargs rm" will fail. In that case use:
find . -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm
With this command the arguments are delimited not by newlines but by the \0 (NUL) character, which cannot appear in filenames.)
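The NUL-delimited pipeline can be wrapped in a small function (hypothetical name `delete_matches`). This sketch assumes a find and xargs that support `-print0` and `-0` (the GNU and BSD versions do); `rm -f --` keeps rm quiet if a name is already gone and ends option parsing before the file names:

```shell
#!/bin/sh
# delete_matches DIR (hypothetical helper): NUL-delimited find | xargs,
# safe for names containing blanks, quotes or newlines.
delete_matches() {
    find "$1" -type f -name '*.r13125.ovh.net,S=*' -print0 | xargs -0 rm -f --
}
```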
 
LVL 77

Expert Comment

by:arnold
ID: 33433889
The problem with both {} \; and xargs is that they will try to pass 3.5 million entries to rm, which will generate the "too many items" error for {} \;. I suspect the same thing will happen with xargs.

Instead of rm, unlink can be used.
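`unlink` takes exactly one file name per call, so it fits a one-file-at-a-time loop. A minimal sketch, assuming the coreutils `unlink` command; the helper name `unlink_matches` is hypothetical, and like the while loop earlier in the thread it will mishandle names that contain newlines:

```shell
#!/bin/sh
# unlink_matches DIR (hypothetical helper): remove matches one at a time
# with unlink(1), which accepts exactly one file name per call.
unlink_matches() {
    find "$1" -type f -name '*.r13125.ovh.net,S=*' |
    while IFS= read -r f; do
        unlink "$f"
    done
}
```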

At any one time one rm process will be running per file.

Another option is to use -mtime as a filter, e.g.:

find . -name '*.r13125.ovh.net,S=*' -mtime +360 -exec rm {} \;
This deletes files that match the pattern and were last modified more than 360 days (about a year) ago. It may reduce the number of files to be deleted per batch.

You can use a for loop to step from 360 down to 90 days in 5, 10, 20 or 30 day steps.
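The stepped `-mtime` approach could look like the sketch below, using 30-day steps from 360 down to 90. The step size, the helper name `delete_by_age`, and the use of `-exec rm -f {} +` (instead of `\;`, to batch the rm calls) are assumptions for illustration:

```shell
#!/bin/sh
# delete_by_age DIR (hypothetical helper): delete matching files in passes,
# oldest first. -mtime +N matches files whose age in whole days exceeds N.
delete_by_age() {
    for days in 360 330 300 270 240 210 180 150 120 90; do
        echo "pass: older than $days days" >&2
        find "$1" -type f -name '*.r13125.ovh.net,S=*' -mtime +"$days" -exec rm -f {} +
    done
}
```

Files modified within the last 90 days are left alone; a final pass without `-mtime` would remove the rest.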

IMHO, since this many files have accumulated, the example I posted with the while loop, deleting one file at a time, is a good approach.

If you want to get more elaborate, e.g. delete 10, 20 or 30 files at a time, you could build the argument string passed to rm every X files.
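Instead of building the rm argument string by hand, `xargs -n` can cap the number of names per rm call; this swaps in xargs for the manual approach described above. A sketch, assuming find and xargs with NUL support; `delete_in_batches` is a hypothetical helper and 30 is just an example batch size:

```shell
#!/bin/sh
# delete_in_batches DIR [N] (hypothetical helper): pass at most N names
# (default 30) to each rm call, using xargs -n instead of a hand-built string.
delete_in_batches() {
    find "$1" -type f -name '*.r13125.ovh.net,S=*' -print0 |
        xargs -0 -n "${2:-30}" rm -f --
}
```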

 
LVL 3

Expert Comment

by:pitt7
ID: 33434328
The error he gets comes from find, not from rm.

find will not pass 3.5 million entries to rm, because it can't: the total length of a program's arguments is limited. This is why there are tools like xargs, and why find's -exec can be terminated with +.
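The limit pitt7 refers to can be queried with `getconf`. A minimal sketch; the value printed is system-dependent (POSIX only guarantees at least 4096 bytes), and it covers the arguments plus the environment:

```shell
#!/bin/sh
# Maximum combined size, in bytes, of the argument list plus environment
# that can be passed to a new program via exec(); system-dependent.
getconf ARG_MAX
```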

Just run:
find / -exec echo {} \;
[you will see one file per line, which means echo is called for every single file]
and
find / -exec echo {} + | cut -c 1-80
[you will see many files per line, but more than one line: echo is called multiple times, provided find finds enough files.]
to see the difference (the "| cut -c 1-80" truncates the very long lines generated).
 
LVL 3

Assisted Solution

by:stetor
stetor earned 100 total points
ID: 33464406
Hi

I think this is a one-time task, so I don't think the time is a problem...

From the shell prompt, type the following in the evening before going home:
while
  IFS= read -r fname
do
  rm "/home/vpopmail/domains/r13125.ovh.net/postmaster/Maildir/new/$fname"
done < /home/LIST

and the next morning the directory will be clean ;-)

