texasreddog
getting around the rsync/cp/ls Argument list too long limitation
I've been using a perl script that allows me to run an rsync command to copy jpg image files from one location to another, but it takes freakin' forever due to the pattern matching. I have to do this because of the "Argument list too long" error that happens when you try to copy a large group of files at one time.
Does anyone know if there is a better way to get around this problem with perl, or even a shell script? I was thinking of using the File::Copy module, but I'd probably run into the same problem. Here is the code that I currently use:
#!/usr/bin/perl
open(INPUT, "<ANFTP_Paths.txt");
open STDERR, ">>ANFTPSync_ErrorLog";
open STDOUT, ">>ANFTPSync_StdoutLog";
my @guidvar1=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar2=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar3=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar4=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
while (<INPUT>) {
my ($line) = $_;
chomp($line);
$new = substr( $line, index( $_, "\," ) + 1 );
$old = substr( $line, 0, index( $_, "\," ) );
unless(-d "$old/images2") {
exec `mv $old/images $old/images2`;
}
unless(-l "$old/images") {
exec `ln -s $new/images $old/images`;
exec `chmod 555 $new/images`;
}
for($key1=0; $key1<=$#guidvar1; $key1++) {
for($key2=0; $key2<=$#guidvar2; $key2++) {
for($key3=0; $key3<=$#guidvar3; $key3++) {
for($key4=0; $key4<=$#guidvar4; $key4++) {
my $command = "sudo rsync -gopt $old/images2/$guidvar1[$key1]$guidvar2[$key2]$guidvar3[$key3]$guidvar4[$key4]* $new/images\n";
print STDOUT, $command;
exec `$command`;
}
}
}
}
}
close(INPUT);
close(STDERR);
close(STDOUT);
Why not just use xargs to run the command directly with the longest line available?
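A minimal sketch of the xargs idea, assuming GNU findutils and coreutils (cp -t and xargs -r are GNU options). The directories and file names are hypothetical and are created only for the demonstration. find emits the names one by one, and xargs packs as many as fit into each cp invocation, so no single command line exceeds the limit.

```shell
#!/bin/sh
# Hypothetical source and destination, created only for this demonstration.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/0A1B.jpg" "$src/FFFF.jpg" "$src/notes.txt"

# The pattern is quoted so that find, not the shell, expands it.
# -print0 / -0 keep file names with spaces or newlines intact.
# -r skips running cp at all when find matches nothing (GNU xargs).
find "$src" -maxdepth 1 -type f -name '*.jpg' -print0 |
    xargs -0 -r cp -p -t "$dst"

copied=$(find "$dst" -type f -name '*.jpg' | wc -l | tr -d ' ')
echo "copied: $copied"
```

The same pipeline should work with rsync in place of cp as long as the destination can be given before the file list.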
ASKER
What I would like to know, truthfully, is whether there is a way to determine what that limitation is in the shell, so I don't have to use all of this guesswork. I've tried the find . -name *.jpg -xargs rsync approach, but I still get this error on occasion.
It would be nice to modify the above script so that it first runs with something like 0*.jpg and, if the error occurs, tries 00*.jpg, then 000*.jpg, so it wouldn't have to do the pattern matching four levels down every time. Is there a way to do this using the above script?
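On the question of finding the limit: on POSIX systems the value can usually be read with getconf. The sketch below assumes getconf is installed; the variable name is only illustrative.

```shell
#!/bin/sh
# ARG_MAX is the upper bound, in bytes, on the combined size of the
# arguments and environment handed to a newly executed program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# GNU xargs can also report the limits it will actually apply
# (GNU findutils only, so left commented out here):
# xargs --show-limits </dev/null
```

The limit applies when an external command such as rsync, cp, or ls is executed with the expanded file list, not to the glob expansion itself. Note also that find has no -xargs option, and an unquoted *.jpg is expanded by the shell before find ever sees it, so the usual form is find . -name '*.jpg' piped into xargs.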
You are using exec with backticks, and I don't think this is what you want. To execute a program and capture its output, use backticks. To execute a program, letting output go to standard output, and not return, use exec. To execute a program, letting output go to standard output, and then return, use system. So I'm guessing you want system in these cases.
Next, rsync has filter options you can use to tell it which files to include/exclude in your update. If you do man rsync, you'll get all of the details.
Something like this I think will do what you want (but you should test with the --dry-run option)
system("sudo rsync -gopt --dry-run --exclude=* --include=\\[0-9A-F\\]\\[0-9A-F\\]\\[0-9A-F\\]\\[0-9A-F\\]* $old/images2/ $new/images");
The square brackets need to be escaped so they are passed to rsync, and not processed by the shell
If you are happy with these results, remove the --dry-run option
ASKER
I tried just running this command on the command line:
gerdesk@new-cleaner01:/nethome/gerdesk $ rsync -gopt --include=[0-9A-F][0-9A-F][0-9A-F][0-9A-F]* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images
nothing happened. this is without --dry-run, since the images are in my own directory. I took out --exclude=*, because nothing was copied with that in there. the only way it seemed to work is if I removed the --include and put that pattern matching at the end of my old path, and even then, I still get argument list too long.
You need to escape the square brackets. Try this at the command line (all on one line of course)
rsync -gopt
--exclude=\*
--include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\*
/nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images
/nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
ASKER
nothing happens, even with escaped square brackets:
gerdesk@new-cleaner01:/nethome/gerdesk $ rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images
gerdesk@new-cleaner01:/nethome/gerdesk $
it just skips the source directory and that's it.
A simple way is:
#> find /path/to/your/files -name '*.jpg' -exec mv -f {} /move_files/to_here/ \;
Moves files older than 7 days
#> find /path/to/your/files -name '*.jpg' -mtime +7 -exec mv -f {} /move_files/to_here/ \;
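A variation on the commands above, sketched under the assumption that GNU mv is available (the -t option is a GNU extension). Ending -exec with + instead of \; lets find pass many file names to each mv call rather than starting one mv per file, which is usually much faster on large directories. The directories here are hypothetical and exist only for the demonstration.

```shell
#!/bin/sh
# Hypothetical directories, created only for this demonstration.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/a.jpg" "$src/b.jpg" "$src/c.png"

# With "+", find appends as many names as fit to a single mv command line
# and starts another mv only when the line would become too long.
find "$src" -maxdepth 1 -type f -name '*.jpg' -exec mv -f -t "$dst" {} +

moved=$(ls "$dst" | wc -l | tr -d ' ')
left=$(ls "$src" | wc -l | tr -d ' ')
echo "moved: $moved, left behind: $left"
```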
ASKER
I'll keep these solutions in mind, but it appears that there is no easy way around this problem. It would be nice if this would be addressed in future Unix releases :(
I see... it's skipping the directory because it doesn't match. Try adding a slash on the source directory name so it knows to go into that directory:
rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/ /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
ASKER
nope, that doesn't work:
gerdesk@new-cleaner02:/nethome/gerdesk $ rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/ /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/.
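Two details of rsync may explain the output above, though this is a sketch and not necessarily the fix the thread settled on. First, rsync prints "skipping directory" when a directory is given as the source without -r (--recursive) or -d (--dirs). Second, filter rules are applied in order and the first match wins, so an --exclude='*' placed before the --include hides every file. The example below uses hypothetical temporary directories and file names and assumes rsync is installed.

```shell
#!/bin/sh
# Hypothetical directories and file names, created only for this demonstration.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/0A1Bcafe.jpg" "$src/FFFF0000.jpg" "$src/thumb.jpg" "$src/12.jpg"

# -d copies the contents of "$src/" without descending into subdirectories.
# The include rule comes first; everything it does not match is then excluded.
# Single quotes stop the shell from expanding the brackets and the asterisk.
rsync -gopt -d \
    --include='[0-9A-F][0-9A-F][0-9A-F][0-9A-F]*' \
    --exclude='*' \
    "$src/" "$dst/"

synced=$(ls "$dst" | sort | tr '\n' ' ')
echo "synced: $synced"
```

Because the pattern is passed to rsync as a filter instead of being expanded by the shell, the command line stays short no matter how many files match.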
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
thank you! now I can finally close this issue.