Solved

getting around the rsync/cp/ls Argument list too long limitation

Posted on 2008-10-22
Last Modified: 2012-05-05
I've been using a perl script that allows me to run an rsync command to copy jpg image files from one location to another, but it takes freakin' forever due to the pattern matching.  I have to do this because of the "Argument list too long" error that happens when you try to copy a large group of files at one time.

Does anyone know if there is a better way to get around this problem with perl, or even a shell script?  I was thinking of using the File::Copy module, but I'd probably run into the same problem.  Here is the code that I currently use:

#!/usr/bin/perl

open(INPUT, "<ANFTP_Paths.txt");
open STDERR, ">>ANFTPSync_ErrorLog";
open STDOUT, ">>ANFTPSync_StdoutLog";
my @guidvar1=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar2=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar3=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");
my @guidvar4=("0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F");

while (<INPUT>) {
    my ($line) = $_;

    chomp($line);

    $new = substr( $line, index( $_, "\," ) + 1 );
    $old = substr( $line, 0, index( $_, "\," ) );

    unless(-d "$old/images2") {
       exec `mv $old/images $old/images2`;
    }
    unless(-l "$old/images") {
       exec `ln -s $new/images $old/images`;
       exec `chmod 555 $new/images`;
    }

    for($key1=0; $key1<=$#guidvar1; $key1++) {
       for($key2=0; $key2<=$#guidvar2; $key2++) {
          for($key3=0; $key3<=$#guidvar3; $key3++) {
             for($key4=0; $key4<=$#guidvar4; $key4++) {
                my $command = "sudo rsync -gopt $old/images2/$guidvar1[$key1]$guidvar2[$key2]$guidvar3[$key3]$guidvar4[$key4]* $new/images\n";
                print STDOUT $command;   # no comma after a filehandle in print
                exec `$command`;
             }
          }
       }
    }
}

close(INPUT);
close(STDERR);
close(STDOUT);
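For scale: as written, the four nested loops build one `sudo rsync` command per four-character hex prefix, for every line of ANFTP_Paths.txt. A small bash sketch of that arithmetic, assuming (as the loops do) 16 hex digits per position:

```shell
#!/bin/bash
# The four nested Perl loops each walk the same 16 hex digits,
# so each source directory gets 16^4 separately launched commands.
hex_digits=(0 1 2 3 4 5 6 7 8 9 A B C D E F)
per_position=${#hex_digits[@]}
invocations=$(( per_position ** 4 ))
echo "rsync invocations per source directory: $invocations"
```

Each of those commands also pays for starting sudo and rsync and for the shell rescanning the source directory to expand the glob, which is a likely reason the script is so slow.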

Question by:texasreddog
12 Comments
 
LVL 14

Expert Comment

by:sjm_ee
ID: 22777978
Why not just use xargs to run the command directly with the longest command line available?
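A minimal sketch of the xargs approach, assuming a find/xargs that supports `-print0`/`-0` (GNU and the BSDs do). The directories `src` and `dest` are hypothetical stand-ins for `$old/images2` and `$new/images`, and `cp -p` stands in for `rsync -gopt` so the sketch runs without rsync installed. xargs splits the file list into as many commands as needed to stay under the system limit, and the `sh -c` wrapper puts the destination last, where both cp and rsync expect it.

```shell
#!/bin/bash
# Hypothetical source and destination directories for the demo.
work=$(mktemp -d)
src="$work/src"
dest="$work/dest"
mkdir -p "$src" "$dest"

# Create sample .jpg files with hex-style names, plus one file to ignore.
for (( i = 1; i <= 500; i++ )); do
  : > "$src/$(printf '%04X' "$i").jpg"
done
: > "$src/notes.txt"

# find emits NUL-separated names, so no shell glob is ever expanded;
# xargs packs as many names into each command as the system limit allows.
# Inside sh -c, "$0" is the destination and "$@" is the current batch.
find "$src" -maxdepth 1 -type f -name '*.jpg' -print0 |
  xargs -0 sh -c 'cp -p -- "$@" "$0"' "$dest"

copied=$(find "$dest" -type f -name '*.jpg' | wc -l | tr -d ' ')
echo "copied $copied files"
```

With rsync the wrapper would read `sh -c 'rsync -gopt "$@" "$0"' "$new/images"`. rsync can also take the list itself through its `--files-from` option (with `--from0` for NUL-separated input), which avoids building a command line at all; see man rsync for how it interprets those paths.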
 

Author Comment

by:texasreddog
ID: 22778052
What I would like to know, truthfully, is whether there is a way to determine what that limit is in the shell, so I don't have to use all of this guesswork. I've tried the find . -name *.jpg | xargs rsync approach, but I still get this error on occasion.

It would be nice to modify the above script so that it runs with something like 0*.jpg, and if the error occurs, then tries 00*.jpg, then 000*.jpg, so it wouldn't have to do the pattern matching down four levels every time. Is there a way to do this using the above script?
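On the question of what the limit actually is: POSIX systems call it ARG_MAX, and it covers the combined size of the argument strings plus the environment handed to a new program; `getconf ARG_MAX` reports it (on Linux the reported value typically tracks the stack size limit, `ulimit -s`). The sketch below, assuming bash and a hypothetical image directory, also measures roughly how large a glob would expand to. It relies on `printf` being a bash builtin, so expanding the glob there cannot itself fail, and it ignores the small per-argument pointer overhead.

```shell
#!/bin/bash
# The kernel limit on argv + environment for a new process, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# Hypothetical image directory for the demo.
dir=$(mktemp -d)
for (( i = 1; i <= 100; i++ )); do
  : > "$dir/$(printf '%04X' "$i").jpg"
done

# Each argument costs its length plus a terminating NUL byte.
glob_bytes=$(printf '%s\0' "$dir"/*.jpg | wc -c | tr -d ' ')
env_bytes=$(env | wc -c | tr -d ' ')
echo "expanded glob: $glob_bytes bytes, environment: about $env_bytes bytes"

if (( glob_bytes + env_bytes < arg_max )); then
  echo "this glob should fit on one command line"
else
  echo "this glob is likely to fail with 'Argument list too long'"
fi
```

On GNU systems, `xargs --show-limits </dev/null` prints the limits xargs itself works with. One possible cause of the error still showing up with find: if `*.jpg` is left unquoted in `find . -name *.jpg`, as in the comment above, the shell expands it before find runs whenever the current directory contains .jpg files, which can hit the same limit. Quoting it as `'*.jpg'` hands the pattern to find instead.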
 
LVL 39

Expert Comment

by:Adam314
ID: 22779172
You are using exec with backticks... I don't think this is what you want. To execute a program and capture its output, use backticks. To execute a program, letting its output go to standard output, and never return, use exec. To execute a program, letting its output go to standard output, and then return, use system. So I'm guessing you want system in these cases.
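The shell has the same three behaviours, which makes the pitfall easy to see. In this bash analogy (not Perl itself), command substitution `$(...)` plays the role of Perl's backticks; note what happens when its output is handed to exec.

```shell
#!/bin/bash
# Command substitution captures output, like Perl backticks.
captured=$(echo "3 files copied")
echo "captured: $captured"

# exec replaces the running (sub)shell: the command after it never runs.
exec_demo=$( echo "before exec"; exec true; echo "never printed" )
echo "$exec_demo"

# exec $(cmd) first runs cmd, then executes cmd's OUTPUT as a program.
# Here the inner command prints "echo surprise", so exec runs that.
trap_demo=$( exec $(echo echo surprise) )
echo "$trap_demo"
```

In the Perl script the backticks already run mv, ln, chmod and rsync; exec is then given whatever those commands printed, and whenever an exec does succeed it replaces the Perl process, so nothing after it runs.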

Next, rsync has filter options you can use to tell it which files to include or exclude in your update. Run man rsync for all of the details.

I think something like this will do what you want (but test it with the --dry-run option first):
system("sudo rsync -gopt --dry-run --exclude=* --include=\\[0-9A-F\\]\\[0-9A-F\\]\\[0-9A-F\\]\\[0-9A-F\\]* $old/images2/ $new/images");

The square brackets need to be escaped so they are passed to rsync and not processed by the shell.
If you are happy with the results, remove the --dry-run option.
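Why the escaping matters: an unquoted word containing `[...]` or `*` is a glob, and the shell replaces it with any matching file names in the current directory before rsync ever sees it. A bash sketch in a hypothetical scratch directory, using `printf` to show exactly which argument a command would receive:

```shell
#!/bin/bash
# Work in a scratch directory so the demo controls which files exist.
demo=$(mktemp -d)
cd "$demo" || exit 1

# No file matches yet, so bash (by default) passes the word through unchanged.
before=$(printf '%s\n' --include=[0-9A-F][0-9A-F][0-9A-F][0-9A-F]*)

# Create a file whose name happens to match the whole word as a glob.
touch -- '--include=ABCD.jpg'
after=$(printf '%s\n' --include=[0-9A-F][0-9A-F][0-9A-F][0-9A-F]*)

# Quoting (or backslash-escaping) keeps the pattern literal for rsync.
quoted=$(printf '%s\n' '--include=[0-9A-F][0-9A-F][0-9A-F][0-9A-F]*')

echo "unquoted, no match: $before"
echo "unquoted, match:    $after"
echo "quoted:             $quoted"
```

So an unescaped pattern only reaches rsync intact as long as nothing in the current directory matches it, which is why quoting the filter patterns is the safe habit.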

 

Author Comment

by:texasreddog
ID: 22779362
I tried just running this command on the command line:

gerdesk@new-cleaner01:/nethome/gerdesk $ rsync -gopt --include=[0-9A-F][0-9A-F][0-9A-F][0-9A-F]* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images

Nothing happened. This is without --dry-run, since the images are in my own directory. I took out --exclude=*, because nothing was copied with it in there. The only way it seemed to work was if I removed the --include and put the pattern matching at the end of my old path, and even then, I still got "Argument list too long".
 
LVL 39

Expert Comment

by:Adam314
ID: 22779597
You need to escape the square brackets. Try this at the command line:
rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images

 

Author Comment

by:texasreddog
ID: 22779648
Nothing happens, even with escaped square brackets:

gerdesk@new-cleaner01:/nethome/gerdesk $ rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images
gerdesk@new-cleaner01:/nethome/gerdesk $

It just skips the source directory and that's it.
 
LVL 6

Expert Comment

by:peter991
ID: 22785798
A simple way is:


#> find /path/to/your/files -name '*.jpg' -exec mv -f {} /move_files/to_here/ \;
 

To move only files last modified more than 7 days ago:

#> find /path/to/your/files -name '*.jpg' -mtime +7 -exec mv -f {} /move_files/to_here/ \;
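Note that these commands move the files rather than copy them, and that `-exec ... \;` starts one process per file. POSIX find also accepts `-exec ... {} +`, which packs many file names into each process, much like xargs. A bash sketch that counts the processes each form starts, using a hypothetical scratch directory with five files:

```shell
#!/bin/bash
# Hypothetical directory with a few sample .jpg files.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$dir/img$i.jpg"; done

# '\;' starts one process per file found; each process prints one line.
per_file=$(find "$dir" -name '*.jpg' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# '+' passes as many names as fit to each process, so five files need one.
batched=$(find "$dir" -name '*.jpg' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "with \\; : $per_file processes"
echo "with +  : $batched processes"
```

Because `{}` must come right before the `+`, a copy that keeps the destination last needs the same `sh -c` wrapper trick: `-exec sh -c 'cp -p "$@" "$0"' /move_files/to_here/ {} +`.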

 

Author Comment

by:texasreddog
ID: 22786117
I'll keep these solutions in mind, but it appears that there is no easy way around this problem. It would be nice if this were addressed in future Unix releases :(
 
LVL 39

Expert Comment

by:Adam314
ID: 22786546
I see... it's skipping the directory because it doesn't match.  Try adding a slash on the source directory name so it knows to go into that directory:
rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/ /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
 

Author Comment

by:texasreddog
ID: 22813243
Nope, that doesn't work:

gerdesk@new-cleaner02:/nethome/gerdesk $ rsync -gopt --exclude=\* --include=\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\[0-9A-F\]\* /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/ /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
skipping directory /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/.
 
LVL 39

Accepted Solution

by:
Adam314 earned 225 total points
ID: 22815734
There were two reasons why this didn't work:
1) The recursive option (-r) was not specified, so rsync only synchronized the directory itself, not its contents.
2) With filters, rsync acts on the first matching filter, not the last matching one (as I had assumed in my previous posts).

So, making changes for these two issues...

I created a few test files matching and not matching the pattern, and this works as desired.
rsync -goptr --include='[0-9A-F][0-9A-F][0-9A-F][0-9A-F]*' --exclude='*' /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000046/RP2054/images/ /nethome/gerdesk/scripts/perl/datafiles/nfs/ftp/REPUBLIC/RI_4001012/RP_3000045/RP2054/images
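To see why the order of the two filters matters, here is a toy bash model of first-match evaluation. It is an illustration, not rsync itself: the hypothetical `filter_verdict` function checks "+ PATTERN" (include) and "- PATTERN" (exclude) rules in order and lets the first match decide, using bash `case` patterns, which only approximate rsync's own wildcard rules.

```shell
#!/bin/bash
# Toy model of rsync filter order: rules are checked in the order given
# and the FIRST matching rule decides. Unmatched names are transferred.
filter_verdict() {
  local name=$1 rule action pattern
  shift
  for rule in "$@"; do
    action=${rule%% *}     # "+" or "-"
    pattern=${rule#* }     # everything after the first space
    case $name in
      $pattern)            # unquoted, so it is matched as a glob
        if [ "$action" = "+" ]; then echo included; else echo excluded; fi
        return
        ;;
    esac
  done
  echo included
}

include='+ [0-9A-F][0-9A-F][0-9A-F][0-9A-F]*'
exclude='- *'

# Accepted order: include first, then exclude everything else.
good_match=$(filter_verdict 3F2A.jpg "$include" "$exclude")
good_other=$(filter_verdict notes.txt "$include" "$exclude")

# Earlier attempt: exclude first, so "*" catches every file before the include.
bad_match=$(filter_verdict 3F2A.jpg "$exclude" "$include")

echo "include then exclude: 3F2A.jpg $good_match, notes.txt $good_other"
echo "exclude then include: 3F2A.jpg $bad_match"
```

One consequence worth keeping in mind with -r: rsync applies the same rules to directory names, so the trailing --exclude='*' also keeps it from descending into any subdirectory whose name does not match the include pattern.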

 

Author Closing Comment

by:texasreddog
ID: 31508772
Thank you! Now I can finally close this issue.