Solved

bash: /bin/cat: Argument list too long

Posted on 2010-11-11
3,626 Views
Last Modified: 2012-05-10
Hi,

I have a folder with 180,000 documents. I was trying to concatenate them all into a single file but received the "Argument list too long" error. Is there any way to get around this limitation with "cat"?

Essentially, I want to combine all the files and keep only the unique records.

Thank you.
Question by:faithless1
17 Comments
 
LVL 12

Expert Comment

by:tel2
ID: 34115938
Hi faithless1,

Does this work for you?

ls | xargs cat >/tmp/allfiles.out

Feel free to ignore any "Is a directory" errors if the folder contains subdirectories.

Then you can run sort -u (or sort | uniq) on the output, or pipe the above through it before redirecting to allfiles.out.
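For example, to combine and de-duplicate in one pass (a sketch, assuming the records are one per line):

    ls | xargs cat | sort -u >/tmp/allfiles.out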
 
LVL 12

Assisted Solution

by:tel2
tel2 earned 100 total points
ID: 34115976
Hi again faithless1,

This one allows you to specify wildcards for the types of files you want to process:

find . -name "*.txt" -exec cat '{}' ';' >allfiles.out
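By the way, if your find supports the POSIX '+' terminator (GNU find does), it batches many filenames into each cat invocation instead of forking cat once per file, which is much faster:

    find . -name "*.txt" -exec cat '{}' + >allfiles.out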
 
LVL 1

Expert Comment

by:mifergie
ID: 34130963
Here's an approach that lets you run a series of checks on the contents of the files:

$ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;
 
LVL 12

Expert Comment

by:tel2
ID: 34133015
Hi mifergie,

How is that going to get around faithless1's problem of having to process 180,000 documents, which gives an "Argument list too long" error when you do things like:
    cat * >txt.out    # Which he was probably trying to do
or
    ls * | ...              # Which you are essentially doing
?

Your solution fails with the same error.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133066
My post has been deleted for some reason, so I don't know exactly what I said.  It doesn't seem to be in my bash history either...

I don't claim that cat * >txt.out would be successful. I claim that you can just do this in a for-loop and get around the large argument list.
 
LVL 12

Expert Comment

by:tel2
ID: 34133152
Hi mifergie,

I can still see your post (even after a refresh).  Here it is for your reference:
    Here's an approach that lets you run a series of checks on the contents of the files:
    $ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;


The idea of a for loop is OK, but what I'm trying to say is, your "ls *..." will fail just like a "cat *..." will fail, because in both cases the shell expands the wildcard into the full list of filenames before running the command, and that list blows the kernel's argument-length limit (ARG_MAX).
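(You can check the limit on your system like this; on Linux the space actually available is also reduced by the size of your environment variables:)

    getconf ARG_MAX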
If you don't believe me, run a for loop to "touch" 180,000 files with names like:
    abcdefghijklmnopqrstuvwxyz_0.txt
    abcdefghijklmnopqrstuvwxyz_1.txt
    abcdefghijklmnopqrstuvwxyz_2.txt
    ...etc...
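A quick sketch to create the test files (each touch gets just one filename, so the loop itself avoids the argument limit; it may take a while):

    for i in $(seq 0 179999); do touch "abcdefghijklmnopqrstuvwxyz_$i.txt"; done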
Next, run this:
    cat *.txt >txt1.out
If the cat works, create more files (or files with longer names), until the cat fails.
Next, run your solution (making sure you change your "*.sh" to "*.txt").
Are you with me?
 
LVL 1

Expert Comment

by:mifergie
ID: 34133221
Gotcha.  Hmmm...
Well, okay, how about dropping the *...

for file in `/usr/bin/ls`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;

If that works on the 180k files, one could easily add a test on the file name using grep. If I get a chance in the next few minutes I'll provide an example.

 
LVL 1

Expert Comment

by:mifergie
ID: 34133275
Here's something that doesn't require any long input list:

for file in `/usr/bin/ls`; do if [ `echo $file | grep '\.sh'` ]; then echo $file; cat $file>>txt.out; fi; done;

So as long as ls itself can handle 180k files, this should work.
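(A variant of the same idea that skips spawning grep for every file, using the shell's built-in pattern matching - an untested sketch:)

    for file in `ls`; do case $file in *.sh) cat "$file" >>txt.out;; esac; done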
 
LVL 12

Expert Comment

by:tel2
ID: 34133450
Now you're talking, mifergie!

A few notes:
- '/usr/bin/ls' will fail on any system which has 'ls' somewhere else (like the GNU/Linux webhost I'm using, which has it in /bin).
- Running the for loop 180,000 times, and running all those commands inside it, will be a lot slower than the solutions I've given.  Yes, I tested it.
- Your final ';' is unnecessary.


Hi faithless1,

I've just realised that my first solution could be simplified (and sped up) as follows:
    ls | cat >/tmp/allfiles.out
Of course, like my first solution, this doesn't filter by filename (one could use my 'find...' for that), but you haven't said filtering is a requirement.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133479
Yup, I was just copying and pasting from cygwin. On my system ls is aliased to something that puts an asterisk at the end of some filenames (executables, I think - probably ls -F), so I have to specify the path exactly. I also know that it will be slower - but it gives a great amount of flexibility for picking certain files based upon user-supplied criteria.

I kind of wondered why you had the xargs in your first solution...  
 

Author Comment

by:faithless1
ID: 34146933
Hi,

Thanks for all the responses, very much appreciated. Still having the same issue when running these commands:

ls | xargs cat * >/tmp/allfiles.out
bash: /usr/bin/xargs: Argument list too long

ls | cat * >/tmp/allfiles.out
bash: /bin/cat: Argument list too long

Filtering isn't a requirement, but helpful to know.

If it isn't possible to do this with standard commands, perhaps I could cat the first 50K files to output.txt, then append the next 50K with >>, and so on?

Thanks again.
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34146965
Remove the * in your xargs command and make it:

  ls | xargs cat  >/tmp/allfiles.out
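That works because xargs reads the filenames from its standard input and runs cat as many times as needed, keeping each invocation under the argument-length limit. If any of the names contain spaces, a null-delimited variant is safer (GNU find/xargs):

    find . -maxdepth 1 -type f -print0 | xargs -0 cat >/tmp/allfiles.out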
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34147020
Oh, and don't give me the points if that 'xargs' solution works. They should go to the very first poster, tel2.

Now if that doesn't work, then an alternative is:

ls |
  while read filename
  do
    cat "$filename"
  done >/tmp/allfiles.out
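(And if any names contain leading whitespace or backslashes, 'IFS= read -r' preserves them - a minor hardening of the same loop:)

ls | while IFS= read -r filename; do cat "$filename"; done >/tmp/allfiles.out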
 
LVL 12

Expert Comment

by:tel2
ID: 34149162
Thanks for that, TRW.

Hi faithless1,

My solutions work for me - I have tested them with enough files to simulate your situation.

As TRW has implied, the '*'s you've put in the commands you ran were not in my solutions.

As stated in my last post, the "xargs" is not required (it just slows things down in this case).

So, my recommended solutions are:
    ls | cat >/tmp/allfiles.out   # From my last post
    find . -name "*.txt" -exec cat '{}' ';' >allfiles.out  # From my 2nd post

Enjoy.
 

Author Comment

by:faithless1
ID: 34151810
Hi,

Thanks again, and apologies for including the *. I think this partly solves the problem. I ran both commands; here are the results:

ls |
  while read filename
  do
    cat "$filename"
  done >/tmp/allfiles.out

TRW, in this case there are 180K unique files, so I'm not sure how I would execute this on the command line. I tried replacing "filename" with allfiles.out but wasn't successful - I'm pretty sure I'm doing this incorrectly.

Tel2,
I was able to pipe all the file names to 'allfiles.out', which now lists all 180K files. Is there a way to create 'allfiles.out' so it contains the contents of each of the 180K files rather than just a list of their names?

Thanks again.
 
LVL 12

Expert Comment

by:tel2
ID: 34151857
Hi faithless1,

Questions:
1. Are you saying that allfiles.out now contains a list of file NAMES, rather than the contents of the files?
2. Please post the exact command you ran to generate allfiles.out.
3. How big is allfiles.out, in bytes and lines (try: wc allfiles.out)?
4. Are the 180,000 files text files, or something else?

Thanks.
 
LVL 10

Accepted Solution

by:TRW-Consulting
TRW-Consulting earned 400 total points
ID: 34151870
If you need to do it all on a single command line, use this:

  ls | while read filename; do cat "$filename"; done >/tmp/allfiles.out

Just copy and paste the line above.
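And since the original goal was to unique all the records, you can add sort -u before the redirect (assuming one record per line):

  ls | while read filename; do cat "$filename"; done | sort -u >/tmp/allfiles.out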
