• Status: Solved

bash: /bin/cat: Argument list too long

Hi,

I have a folder with 180,000 documents. I was trying to open them all and write them to a single file, but received the "Argument list too long" error. Is there any way to get around this limitation with "cat"?

Essentially, I want to combine all the files and de-duplicate the records.

Thank you.
Asked by faithless1
2 Solutions
 
tel2 commented:
Hi faithless1,

Does this work for you?

ls | xargs cat >/tmp/allfiles.out

Feel free to ignore any "Is a directory" errors if the folder contains subdirectories.

Then you can run uniq or sort -u... on the output (or pipe the above through that before redirecting to allfiles.out).
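
For reference, a minimal sketch of combining that approach with the de-duplication step, assuming the files are plain text and the filenames contain no spaces (the /tmp/allfiles_unique.out path is just an example):

    # xargs batches the filenames into chunks that fit under the argument limit,
    # running cat once per chunk; sort -u then de-duplicates the combined records.
    ls | xargs cat | sort -u >/tmp/allfiles_unique.out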
 
tel2 commented:
Hi again faithless1,

This one allows you to specify wildcards for the types of files you want to process:

find . -name "*.txt" -exec cat '{}' ';' >allfiles.out
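
A possible variant of the same find-based idea, assuming a POSIX-2008 or GNU find: terminating -exec with '+' instead of ';' passes many filenames to each cat invocation (batched under the argument limit), which is usually much faster, and sort -u can handle the de-duplication:

    # '+' makes find batch filenames per cat invocation instead of running one cat per file
    find . -name "*.txt" -exec cat '{}' + | sort -u >allfiles_unique.out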
 
mifergie commented:
This lets you run a series of checks on the contents of the files:

$ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;
 
tel2 commented:
Hi mifergie,

How is that going to get around faithless1's problem of having to process 180,000 documents, which gives an "Argument list too long" error when you do things like:
    cat * >txt.out    # Which he was probably trying to do
or
    ls * | ...              # Which you are essentially doing
?

Your solution fails with the same error.
 
mifergie commented:
My post has been deleted for some reason, so I don't know exactly what I said.  It doesn't seem to be in my bash history either...

I don't claim that cat *>txt.out would be successful.  I claim that you can just do this in a for-loop and get around the large argument list.
 
tel2 commented:
Hi mifergie,

I can still see your post (even after a refresh).  Here it is for your reference:
    Here you can allow a series of checks on the contents of files:
    $ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;


The idea of a for loop is OK, but what I'm trying to say is that your "ls *..." will fail just like a "cat *..." will fail, because in both cases the shell expands the * into the full list of filenames before running the command, which blows the limit.
If you don't believe me, run a for loop to "touch" 180,000 files with names like:
    abcdefghijklmnopqrstuvwxyz_0.txt
    abcdefghijklmnopqrstuvwxyz_1.txt
    abcdefghijklmnopqrstuvwxyz_2.txt
    ...etc...
Next, run this:
    cat *.txt >txt1.out
If the cat works, create more files (or files with longer names), until the cat fails.
Next, run your solution (making sure you change your "*.sh" to "*.txt").
Are you with me?
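
For anyone who wants to reproduce that test, a rough sketch (the directory, filename prefix, and count are arbitrary; getconf ARG_MAX shows the kernel limit the expanded argument list has to fit under):

    getconf ARG_MAX                    # size limit for arguments + environment passed to exec
    mkdir -p /tmp/argtest && cd /tmp/argtest
    for i in $(seq 0 179999); do touch "abcdefghijklmnopqrstuvwxyz_$i.txt"; done
    cat *.txt >txt1.out                # expected to fail: Argument list too long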
 
mifergie commented:
Gotcha.  Hmmm...
Well, okay, how about dropping the *...

for file in `/usr/bin/ls`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;

If that works on the 180k files, one could easily build a filename test using grep on the file name. If I get a chance in the next few minutes I'll provide an example.

 
mifergie commented:
Here's something that doesn't require any long input list:

for file in `/usr/bin/ls`; do if [ `echo $file | grep '\.sh'` ]; then echo $file; cat $file>>txt.out; fi; done;

so as long as ls can operate on 180k files, this should work.
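
A hedged variant of the same filename-filtering idea that avoids spawning grep for every file, using the shell's own pattern matching instead (the .sh suffix is just the example from above):

    # case does the suffix test inside the shell, so there is no extra process per file
    for file in `ls`; do
        case "$file" in
            *.sh) cat "$file" >>txt.out ;;
        esac
    done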
 
tel2 commented:
Now you're talking, mifergie!

A few notes:
- '/usr/bin/ls' will fail on any system which has 'ls' somewhere else (like the GNU/Linux webhost I'm using, which has it in /bin).
- Running the for loop 180,000 times, and running all those commands inside it, will be a lot slower than the solutions I've given.  Yes, I tested it.
- Your final ';' is unnecessary.


Hi faithless1,

I've just realised that my first solution could be simplified (and sped up) as follows:
    ls | cat >/tmp/allfiles.out
Of course, like my first solution, this doesn't filter by filename (one could use my 'find...' for that), but you haven't said filtering is a requirement.
 
mifergie commented:
Yup, I was just copying and pasting from Cygwin.  On my system ls is aliased to something that appends an asterisk to some file names (executables, I think), so I have to specify the path exactly.  I also know that it will be slower - but it gives a great amount of flexibility for picking certain files based upon user-supplied criteria.

I kind of wondered why you had the xargs in your first solution...  
 
faithless1 (Author) commented:
Hi,

Thanks for all the responses, very much appreciated. I'm still having the same issue when running these commands:

ls | xargs cat * >/tmp/allfiles.out
bash: /usr/bin/xargs: Argument list too long

ls | cat * >/tmp/allfiles.out
bash: /bin/cat: Argument list too long

Filtering isn't a requirement, but helpful to know.

If it isn't possible to do this with standard commands, perhaps I can write the first 50K files to output.txt, then append (>>) the next 50K, and so on?

Thanks again.
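
As a side note on the 50K-at-a-time idea: explicit batching shouldn't be necessary, because xargs already splits its input into chunks that fit under the limit and runs cat once per chunk. A sketch, with -n only there to make the batching visible (the 5000 is arbitrary):

    ls | xargs -n 5000 cat >/tmp/allfiles.out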
 
TRW-Consulting commented:
Remove the * in your xargs command and make it:

  ls | xargs cat  >/tmp/allfiles.out
 
TRW-Consulting commented:
Oh, and don't give me the points if that 'xargs' solution works. They should go to the very first poster, tel2.

Now if that doesn't work, then an alternative is:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out
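
If the original goal of de-duplicating the combined records still applies, the same loop can feed sort -u; a sketch, with $filename quoted so names containing spaces are also handled (the output path is just an example):

    ls |
      while read filename
      do
        cat "$filename"
      done | sort -u >/tmp/allfiles_unique.out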
 
tel2 commented:
Thanks for that, TRW.

Hi faithless1,

My solutions work for me - I have tested them with enough files to simulate your situation.

As TRW has implied, the '*'s you've put in the commands you ran were not in my solutions.

As stated in my last post, the "xargs" is not required (it just slows things down in this case).

So, my recommended solutions are:
    ls | cat >/tmp/allfiles.out   # From my last post
    find . -name "*.txt" -exec cat '{}' ';' >allfiles.out  # From my 2nd post

Enjoy.
 
faithless1 (Author) commented:
Hi,

Thanks again, and apologies for including the *. I think this partly solves the problem. I ran both commands; here are the results:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out

TRW, in this case there are 180K unique files, so I'm not sure how I would execute this on the command line. I tried replacing "filename" with allfiles.out but wasn't successful - I'm pretty sure I'm doing this incorrectly.

Tel2,
I was able to pipe all files to 'allfiles.out', which now includes all 180K files. Is there a way to create 'allfiles.out' so it contains the contents of each of the 180K files, rather than just a list of the 180K file names?

Thanks again.
 
tel2 commented:
Hi faithless1,

Questions:
1. Are you saying that allfiles.out now contains a list of file NAMES, rather than the contents of the files?
2. Please post the exact command you ran to generate allfiles.out.
3. How big is allfiles.out, in bytes and lines (try: wc allfiles.out).
4. Are the 180,000 files text files, or something else?

Thanks.
 
TRW-Consulting commented:
If you need to do it all on a single command line, use this:

  ls |  while read filename;  do cat $filename;  done >/tmp/allfiles.out

Just copy and paste the line above