Solved

bash: /bin/cat: Argument list too long

Posted on 2010-11-11
3,832 Views
Last Modified: 2012-05-10
Hi,

I have a folder with 180,000 documents. I was trying to concatenate them all into a single file but received an "Argument list too long" error. Is there any way to get around this limitation with "cat"?

Essentially, I want to combine all the files and de-duplicate the records.

Thank you.
Question by:faithless1
17 Comments
 
LVL 12

Expert Comment

by:tel2
ID: 34115938
Hi faithless1,

Does this work for you?

ls | xargs cat >/tmp/allfiles.out

Feel free to ignore any "Is a directory" errors if the folder contains subdirectories.

Then you can run sort -u (or sort | uniq) on the output (or pipe the above through it before redirecting to allfiles.out).
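For example, to combine and de-duplicate in one pass, something like this should work (a sketch that assumes your records are one per line):

    ls | xargs cat | sort -u >/tmp/allfiles.out   # sort -u keeps one copy of each line (assumes one record per line)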
 
LVL 12

Assisted Solution

by:tel2
tel2 earned 100 total points
ID: 34115976
Hi again faithless1,

This one allows you to specify wildcards for the types of files you want to process:

find . -name "*.txt" -exec cat '{}' ';' >allfiles.out
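If your find supports the POSIX "-exec ... +" form (GNU and BSD find both do), this variant batches many filenames into each cat invocation instead of running one cat per file, which is much faster. Note that both forms also descend into subdirectories:

    find . -name "*.txt" -exec cat '{}' + >allfiles.out   # '+' batches arguments like xargs; assumes your find supports it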
 
LVL 1

Expert Comment

by:mifergie
ID: 34130963
Here's a way to run a series of checks on the contents of the files:

$ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;
 
LVL 12

Expert Comment

by:tel2
ID: 34133015
Hi mifergie,

How is that going to get around faithless1's problem of having to process 180,000 documents, which gives an "Argument list too long" error when you do things like:
    cat * >txt.out    # Which he was probably trying to do
or
    ls * | ...              # Which you are essentially doing
?

Your solution fails with the same error.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133066
My post has been deleted for some reason, so I don't know exactly what I said.  It doesn't seem to be in my bash history either...

I don't claim that cat *>txt.out would be successful.  I claim that you can just do this in a for-loop and get around the large argument list.
 
LVL 12

Expert Comment

by:tel2
ID: 34133152
Hi mifergie,

I can still see your post (even after a refresh).  Here it is for your reference:
    Here you can allow a series of checks on the contents of files:
    $ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;


The idea of a for loop is OK, but what I'm trying to say is that your "ls *..." will fail just like "cat *..." will, because in both cases the shell expands the * into the full list of filenames, which blows the limit.
If you don't believe me, run a for loop to "touch" 180,000 files with names like:
    abcdefghijklmnopqrstuvwxyz_0.txt
    abcdefghijklmnopqrstuvwxyz_1.txt
    abcdefghijklmnopqrstuvwxyz_2.txt
    ...etc...
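A sketch of that setup loop, assuming seq is available (note that a single touch with all 180,000 names on its command line would hit the same argument-list limit, hence the loop):
    for i in $(seq 0 179999); do touch abcdefghijklmnopqrstuvwxyz_$i.txt; done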
Next, run this:
    cat *.txt >txt1.out
If the cat works, create more files (or files with longer names), until the cat fails.
Next, run your solution (making sure you change your "*.sh" to "*.txt").
Are you with me?
 
LVL 1

Expert Comment

by:mifergie
ID: 34133221
Gotcha.  Hmmm...
Well, okay, how about dropping the *...

for file in `/usr/bin/ls`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;

If that works on the 180k files, one could easily build a test for the file name using grep on the filename. If I get a chance in the next few minutes I'll provide an example.

 
LVL 1

Expert Comment

by:mifergie
ID: 34133275
Here's something that doesn't require any long input list:

for file in `/usr/bin/ls`; do if [ `echo $file | grep '\.sh'` ]; then echo $file; cat $file>>txt.out; fi; done;

So as long as ls can operate on 180k files, this should work.
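A variant sketch that does the name test in the shell itself, so no grep process is spawned per file (the case pattern matches only the .sh suffix, which is slightly stricter than grep '\.sh'):

for file in `ls`; do case $file in *.sh) cat "$file" >>txt.out;; esac; done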
 
LVL 12

Expert Comment

by:tel2
ID: 34133450
Now you're talking, mifergie!

A few notes:
- '/usr/bin/ls' will fail on any system which has 'ls' somewhere else (like the GNU/Linux webhost I'm using, which has it in /bin).
- Running the for loop 180,000 times, and running all those commands inside it, will be a lot slower than the solutions I've given.  Yes, I tested it.
- Your final ';' is unnecessary.


Hi faithless1,

I've just realised that my first solution could be simplified (and sped up) as follows:
    ls | cat >/tmp/allfiles.out
Of course, like my first solution, this doesn't filter by filename (one could use my 'find...' for that), but you haven't said filtering is a requirement.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133479
Yup, I was just copying and pasting from cygwin.  On my system ls is aliased to something that appends an asterisk to some filenames (executables, I think), so I have to specify the path exactly.  I also know that it will be slower - but it gives a great amount of flexibility for picking certain files based upon user-supplied criteria.

I kind of wondered why you had the xargs in your first solution...  
 

Author Comment

by:faithless1
ID: 34146933
Hi,

Thanks for all the responses, very much appreciated. Still having the same issue when running these commands:

ls | xargs cat * >/tmp/allfiles.out
    bash: /usr/bin/xargs: Argument list too long

ls | cat * >/tmp/allfiles.out
    bash: /bin/cat: Argument list too long

Filtering isn't a requirement, but helpful to know.

If it isn't possible to do this with standard commands, perhaps I can cat the first 50K files to output.txt, then append the next 50K with >>, etc.?

Thanks again.
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34146965
Remove the * in your xargs command and make it:

  ls | xargs cat  >/tmp/allfiles.out
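And if any of the 180,000 names contain spaces or other odd characters, a more robust sketch (assuming your find supports -print0 and -maxdepth, as GNU and BSD find do):

  find . -maxdepth 1 -type f -print0 | xargs -0 cat >/tmp/allfiles.out   # NUL-delimited names survive spaces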
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34147020
Oh, and don't give me the points if that 'xargs' solution works. They should go to the very first poster, tel2.

Now if that doesn't work, then an alternative is:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out
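A small hardening of the same loop, in case any filenames contain backslashes or embedded spaces (read -r and the quotes around $filename are the only changes):

  ls | while read -r filename; do cat "$filename"; done >/tmp/allfiles.out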
 
LVL 12

Expert Comment

by:tel2
ID: 34149162
Thanks for that, TRW.

Hi faithless1,

My solutions work for me - I have tested them with enough files to simulate your situation.

As TRW has implied, the '*'s you've put in the commands you ran were not in my solutions.

As stated in my last post, the "xargs" is not required (it just slows things down in this case).

So, my recommended solutions are:
    ls | cat >/tmp/allfiles.out   # From my last post
    find . -name "*.txt" -exec cat '{}' ';' >allfiles.out  # From my 2nd post

Enjoy.
 

Author Comment

by:faithless1
ID: 34151810
Hi,

Thanks again, and apologies for including the *. I think this partly solves the problem. I ran both commands; here are the results:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out

TRW, in this case there are 180K unique files, so I'm not sure how I would execute this on the command line. I tried replacing "filename" with allfiles.out but wasn't successful - I'm pretty sure I'm doing this incorrectly.

Tel2,
I was able to pipe everything to 'allfiles.out', which now lists all 180K files. Is there a way to create 'allfiles.out' so that it contains the contents of each of the 180K files rather than just a list of the 180K filenames?

Thanks again.
 
LVL 12

Expert Comment

by:tel2
ID: 34151857
Hi faithless1,

Questions:
1. Are you saying that allfiles.out now contains a list of file NAMES, rather than the contents of the files?
2. Please post the exact command you ran to generate allfiles.out.
3. How big is allfiles.out, in bytes and lines (try: wc allfiles.out)?
4. Are the 180,000 files text files, or what?
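For question 3, the checks would look like this (assuming you run them in the same folder):
    wc -c allfiles.out   # size in bytes
    wc -l allfiles.out   # size in lines
    ls | wc -l           # number of files, for comparison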

Thanks.
 
LVL 10

Accepted Solution

by:TRW-Consulting
TRW-Consulting earned 400 total points
ID: 34151870
If you need to do it all on a single command line, use this:

  ls |  while read filename;  do cat $filename;  done >/tmp/allfiles.out

Just copy and paste the line above.
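And to also cover the original "unique all records" requirement in the same command, the loop's output can be piped through sort -u before the redirect (a sketch that assumes one record per line):

  ls | while read filename; do cat $filename; done | sort -u >/tmp/allfiles.out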
