Solved

bash: /bin/cat: Argument list too long

Posted on 2010-11-11
4,169 Views
Last Modified: 2012-05-10
Hi,

I have a folder with 180,000 documents. I was trying to open them all and write them to a single file, but received the "Argument list too long" error. Is there any way to get around this limitation with "cat"?

Essentially, I want to combine all files and unique all records.

Thank you.
Question by:faithless1
17 Comments
 
LVL 12

Expert Comment

by:tel2
ID: 34115938
Hi faithless1,

Does this work for you?

ls | xargs cat >/tmp/allfiles.out

Feel free to ignore any "cat: ...: Is a directory" errors if the folder contains subdirectories.

Then you can run uniq or sort -u... on the output (or pipe the above through that before redirecting to allfiles.out).
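
For example, something like this would combine and de-duplicate in one pass (just a sketch, and it assumes the filenames contain no spaces or newlines):

ls | xargs cat | sort -u >/tmp/allfiles.out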
 
LVL 12

Assisted Solution

by:tel2
tel2 earned 100 total points
ID: 34115976
Hi again faithless1,

This one allows you to specify wildcards for the types of files you want to process:

find . -name "*.txt" -exec cat '{}' ';' >allfiles.out
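
A side note: if your find supports the POSIX '+' terminator, it batches the filenames the way xargs does instead of running one cat per file, which is much faster on 180,000 files:

find . -name "*.txt" -exec cat '{}' '+' >allfiles.out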
 
LVL 1

Expert Comment

by:mifergie
ID: 34130963
Here you can allow a series of checks on the contents of files:

$ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;

(Note that grep exits with status 1 when it finds no match, so as written this appends the files that do NOT contain the search term.)
 
LVL 12

Expert Comment

by:tel2
ID: 34133015
Hi mifergie,

How is that going to get around faithless1's problem of having to process 180,000 documents, which gives an "Argument list too long" error when you do things like:
    cat * >txt.out    # Which he was probably trying to do
or
    ls * | ...              # Which you are essentially doing
?

Your solution fails with the same error.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133066
My post has been deleted for some reason, so I don't know exactly what I said.  It doesn't seem to be in my bash history either...

I don't claim that "cat * >txt.out" would be successful. I claim that you can just do this in a for loop and get around the large argument list.
 
LVL 12

Expert Comment

by:tel2
ID: 34133152
Hi mifergie,

I can still see your post (even after a refresh).  Here it is for your reference:
    Here you can allow a series of checks on the contents of files:
    $ for file in `/usr/bin/ls *.sh`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;


The idea of a for loop is OK, but what I'm trying to say is, your "ls *..." will fail just like a "cat *..." will fail, because both of them will expand out the list of filenames, which will blow the limit.
If you don't believe me, run a for loop to "touch" 180,000 files with names like:
    abcdefghijklmnopqrstuvwxyz_0.txt
    abcdefghijklmnopqrstuvwxyz_1.txt
    abcdefghijklmnopqrstuvwxyz_2.txt
    ...etc...
Next, run this:
    cat *.txt >txt1.out
If the cat works, create more files (or files with longer names), until the cat fails.
Next, run your solution (making sure you change your "*.sh" to "*.txt").
Are you with me?
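
If you want to see the limit you're up against first, something like this reproduces the failure (a rough sketch; it assumes GNU seq is available, and creating the files takes a while):
    getconf ARG_MAX      # max bytes for arguments + environment
    mkdir /tmp/argtest && cd /tmp/argtest
    for i in $(seq 1 180000); do touch abcdefghijklmnopqrstuvwxyz_$i.txt; done
    cat *.txt >txt1.out  # should now fail with "Argument list too long"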
 
LVL 1

Expert Comment

by:mifergie
ID: 34133221
Gotcha.  Hmmm...
Well, okay, how about dropping the *...

for file in `/usr/bin/ls`; do grep searchtermhere $file; if [ $? -eq 1 ]; then cat $file>>txt.out; fi; done;

If that works on the 180k files, one could easily build a test on the file names using grep. If I get a chance in the next few minutes I'll provide an example.

 
LVL 1

Expert Comment

by:mifergie
ID: 34133275
Here's something that doesn't require any long input list:

for file in `/usr/bin/ls`; do if [ `echo $file | grep '\.sh'` ]; then echo $file; cat $file>>txt.out; fi; done;

so as long as ls can operate on 180k files, this should work.
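
A possible refinement (just a sketch): anchoring the pattern to the end of the name and piping ls straight into a while loop avoids substring matches (a file called notes.shx would slip through the above) and word-splitting on odd names:

ls | grep '\.sh$' | while read file; do cat "$file" >>txt.out; done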
 
LVL 12

Expert Comment

by:tel2
ID: 34133450
Now you're talking, mifergie!

A few notes:
- '/usr/bin/ls' will fail on any system which has 'ls' somewhere else (like the GNU/Linux webhost I'm using, which has it in /bin).
- Running the for loop 180,000 times, and running all those commands inside it, will be a lot slower than the solutions I've given.  Yes, I tested it.
- Your final ';' is unnecessary.


Hi faithless1,

I've just realised that my first solution could be simplified (and sped up) as follows:
    ls | cat >/tmp/allfiles.out
Of course, like my first solution, this doesn't filter by filename (one could use my 'find...' for that), but you haven't said filtering is a requirement.
 
LVL 1

Expert Comment

by:mifergie
ID: 34133479
Yup, I was just copying and pasting from cygwin. On my system ls is aliased to something that appends an asterisk to the names of executable files (the -F flag, I think), so I have to specify the path exactly. I also know that it will be slower - but it gives a great amount of flexibility for picking certain files based upon user-supplied criteria.

I kind of wondered why you had the xargs in your first solution...  
 

Author Comment

by:faithless1
ID: 34146933
Hi,

Thanks for all the responses, very much appreciated. Still having the same issue when running these commands:

ls | xargs cat * >/tmp/allfiles.out
bash: /usr/bin/xargs: Argument list too long

ls | cat * >/tmp/allfiles.out
bash: /bin/cat: Argument list too long

Filtering isn't a requirement, but helpful to know.

If it isn't possible to do this with standard commands, perhaps I can cat the first 50K files to output.txt, then append (>>) the next 50K to output.txt, etc.?

Thanks again.
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34146965
Remove the * in your xargs command and make it:

  ls | xargs cat  >/tmp/allfiles.out
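
That works because xargs reads the names from its standard input and runs cat in batches, each small enough to fit under the system's argument-length limit, so no single cat ever sees all 180,000 names. If you're curious, GNU xargs (not POSIX) can show the limits it's working with:

  ls | xargs --show-limits cat >/tmp/allfiles.out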
 
LVL 10

Expert Comment

by:TRW-Consulting
ID: 34147020
Oh, and don't give me the points if that 'xargs' solution works. They should go to the very first poster, tel2.

Now if that doesn't work, then an alternative is:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out
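
If any of the names might contain spaces, leading blanks or backslashes, a slightly hardened version of the same loop (just a sketch) is:

ls |
  while IFS= read -r filename
  do
    cat "$filename"
  done >/tmp/allfiles.out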
 
LVL 12

Expert Comment

by:tel2
ID: 34149162
Thanks for that, TRW.

Hi faithless1,

My solutions work for me - I have tested them with enough files to simulate your situation.

As TRW has implied, the '*'s you've put in the commands you ran were not in my solutions.

As stated in my last post, the "xargs" is not required (it just slows things down in this case).

So, my recommended solutions are:
    ls | cat >/tmp/allfiles.out   # From my last post
    find . -name "*.txt" -exec cat '{}' ';' >allfiles.out  # From my 2nd post
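
And since the original goal was to combine the files and unique the records, either one can be piped through sort -u before the redirect, e.g. (a sketch; sort -u assumes one record per line):
    find . -name "*.txt" -exec cat '{}' ';' | sort -u >allfiles.out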

Enjoy.
 

Author Comment

by:faithless1
ID: 34151810
Hi,

Thanks again, and apologies for including the * - I think this partly solves the problem. I ran both commands; here are the results:

ls |
  while read filename
  do
    cat $filename
  done >/tmp/allfiles.out

TRW, in this case there are 180K unique files, so I'm not sure how I would execute this on the command line. I tried replacing "filename" with allfiles.out but wasn't successful - I'm pretty sure I'm doing this incorrectly.

Tel2,
I was able to pipe all the file names to 'allfiles.out', which now lists all 180K files. Is there a way to create 'allfiles.out' so it contains the contents of each of the 180K files, rather than just a list of their names?

Thanks again.
 
LVL 12

Expert Comment

by:tel2
ID: 34151857
Hi faithless1,

Questions:
1. Are you saying that allfiles.out now contains a list of file NAMES, rather than the contents of the files?
2. Please post the exact command you ran to generate allfiles.out.
3. How big is allfiles.out, in bytes and lines? (Try: wc allfiles.out)
4. Are the 180,000 files text files, or what?

Thanks.
 
LVL 10

Accepted Solution

by:TRW-Consulting
TRW-Consulting earned 400 total points
ID: 34151870
If you need to do it all on a single command line, use this:

  ls | while read filename; do cat $filename; done >/tmp/allfiles.out

Just copy and paste the line above.
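
If any of the file names contain spaces, quoting the variable keeps the loop safe, and a final sort -u covers the original "unique all records" requirement (again assuming one record per line):

  ls | while read filename; do cat "$filename"; done | sort -u >/tmp/allfiles.out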