Grep problem

I have a directory with a few thousand files. For this reason I use the following command to grep them:
ls -1 | xargs grep UPDATE
Unfortunately a few of these files have some very long lines, and grep outputs a couple of:
grep : 0652-226 ...
followed by a message saying that the maximum line length (2048) has been reached.
I have no idea which files exactly have the problem (as I said, I have a few thousand of them), and even worse, I don't know if it is just a display problem or if I may have missed some of the UPDATE words I am looking for.
Please help me without being too specific about my command, which is only an example; I run into this problem of over-long lines all the time.
Furthermore, for many uninteresting reasons I am not able to split my files into multiple subdirectories.
Thanks a lot
1 Solution
 
GP1628Commented:
Unix is multi-user. It gives each user a certain amount of resources, including memory. It's not a good idea to work around it, really, since that many files are not recommended in a single directory. Is there any way to break things up into subdirectories, such as using the first character of the file name? That's why (and how) a lot of ISPs break up their users into /home/g/gandalf.

If you REALLY have to do it then try doing it in stages, such as
ls -1 0* | xargs grep UPDATE

maybe even do a "for" loop to go 0*, 1*, 2*, 3* (a sketch follows below)
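
A minimal sketch of that staged loop, assuming a POSIX shell and file names that begin with a known set of characters (the prefix list below is an assumption; untested on AIX):

# Grep the directory in stages, one leading-character group at
# a time, so no single command sees the whole file list at once.
for p in 0 1 2 3 4 5 6 7 8 9; do
  for f in "$p"*; do
    # skip the literal pattern when a prefix matches no files
    [ -f "$f" ] && grep UPDATE "$f"
  done
done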

Gandalf  Parker

jlevieCommented:
A better way that bypasses the limitations of ls when dealing with large directories is to use find. Something like:

find . -type f -exec grep UPDATE {} \;

is functionally equivalent. It doesn't help with the line length limitation in grep that you are running into.

Since I can't tell from your question exactly what you need to know (only which files contain "UPDATE", or the lines from each file that contain "UPDATE"?), it's hard to say how to solve that part of the problem. A shell script, executed from find, could determine whether the keyword is in the file by using the output of "strings" on the file piped into grep, with the file name echoed if the string was found. A script-file something like:

#!/bin/sh
# Echo the file name if the file contains the search string.
strings "$1" | grep UPDATE >/dev/null
if [ $? -eq 0 ]; then
  echo "$1"
fi

Used with "find . -type f -exec script-file {} \;" would do it.
Eric98Author Commented:
Adjusted points from 100 to 150
 
Eric98Author Commented:
I admit my question is not exactly what one could call clear. As jlevie suggests, I need to know the lines (that means the content, not the line numbers) from each file that contains UPDATE.

My basic need is: from my directory with all those files, I want to build a file with all lines from all files containing the word UPDATE.
And I want to be sure not to miss a single one, which means that my new file must also contain the lines with more than 2048 characters that contain UPDATE, or at least I want to know which lines (if any) have not been checked because they were too long.

And as I said already, there's no way to break into multiple subdirectories, nor do I want to break all my commands into multiple sub-commands as suggested by GP1628.

Thanks to both of you. I realize this is a hard question, and I have increased the points.
GP1628Commented:
According to the grep man page:
"Lines are limited to BUFSIZ characters; longer lines are truncated. BUFSIZ is defined in <stdio.h>."

You could try changing stdio.h and recompiling grep, then change it back. However, there may have been a good reason for the limit :)
Using sed as a filter seems to have the same limit.

Even if we get past that... the other problem is that if you try to convert "*" to a list of files, then you will hit a maximum. The workaround would be to read all the file names into a list, then process the list one at a time with something like a "for" loop (a sketch follows below). Even that's going to be hard to work out with so many files.
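
A rough sketch of that read-the-names-one-at-a-time workaround, assuming a POSIX shell (untested on AIX; it sidesteps the argument-list limit but not grep's 2048-character line limit):

# Pipe the listing through a while/read loop so the shell never
# has to expand the whole file list into one argument list.
ls -1 | while read f; do
  [ -f "$f" ] && grep UPDATE "$f"
done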

Gandalf  Parker
jlevieCommented:
Now that I know what you need to do, I can suggest an alternative that will work. I've done something very similar in the past with a perl script, and for similar reasons. I doubt that I've still got the script, but I could whip up a new one in short order. Do you have perl installed?
0
 
ozoCommented:
perl -ne 'print if /UPDATE/'
does not have a BUFSIZ limit
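
If the file name is wanted alongside each matching line (an assumption about what is useful here, not part of ozo's answer), a variant would be:

# in perl -n, $ARGV holds the name of the file currently being read
ls -1 | xargs perl -ne 'print "$ARGV: $_" if /UPDATE/'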
jlevieCommented:
Well, I did still have the code; it's below...

---snip, snip---begin dir-search---
#!/usr/bin/perl
#
# NAME
#      dir-search - Searches big dirs
#
# SYNTAX
#      dir-search string
#
# DESCRIPTION
#       The current directory is read to get the list of files (skipping any sub-dirs)
#       and each file is scanned for lines containing the specified string. There's
#       some limit on the number of files that can be in the directory, but it's
#       known to work with about 10,000 files. Likewise there's probably some
#       limit on the max line size, but it's been observed to work properly with files
#       containing more than 8kb per line.
#      
#
# Author: Jim Levie
#
if ($#ARGV < 0)
{
    die "Useage: dir-search string\n";
}
$str = $ARGV[0];
opendir(DIR, '.') || die "Can't read current dir";
@files = readdir(DIR);
closedir(DIR);

for (@files)
{
  next if $_ eq '.';
  next if $_ eq '..';
  next if (-d $_);
 
  $file = $_;
  open(INP, $file) || die "Can't open <$file>\n";
  while(<INP>)
  {
    # print any line containing the search string
    if(/$str/) { print $_; }
  }
  close(INP);
}
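
To build the single output file described above, one could redirect the script's output (assuming it is saved as dir-search and made executable; the output path is arbitrary, and points outside the directory so the result file is not itself scanned):

chmod +x dir-search
./dir-search UPDATE > /tmp/all-update-lines.txt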
mapcCommented:
You're using AIX... poor man.
AIX's grep and shell utilities have those limitations.
Either use perl as proposed, or install GNU grep.
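
For instance, once GNU grep is installed (the path below is an assumption; it depends on where the package puts it):

# GNU grep does not share the 2048-character line limit
ls -1 | xargs /opt/freeware/bin/grep UPDATE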
Eric98Author Commented:
Great, both your solutions (jlevie and ozo) work very well and do exactly what I want.
This raises for me once more the (not that bad) problem of deciding which of you guys should get the points.
As I do not have a man page for the perl command on my computer, please give me one more piece of information and I'll give the points to both of you.
The information is how to modify (either the perl command line or the script) to perform the same search with case insensitivity. In other words, what would be the equivalent of grep -i?
I suggest the first of you (jlevie or ozo) posts this information as an answer; the second one will get his points through a 'for ozo' or 'for jlevie' thread.
Again, I really want to thank you both a lot for your help.
Eric
 
Eric98Author Commented:
Just found the solution to my last little question myself:
ls -1 | xargs perl -ne 'print if /UPDATE/i'
So I give the points to ozo, and add a thread for jlevie.
Thanks
ozoCommented:
perldoc perlre
perldoc perlrun