Solved

Grep pb

Posted on 2000-03-24
Last Modified: 2010-04-21
I have a directory with a few thousand files, so I use the following command to grep them:
ls -1 | xargs grep UPDATE
Unfortunately, a few of these files contain very long lines, and grep prints several messages of the form:
grep : 0652-226 ...
followed by a message saying that the maximum line length (2048) has been reached.
I have no idea which files have the problem (as I said, there are a few thousand of them) and, even worse, I don't know whether it is just a display problem or whether I may have missed some of the UPDATE words I am looking for.
Please help me without being too specific about my command, which is only an example; I run into the too-long-lines problem all the time.
Furthermore, for many uninteresting reasons, I am not able to split my files into multiple subdirectories.
Thanks a lot
0
Question by:Eric98
13 Comments
 
LVL 2

Expert Comment

by:GP1628
ID: 2655555
Unix is multi-user and gives each user a limited share of resources, including memory. It's not really a good idea to work around that, since keeping that many files in a single directory is not recommended. Is there any way to break things up into subdirectories, for example by the first character of the file name? That's why (and how) a lot of ISPs split their users' home directories into paths like /home/g/gandalf.

If you REALLY have to do it, then try doing it in stages, such as
grep UPDATE 0*

and maybe even use a "for" loop to go through 0*, 1*, 2*, 3*, and so on.

Gandalf  Parker

0
 
LVL 40

Expert Comment

by:jlevie
ID: 2656133
A better way, one that bypasses the limitations of ls when dealing with large directories, is to use find. Something like:

find . -exec grep UPDATE {} \;

is functionally equivalent, although it doesn't help with the line length limitation in grep that you are running into.

Since I can't tell from your question exactly what you need to know (only which files contain "UPDATE", or the lines from each file that contain "UPDATE"?), it's hard to say how to solve that part of the problem. A shell script, executed from find, could determine whether the keyword is in a file by piping the output of "strings" on the file into grep and echoing the file name if the string was found. A "script-file" something like:

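#!/bin/sh
# Print the file name if any printable string in the file contains UPDATE.
# Quoting "$1" keeps file names with spaces intact.
if strings "$1" | grep UPDATE >/dev/null; then
  echo "$1"
fi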

Used with "find . -exec script-file {} \;" would do it.
0
 

Author Comment

by:Eric98
ID: 2656428
Adjusted points from 100 to 150
0
 

Author Comment

by:Eric98
ID: 2656429
I admit my question is not exactly what one could call clear. As jlevie suggests, I need to know the lines (that means the content, not the line number) from each file that contain UPDATE.

My basic need is: from my directory with all those files, I want to build a file with all lines from all files containing the word UPDATE.
And I want to be sure not to miss a single one, which means that my new file must also include the lines with more than 2048 characters that contain UPDATE, or at least I want to know which lines, if any, have not been checked because they were too long.

And as I said already, there's no way to break the directory into multiple subdirectories, nor do I want to break all my commands into multiple sub-commands as suggested by GP1628.

Thanks to both of you. I realize this is a hard question, so I am increasing the points.
0
 
LVL 2

Expert Comment

by:GP1628
ID: 2656692
According to the grep man page:
Lines are limited to BUFSIZ characters; longer lines are truncated. BUFSIZ is defined in <stdio.h>.

You could try changing stdio.h, recompiling grep, and then changing it back. However, there may have been a good reason for the limit :)
Using sed as a filter seems to have the same limit.

Even if we get past that, the other problem is that if you try to expand "*" into a list of files, you will hit a maximum. The workaround would be to read all the file names into a list and then process the list one entry at a time with something like a "for" loop. Even that is going to be hard to work out with so many files.
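
A minimal sketch of the list-then-loop workaround described above, assuming a POSIX shell and file names without embedded newlines. The function name search_each is hypothetical. It only sidesteps the argument-list limit: with a grep that has the BUFSIZ restriction, over-long lines would still be truncated.

```shell
# Sketch: search every regular file in a directory one at a time, so the
# shell never has to expand "*" into one huge argument list.
# search_each is a hypothetical helper: $1 = directory, $2 = grep pattern.
search_each() {
    dir=$1
    pattern=$2
    # ls -1 writes one name per line; "read -r" picks them up one by one.
    ls -1 "$dir" | while IFS= read -r name; do
        # Skip sub-directories and anything else that is not a regular file.
        [ -f "$dir/$name" ] || continue
        # /dev/null as a second operand makes grep prefix each hit with the
        # file name. grep exits non-zero when a file has no match; that is
        # not an error here, hence "|| true".
        grep "$pattern" "$dir/$name" /dev/null || true
    done
}
```

Called as search_each . UPDATE, this prints each matching line prefixed with the name of the file it came from.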

Gandalf  Parker
0
 
LVL 40

Expert Comment

by:jlevie
ID: 2656740
Now that I know what you need to do I can suggest an alternative that will work. I've done something very similar in the past with a perl script and for similar reasons. I doubt that I've still got the script, but I could whip up a new one in short order. Do you have perl installed?
0
 
LVL 84

Accepted Solution

by:
ozo earned 150 total points
ID: 2657043
perl -ne 'print if /UPDATE/'
does not have a BUFSIZ limit
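
For illustration, the one-liner above might be wrapped for a whole directory roughly as follows. This is only a sketch: the function names grep_long_safe and report_long_lines and the PATTERN and LIMIT environment variables are made up for this example, and it assumes file names without whitespace (the same assumption ls -1 | xargs makes). Unlike ls -1, find also descends into subdirectories. The second function lists lines longer than a given limit, which is one way to see which lines a length-limited grep could not have checked.

```shell
# Print "file:line" for every line that matches a Perl regular expression.
# $1 = directory, $2 = Perl regular expression.
grep_long_safe() {
    # The pattern travels through the environment, so the shell never
    # pastes it into the Perl source text.
    find "$1" -type f -print |
        PATTERN=$2 xargs perl -ne 'print "$ARGV:$_" if /$ENV{PATTERN}/'
}

# Print "file:line-number" for every line longer than a limit.
# $1 = directory, $2 = limit in bytes (2048 in the message quoted above).
report_long_lines() {
    find "$1" -type f -print |
        LIMIT=$2 xargs perl -ne '
            chomp;
            print "$ARGV:$.\n" if length($_) > $ENV{LIMIT};
            close ARGV if eof;   # reset the line counter $. for the next file
        '
}
```

A call such as grep_long_safe . UPDATE > /tmp/all-updates.txt collects every matching line. The output file is best kept outside the searched directory, so that the search does not read its own output.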
0
 
LVL 40

Expert Comment

by:jlevie
ID: 2657180
Well, I did still have the code; it's below...

---snip, snip---begin dir-search---
#!/usr/bin/perl
#
# NAME
#      dir-search - Searches big dirs
#
# SYNTAX
#      dir-search string
#
# DESCRIPTION
#       The current directory is read to get the list of  files (skipping any sub-dirs)
#       and each file is scanned for lines containing the specified string. There's
#       some limit on the number of files that can be in the directory, but it's
#       known to work with about 10,000 files. Likewise there's probably some
#       limit on the max line size, but it's been observed to work properly with files
#       containing more than 8kb per line.
#      
#
# Author: Jim Levie
#
if ($#ARGV < 0)
{
    die "Usage: dir-search string\n";
}
# Note: the argument is used as a Perl regular expression below.
$str = $ARGV[0];
opendir(DIR, '.') || die "Can't read current dir";
@files = readdir(DIR);
closedir(DIR);

for (@files)
{
  next if $_ eq '.';
  next if $_ eq '..';
  next if (-d $_);

  $file = $_;
  open(INP, $file) || die "Can't open <$file>\n";
  while(<INP>)
  {
    if(/$str/) { print $_; }
  }
  close(INP);  # close the file handle, not the file name
}
0
 
LVL 2

Expert Comment

by:mapc
ID: 2658770
You're using AIX... poor man.
AIX grep and the shell utilities have those limitations.
Either use perl as proposed, or install GNU grep.
0
 

Author Comment

by:Eric98
ID: 2659939
Great, both your solutions (jlevie and ozo) work very well and do exactly what I want.
This once more raises the (not that bad) problem of deciding which of you should get the points.
As I do not have a man page for the perl command on my computer, please give me one more piece of information and I'll give the points to both of you.
What I need to know is how to modify either the perl command line or the script in order to perform the same search case-insensitively. In other words, what would be the equivalent of grep -i?
I suggest that the first of you (jlevie or ozo) posts this information as an answer; the second one will get his points through a 'for ozo' or 'for jlevie' thread.
Again, I really want to thank you a lot for your help.
Eric
0
 

 

Author Comment

by:Eric98
ID: 2660227
Just found the solution to my last little question myself:
ls -1 | xargs perl -ne 'print if /UPDATE/i'
So I give the points to ozo, and will add a thread for jlevie.
Thanks
0
 
LVL 84

Expert Comment

by:ozo
ID: 2661252
perldoc perlre
perldoc perlrun
0
