Solved

Grep pb

Posted on 2000-03-24
Last Modified: 2010-04-21
I have a directory with a few thousand files, so I use the following command to grep them:
ls -1 | xargs grep UPDATE
Unfortunately a few of these files have some very long lines, and grep outputs a couple of:
grep : 0652-226 ...
followed by a message saying that the maximum line length (2048) has been reached.
I have no idea which files exactly have the problem (as I said, I have a few thousand of them), and even worse, I don't know whether it is just a display problem or whether I may have missed some of the UPDATE words I am looking for.
Please help me without being too specific about my command, which is only an example; I run into the problem of too-long lines all the time.
Furthermore, for many uninteresting reasons I am not able to split my files into multiple subdirectories.
Thanks a lot
Question by:Eric98
13 Comments
 
LVL 2

Expert Comment

by:GP1628
ID: 2655555
Unix is multi-user. It gives each user a certain amount of resources, including memory. It's not really a good idea to work around that, since that many files are not recommended in a single directory. Is there any way to break things up into subdirectories, such as by the first character of the file name? That's why (and how) a lot of ISPs break up their users into /home/g/gandalf

If you REALLY have to do it, then try doing it in stages, such as
ls -1 0* | xargs grep UPDATE

maybe even do a "for" loop to go 0*, 1*, 2*, 3*
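
A minimal sketch of that staged loop, assuming the file names start with a digit. The prefix list and the function name grep_in_stages are illustrative, not something from this thread. Looping over a glob inside the shell never builds one huge command line, though it does nothing about grep's line-length limit.

```shell
#!/bin/sh
# Hypothetical helper: search a directory in stages, one leading character at a time.
# Usage: grep_in_stages PATTERN DIR
grep_in_stages() {
    pattern=$1
    dir=$2
    # Assumed prefix set: file names beginning with a digit.
    for prefix in 0 1 2 3 4 5 6 7 8 9; do
        for file in "$dir"/"$prefix"*; do
            # Skip directories and the unexpanded glob left behind when nothing matches.
            [ -f "$file" ] || continue
            # The extra /dev/null argument makes grep print the file name with each hit.
            # grep exits non-zero when a file has no match; that is not an error here.
            grep "$pattern" "$file" /dev/null || :
        done
    done
}
```

Called as `grep_in_stages UPDATE .`, it prints each hit as file:line.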

Gandalf  Parker

0
 
LVL 40

Expert Comment

by:jlevie
ID: 2656133
A better way, one that bypasses the limitations of ls when dealing with large directories, is to use find. Something like:

find . -exec grep UPDATE {} \;

is functionally equivalent. It doesn't help with the line length limitation in grep that you are running into.

Since I can't tell from your question exactly what you need to know (only which files contain "UPDATE", or the lines from each file that contain "UPDATE"?) it's hard to say how to solve that part of the problem. A shell script, executed from find, could determine whether the keyword is in the file by piping the output of "strings" on the file into grep, and echoing the file name if the string was found. A "script-file" something like:

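#!/bin/sh
# Print the name of the file given as $1 if it contains UPDATE.
# strings emits the printable character sequences found in the file;
# grep is used only for its exit status, so its output is discarded.
strings "$1" | grep UPDATE >/dev/null
if [ $? -eq 0 ]; then
  echo "$1"
fi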

Used with "find . -exec script-file {} \;" would do it.
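
One detail worth sketching: when find hands grep a single file at a time, grep prints the matching lines without saying which file they came from, and find also visits directories. The variant below is only an illustration (the function name search_tree is made up). It uses -type f to restrict the search to regular files and passes /dev/null as a second file so grep labels every hit. It still relies on the system grep, so it does not remove the line-length limit.

```shell
#!/bin/sh
# Hypothetical wrapper around find: print file:line for every match below DIR.
# Usage: search_tree PATTERN DIR
search_tree() {
    pattern=$1
    dir=$2
    # -type f          : regular files only, directories are skipped
    # {} /dev/null     : two file arguments, so grep prefixes each hit with the file name
    find "$dir" -type f -exec grep "$pattern" {} /dev/null \;
}
```

For example, `search_tree UPDATE . > all-updates.txt` collects every hit into one file.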
0
 

Author Comment

by:Eric98
ID: 2656428
Adjusted points from 100 to 150
0
 

Author Comment

by:Eric98
ID: 2656429
I admit my question is not exactly what one could call clear. As jlevie suggests, I need to know the lines (that means the content, not the line number) from each file that contain UPDATE.

My basic need is: from my directory with all those files, I want to build a file with all lines from all files containing the word UPDATE.
And I want to be sure not to miss a single one, which means that my new file must also include the lines with more than 2048 characters that contain UPDATE, or at least I want to know which lines, if any, have not been checked because they were too long.

And as I said already, there's no way to break things into multiple subdirectories, nor do I want to break all my commands into multiple sub-commands as suggested by GP1628.

Thanks to both of you. I realize this is a hard question, so I am increasing the points.
0
 
LVL 2

Expert Comment

by:GP1628
ID: 2656692
According to the grep man page:
Lines are limited to BUFSIZ characters; longer lines are truncated. BUFSIZ is defined in <stdio.h>.

You could try changing stdio.h and recompiling grep, then changing it back. However, there may have been a good reason for the limit :)
Using sed as a filter seems to have the same limit.

Even if we get past that, the other problem is that if you try to expand "*" to a list of files, you will hit a maximum. The workaround would be to read all the file names into a list, then process the list one at a time with something like a "for" loop. Even that is going to be hard to work out with so many files.
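
The list-then-loop workaround might look roughly like this. It is a sketch under assumptions: the function name search_from_list is invented, no file name contains a newline, and mktemp is available for the temporary list. Nothing here expands "*" on a command line, so the argument-length maximum is never reached. The line-length limit of grep itself is untouched.

```shell
#!/bin/sh
# Hypothetical helper: write the file names to a list, then search them one at a time.
# Usage: search_from_list PATTERN DIR
search_from_list() {
    pattern=$1
    dir=$2
    list=$(mktemp) || return 1
    # One name per line; ls reads the directory itself, so no glob is expanded.
    ls -1 "$dir" > "$list"
    while IFS= read -r name; do
        # Only regular files are searched; sub-directories are skipped.
        [ -f "$dir/$name" ] || continue
        # /dev/null forces the file name prefix; a file without matches is not an error.
        grep "$pattern" "$dir/$name" /dev/null || :
    done < "$list"
    rm -f "$list"
}
```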

Gandalf  Parker
0
 
LVL 40

Expert Comment

by:jlevie
ID: 2656740
Now that I know what you need to do I can suggest an alternative that will work. I've done something very similar in the past with a perl script and for similar reasons. I doubt that I've still got the script, but I could whip up a new one in short order. Do you have perl installed?
0
 
LVL 84

Accepted Solution

by:
ozo earned 150 total points
ID: 2657043
perl -ne 'print if /UPDATE/'
does not have a BUFSIZ limit
0
 
LVL 40

Expert Comment

by:jlevie
ID: 2657180
Well I did still have the code, it's below...

---snip, snip---begin dir-search---
#!/usr/bin/perl
#
# NAME
#      dir-search - Searches big dirs
#
# SYNTAX
#      dir-search string
#
# DESCRIPTION
#       The current directory is read to get the list of  files (skipping any sub-dirs)
#       and each file is scanned for lines containing the specified string. There's
#       some limit on the number of files that can be in the directory, but it's
#       known to work with about 10,000 files. Likewise there's probably some
#       limit on the max line size, but it's been observed to work properly with files
#       containing more than 8kb per line.
#      
#
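# Author: Jim Levie
#
if ($#ARGV < 0)
{
    die "Usage: dir-search string\n";
}
$str = $ARGV[0];
# Read every entry of the current directory into a list.
opendir(DIR, '.') || die "Can't read current dir";
@files = readdir(DIR);
closedir(DIR);

for (@files)
{
  next if $_ eq '.';
  next if $_ eq '..';
  next if (-d $_);   # skip sub-directories

  $file = $_;
  # Three-argument open, so odd file names are never treated as open modes.
  open(INP, '<', $file) || die "Can't open <$file>\n";
  while(<INP>)
  {
    # $str is used as a regular expression, not a fixed string.
    if(/$str/) { print $_; }
  }
  close(INP);   # close the file handle before moving to the next file
}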
0
 
LVL 2

Expert Comment

by:mapc
ID: 2658770
You're using AIX... poor man.
AIX grep and shell-utils have those limitations.
Either use perl as proposed, or install GNU grep.
0
 

Author Comment

by:Eric98
ID: 2659939
Great, both your solutions (jlevie and ozo) work very well and do exactly what I want.
This once more leaves me with the (not that bad) problem of deciding which of you should get the points.
As I do not have a man page for the perl command on my computer, please give me one more piece of information and I'll give the points to both of you.
What I need to know is how to modify either the perl command line or the script to perform the same search case-insensitively. In other words, what would be the equivalent of grep -i?
I suggest that whichever of you, jlevie or ozo, posts this information first does so as an answer; the other will get his points through a 'for ozo' or 'for jlevie' thread.
Again, I really want to thank you a lot for your help.
Eric
0
 

Author Comment

by:Eric98
ID: 2660227
Just found the solution to my last little question myself :
ls -1 | xargs perl -ne 'print if /UPDATE/i'
So I give the points to ozo, and add a thread for jlevie.
Thanks
0
 
LVL 84

Expert Comment

by:ozo
ID: 2661252
perldoc perlre
perldoc perlrun
0
