  • Status: Solved
  • Priority: Medium
  • Security: Public

`grep...` v grep /.../

I have an application that extracts a relatively few lines from a relatively large file.  Right now my code reads:

@records=`grep $whatToLookFor theFile`;

I did it this way rather than:

open(FIL,"theFile");
@records=grep /$whatToLookFor/,<FIL>;
close(FIL);

since I was worried that Perl may actually bring in the entire file and then do the grep.  Clearly my method has the disadvantage of using an extra shell invocation.  If I switch to the grep /.../ form, will Perl read the file line by line and do the grep, or will it be as horrible as I feared?
Asked by: jhurst
1 Solution
 
ozo commented:
grep /$whatToLookFor/, <FIL>;   # will bring in the entire file and then do the grep

while( <FIL> ){
    push @records, $_ if /$whatToLookFor/;   # reads lines one at a time and only stores the matching lines in @records
}
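
For reference, a self-contained sketch of the line-by-line approach, with a lexical filehandle and error checking added; the filename and the pattern value are placeholders taken from the question:

use strict;
use warnings;

my $whatToLookFor = 'pattern';   # placeholder: whatever you are searching for
my @records;

open(my $fil, '<', 'theFile') or die "Can't open theFile: $!";
while (my $line = <$fil>) {                              # reads one line at a time
    push @records, $line if $line =~ /$whatToLookFor/;   # keep only the matching lines
}
close($fil);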
 
jhurst (Author) commented:
Thanks ozo.

You not only confirmed my fears but even suggested a better alternative.

However, I have tested this, and with a 40k-line file the `grep ...` version is MUCH faster than the while() solution.  I guess grep is more efficiently written than the Perl interpreter.

BTW, how do you know this?  I looked everywhere I could think of and could find no documentation as to why
@x=grep /whatever/,<IN>;
would pull in the whole file first.

It sort of made sense to me that it would be as you suggested, since the alternative requires two processes and a pipe, but I still could not find it documented.
 
burtdav commented:
(@x=grep /whatever/,<IN>) pulls in the whole file, because it uses the filehandle in list context. (print <IN>) does this too. If you want to read a single line, you have to be careful to use the <> operator in scalar context.
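
To make the distinction concrete, here is a small sketch (the filename is a placeholder); the context in which <> is evaluated decides how much gets read:

use strict;
use warnings;

open(my $in, '<', 'theFile') or die "Can't open theFile: $!";

my @all_lines = <$in>;      # list context: slurps every remaining line at once
seek($in, 0, 0);            # rewind so the scalar examples start from the top

my $one_line = <$in>;       # scalar context: reads exactly one line

while (my $line = <$in>) {  # scalar context on each iteration, one line at a time
    # process $line here
}
close($in);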
 
jhurst (Author) commented:
Where are you finding that in the docs, burtdav?  It seems VERY unlikely.
 
burtdav commented:
I'm certain it's true. I probably read it in an O'Reilly camel-book, but it must be in the perldoc. Wait...
Thanks, Google.
It's in perlintro, second paragraph of "Files and I/O":
http://www.perldoc.com/perl5.8.0/pod/perlintro.html#Files-and-I-O
 
jhurst (Author) commented:
Great, thanks.

What is interesting is that I have run experiments before, and just repeated them, and they sure appear to show that this is not the case.
 
burtdav commented:
Please post them; I've not found an exception.
 
burtdav commented:
I mean, if perlintro is wrong, what hope can there be? Is that too philosophical for a tech TA?
 
jhurst (Author) commented:
OK, you can repeat my test; it is somewhat "non-scientific".  I just happened to have a large file, 260 MB of voter registration data.  I opened it on IN and then did the grep as shown above.

While running it I used top to watch memory and CPU usage.  I then repeated the test with the file produced by the grep, but using @data=<IN>; with no grep.

In the latter case much more memory was used.

I should add that my testing appears to indicate that
@x=`grep pattern file`;
is by far the most efficient on a 2 GHz Pentium running Linux.  I am assuming this is because the external grep is more efficient than the Perl grep, and probably because you are right and my Perl scripts do load the whole file.  I have only 512 MB of RAM on the machine, btw.
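
For anyone who wants to repeat the timing comparison in a more controlled way, here is a rough sketch using the core Benchmark module; the filename and pattern are placeholders, and this only measures run time, not memory:

use strict;
use warnings;
use Benchmark qw(timethese);

my $file    = 'theFile';       # placeholder: path to the large file
my $pattern = 'somepattern';   # placeholder: what to look for

timethese(10, {
    'external grep' => sub {
        my @records = `grep $pattern $file`;
    },
    'grep on <FH>'  => sub {
        open(my $fh, '<', $file) or die "Can't open $file: $!";
        my @records = grep /$pattern/, <$fh>;
        close($fh);
    },
    'while loop'    => sub {
        open(my $fh, '<', $file) or die "Can't open $file: $!";
        my @records;
        while (<$fh>) {
            push @records, $_ if /$pattern/;
        }
        close($fh);
    },
});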
