jhurst

asked on

`grep...` v grep /.../

I have an application that extracts relatively few lines from a relatively large file.  Right now my code reads:

@records=`grep $whatToLookFor theFile`;

I did it this way rather than:

open(FIL,"theFile");
@records=grep /$whatToLookFor/,<FIL>;
close(FIL);

since I was worried that perl may actually bring in the entire file and then do the grep.  Clearly my method has the disadvantage of using an extra shell invocation.  If I replace it with the =grep /.../ form, will perl read the file line by line and do the grep, or will it be as horrible as I was concerned about?
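
A minimal sketch of a middle ground, assuming a Unix-like system with grep on the PATH: the list form of a pipe open still starts an external grep process, but the arguments are handed to grep directly, so shell metacharacters in the pattern are not interpreted. The helper name grep_lines is hypothetical and not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the lines of $file that external grep matches.
sub grep_lines {
    my ($pattern, $file) = @_;
    # List-form pipe open: each argument goes to grep as-is,
    # without being parsed by a shell first.
    open(my $fh, '-|', 'grep', '-e', $pattern, '--', $file)
        or die "Cannot run grep: $!";
    my @records = <$fh>;   # only the matching lines come back over the pipe
    close($fh);            # false when grep found nothing (exit status 1)
    return @records;
}
```

It could be called as @records = grep_lines($whatToLookFor, "theFile"); note that grep uses its own regular expression syntax, which is not identical to Perl's.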
ASKER CERTIFIED SOLUTION
ozo
jhurst

ASKER

Thanks ozo.

You not only confirmed my fears but even suggested a better alternative.

However, I have tested, and with a 40k-line file the `grep...` is MUCH faster than the while() solution.  I guess that grep is more efficiently written than the perl interpreter.

BTW, how do you know this?  I looked everywhere I could think of and could see no documentation as to why the
@x=grep /whatever/,<IN>;
would pull in the whole file first.

It sort of made sense to me that it would be as you suggested, since the alternative requires two processes and a pipe, but I still could not find it documented.
(@x=grep /whatever/,<IN>) pulls in the whole file, as it uses the filehandle in list context. (print <IN>) does this, too. If you want to read a single line, you have to be careful to use the <> operator in scalar context.
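
The difference between the two contexts can be sketched as follows. This is only an illustration: the helper name matching_lines is hypothetical, and a lexical filehandle is used in place of the bareword IN.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $file that match $pattern,
# holding only one input line in memory at a time.
sub matching_lines {
    my ($pattern, $file) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my @records;
    # Assigning <$in> to a scalar puts it in scalar context, so each
    # pass of the loop reads exactly one line; the loop ends at EOF.
    while (my $line = <$in>) {
        push @records, $line if $line =~ /$pattern/;
    }
    close($in);
    return @records;
}
```

By contrast, grep /$pattern/, <$in> evaluates <$in> in list context, so every line is read before grep tests the first one.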
jhurst

ASKER

Where are you finding that in the docs, burtdav?  It seems VERY unlikely.
I'm certain it's true. I probably read it in an O'Reilly camel-book, but it must be in the perldoc. Wait...
Thanks, Google.
It's in perlintro, second paragraph of "Files and I/O":
http://www.perldoc.com/perl5.8.0/pod/perlintro.html#Files-and-I-O
jhurst

ASKER

Great, thanks.

What is interesting is that I have previously run experiments, and have just repeated them, and they sure appear to show that this is not the case.
Please post them; I've not found an exception.
I mean, if perlintro is wrong, what hope can there be? Is that too philosophical for a tech TA?
jhurst

ASKER

OK, you can repeat my test; it is somewhat "non-scientific".  I just happened to have a large file, 260M of voter registration data.  I opened it on IN and then did the grep as shown above.

While running it I used top to watch memory and CPU usage.  I then repeated the thing with the file that is produced by the grep, but with @data=<IN>; and no grep.

In the latter case much more memory was used.

I should add that my testing appears to indicate that
@x=`grep pattern file`; is by far the most efficient on a 2G pentium running Linux.  I am assuming that this is the case because the grep is more efficient than the perl grep and probably because you are right and my perl scripts do load the whole file.  I have only 512M of ram on the machine, btw.
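
For anyone who wants to repeat the comparison, here is a rough sketch that times the three approaches discussed in this thread against the same file. It assumes a Unix-like system with grep on the PATH and a pattern that means the same thing to grep and to Perl; the name compare_approaches is hypothetical. It measures wall-clock time only, not memory, and the results will depend on whether the file is already in the OS cache.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: returns approach name => [elapsed seconds, match count].
sub compare_approaches {
    my ($pattern, $file) = @_;
    my %approach = (
        # External grep; only matching lines are read back over the pipe.
        external_grep => sub {
            open(my $fh, '-|', 'grep', '-e', $pattern, '--', $file)
                or die "Cannot run grep: $!";
            my @r = <$fh>;
            close($fh);
            return @r;
        },
        # Perl grep over <$in> in list context: all lines are read first.
        list_context => sub {
            open(my $in, '<', $file) or die "Cannot open $file: $!";
            my @r = grep { /$pattern/ } <$in>;
            close($in);
            return @r;
        },
        # Scalar-context read: one line per loop iteration.
        line_by_line => sub {
            open(my $in, '<', $file) or die "Cannot open $file: $!";
            my @r;
            while (my $line = <$in>) {
                push @r, $line if $line =~ /$pattern/;
            }
            close($in);
            return @r;
        },
    );
    my %result;
    for my $name (sort keys %approach) {
        my $start   = time();
        my @matches = $approach{$name}->();
        $result{$name} = [ time() - $start, scalar @matches ];
    }
    return %result;
}
```

Calling %t = compare_approaches("pattern", "file") and printing $t{$_}[0] for each key gives the elapsed seconds; watching top during each run, as described above, is still needed to see the memory difference.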