Solved

`grep...` v grep /.../

Posted on 2003-03-08
Last Modified: 2010-08-05
I have an application that extracts a relatively few lines from a relatively large file.  Right now my code reads:

@records=`grep $whatToLookFor theFile`;

I did it this way rather than:

open(FIL,"theFile");
@records=grep /$whatToLookFor/,<FIL>;
close(FIL);

since I was worried that perl may actually bring in the entire file and then do the grep.  Clearly my method has the disadvantage of using an extra shell invocation.  If I replace it with the =grep /.../ form, will perl read the file line by line and do the grep, or will it be as horrible as I was concerned about?
Question by:jhurst
9 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 100 total points
ID: 8096873
grep /$whatToLookFor/,<FIL>;#will bring in the entire file and then do the grep

while( <FIL> ){
    push @records,$_ if /$whatToLookFor/; #will read lines one at a time and only store the matching lines in @records
}
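
A self-contained sketch of the line-by-line approach above, for anyone who wants to drop it into a script. The helper name `matching_lines`, the lexical filehandle, and the three-argument open are my assumptions, not part of the original answer. Only the matching lines are kept in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path that match $pattern,
# reading one line at a time so only the matches are held in memory.
sub matching_lines {
    my ($path, $pattern) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $re = qr/$pattern/;          # compile the pattern once
    my @records;
    while (my $line = <$fh>) {      # scalar context: one line per iteration
        push @records, $line if $line =~ $re;
    }
    close($fh);
    return @records;
}
```

It would be called as `my @records = matching_lines('theFile', $whatToLookFor);`.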
 
LVL 8

Author Comment

by:jhurst
ID: 8099210
Thanks ozo.

You not only confirmed my fears but even suggested a better alternative.

However, I have tested, and with a 40k-line file the `grep...` is MUCH faster than the while() solution.  I guess that grep is more efficiently written than the perl interpreter.

BTW, how do you know this?  I looked everywhere I could think of and could see no documentation as to why the
@x=grep /whatever/,<IN>;
would pull in the whole file first.

It sort of made sense to me that it would be as you suggested, since the alternative requires two processes and a pipe, but I still could not find it documented.
 
LVL 5

Expert Comment

by:burtdav
ID: 8523199
(@x=grep /whatever/,<IN>) pulls in the whole file, because it uses the filehandle in list context. (print <IN>) does this, too. If you want to read a single line, you have to be careful to use the <> operator in scalar context.
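
A minimal sketch of the scalar-versus-list distinction described above. It assumes Perl 5.8 or later, because it reads from an in-memory filehandle so that nothing is needed on disk; the variable names are only for illustration.

```perl
use strict;
use warnings;

# In-memory "file" so the example needs nothing on disk (Perl 5.8+).
my $data = "one\ntwo\nthree\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my $first = <$in>;   # scalar context: reads exactly one line
my @rest  = <$in>;   # list context: reads every remaining line at once

close($in);
print "first: $first";
print "rest has ", scalar(@rest), " lines\n";
```

The same rule is why a `while (<IN>)` loop reads one line per iteration, while passing `<IN>` as an argument to grep or print reads everything that is left.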
 
LVL 8

Author Comment

by:jhurst
ID: 8528190
Where are you finding that in the docs, burtdav?  It seems VERY unlikely.
 
LVL 5

Expert Comment

by:burtdav
ID: 8528447
I'm certain it's true. I probably read it in an O'Reilly camel-book, but it must be in the perldoc. Wait...
Thanks, Google.
It's in perlintro, second paragraph of "Files and I/O":
http://www.perldoc.com/perl5.8.0/pod/perlintro.html#Files-and-I-O
 
LVL 8

Author Comment

by:jhurst
ID: 8528478
Great, thanks.

What is interesting is that I have previously run experiments, and have just repeated them, and they sure appear to show that this is not the case.
 
LVL 5

Expert Comment

by:burtdav
ID: 8528485
Please post them; I've not found an exception.
 
LVL 5

Expert Comment

by:burtdav
ID: 8528489
I mean, if perlintro is wrong, what hope can there be? Is that too philosophical for a tech TA?
 
LVL 8

Author Comment

by:jhurst
ID: 8528667
ok, you can repeat my test, it is somewhat "non-scientific".  I just happened to have a large file, 260M of voter registration data.  Opened it on IN and then did the grep as shown above.

While running it I used top to watch memory and CPU usage.  I then repeated the thing with the file that is produced by the grep, but using @data=<IN>; with no grep.

In the latter case much more memory was used.

I should add that my testing appears to indicate that
@x=`grep pattern file`; is by far the most efficient on a 2G Pentium running Linux.  I am assuming that this is the case because the external grep is more efficient than the perl grep, and probably because you are right and my perl scripts do load the whole file.  I have only 512M of RAM on the machine, btw.
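
If the external grep stays, one possible variation is the list form of a pipe open, which passes the arguments to grep directly instead of interpolating them into a command string. This is only a sketch: the helper name `grep_lines` is hypothetical, it assumes a Unix-like system with grep on the PATH, and grep's default pattern syntax is not the same as Perl's, so a pattern may match differently than it would in /.../.

```perl
use strict;
use warnings;

# Hypothetical helper: run the external grep and collect its output.
sub grep_lines {
    my ($pattern, $path) = @_;
    # List-form pipe open: the arguments go to grep as-is, so shell
    # metacharacters in $pattern are not interpreted by a shell.
    open(my $pipe, '-|', 'grep', '-e', $pattern, '--', $path)
        or die "Cannot run grep: $!";
    my @records = <$pipe>;   # only the matching lines come back
    close($pipe);            # grep's exit status is left in $?
    return @records;
}
```

It would be called as `my @x = grep_lines($pattern, 'file');`. It still costs a second process and a pipe, as with the backtick version.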