johnny99
asked on
PERL and Text Files -- My Head Hurts
I'd really like someone to explain to me, in words that a dummy like me can understand, how Perl reads or handles text files.
Specifically, I don't understand what that
while(<INFILE>);
thing means.
Say I was trying to write a script which would pick a random line from a 100-line text file.
I open the file with an
or die "can't find file"
error handler, I understand it this far, but then in order to do something with the file I need
while(<INFILE>){
do my stuff
}
Which I don't really understand -- does while(<INFILE>) mean "read every line of the file"? Does it mean "put every line of the file into $_ one by one"? Does it mean "keep on reading the file until some condition is satisfied, at which time whatever line I'm reading will be placed in $_"?
If I want to generate a random number, then pick that line from my 100 lines, how do I script that? And if I wanted to repeat the "get a random line" process a random number of times, how would I script that?
The trouble is that in my head the process goes like this:
Generate a random number X
Open the file
Get line X of the file
Which ought to be simple, but PERL doesn't seem to do it like that. As far as I can work out from the book I have, it has to read each line of the file into memory and throw them away before it gets to the one it wants.
And if I want to repeat the process, getting a random line from my file Y number of times, I see it as:
Generate a random number Y
Repeat Y number of times
Generate a random number X
Get line X of the file
Stop repeating
And again, maybe it's just me, but I'm finding that difficult. Do I have to open the file every time? Do I have to do that while(<INFILE>) thing every time or just once?
I'll gladly give stacks of points to someone who can make me understand how this all works, because it's making my head hurt...
perldoc -q 'random line'
Found in perlfaq5.pod
How do I select a random line from a file?
Here's an algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading
the whole file in. A simple proof by induction is
available upon request if you doubt its correctness.
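For anyone who finds the one-liner above hard to read, here is the same idea spelled out as an ordinary while loop. This is only a sketch: the in-memory "file" built from a string and the names $text, $in and $line are stand-ins for illustration, not part of the FAQ. The point is that every line is read exactly once and only one line is ever kept in memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real file: an in-memory filehandle over a string.
my $text = join '', map { "line $_\n" } 1 .. 100;
open(my $in, '<', \$text) or die "can't open in-memory file: $!";

my $line;
while (<$in>) {
    # $. holds the number of the line just read.
    # Line 1 is always kept (rand(1) is always < 1), line 2 replaces it
    # with probability 1/2, line 3 with probability 1/3, and so on.
    $line = $_ if rand($.) < 1;
}
close $in;

print "Picked: $line";
```

With a real file you would open it by name instead, and the loop stays the same.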
@lines = <INFILE>; #grab all lines from the file
# fisher_yates_shuffle( \@array ) :
# generate a random permutation of @array in place
sub fisher_yates_shuffle {
    my $array = shift;
    my $i;
    for( $i = @$array; --$i; ){
        my $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}
fisher_yates_shuffle( \@lines ); # permutes @lines
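Once the lines are shuffled, taking the first few gives several random lines with no repeats. As a rough sketch of that step, the example below swaps in shuffle from the List::Util module (shipped with Perl since 5.8) instead of the hand-written sub above; the generated @lines and the $count variable are assumptions made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Stand-in for: @lines = <INFILE>;
my @lines = map { "line $_\n" } 1 .. 100;

# Hypothetical: how many different random lines we want.
my $count = 5;

# Shuffle a copy of the list, then keep the first $count entries.
my @picked = (shuffle @lines)[0 .. $count - 1];

print @picked;
```

This costs more memory than the rand($.) trick because the whole file is held in @lines, which is fine for a 100-line file.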
ASKER
Martinag, you can certainly have the points, for being the only person to even try and answer the question in the way it was asked, i.e. slowly and carefully.
Other people who replied, I'm sure your stuff is very clever, but the information about what all of that stuff means is more valuable than any amount of neat solutions, whether they work or not...
Can you tell me a good place on the web or a good book to help me with questions like this?
perldoc -q book
Found in perlfaq2.pod
Perl Books
A number of books on Perl and/or CGI programming are
available. A few of these are good, some are ok, but
many aren't worth your money. Tom Christiansen maintains
a list of these books, some with extensive reviews, at
http://www.perl.com/perl/critiques/index.html.
The incontestably definitive reference book on Perl,
written by the creator of Perl, is now in its second
edition:
Programming Perl (the "Camel Book"):
Authors: Larry Wall, Tom Christiansen, and Randal Schwartz
ISBN 1-56592-149-6 (English)
ISBN 4-89052-384-7 (Japanese)
URL: http://www.oreilly.com/catalog/pperl2/
(French, German, Italian, and Hungarian translations also
available)
The companion volume to the Camel containing thousands
of real-world examples, mini-tutorials, and complete
programs (first premiering at the 1998 Perl Conference),
is:
The Perl Cookbook (the "Ram Book"):
Authors: Tom Christiansen and Nathan Torkington,
with Foreword by Larry Wall
ISBN: 1-56592-243-3
URL: http://perl.oreilly.com/cookbook/
If you're already a hard-core systems programmer, then
the Camel Book might suffice for you to learn Perl from.
But if you're not, check out:
Learning Perl (the "Llama Book"):
Authors: Randal Schwartz and Tom Christiansen
with Foreword by Larry Wall
ISBN: 1-56592-284-0
URL: http://www.oreilly.com/catalog/lperl2/
Despite the picture at the URL above, the second edition
of "Llama Book" really has a blue cover, and is updated
for the 5.004 release of Perl. Various foreign language
editions are available, including *Learning Perl on
Win32 Systems* (the Gecko Book).
If you're not an accidental programmer, but a more
serious and possibly even degreed computer scientist who
doesn't need as much hand-holding as we try to provide
in the Llama or its defurred cousin the Gecko, please
check out the delightful book, *Perl: The Programmer's
Companion*, written by Nigel Chapman.
You can order O'Reilly books directly from O'Reilly &
Associates, 1-800-998-9938. Local/overseas is 1-707-829-
0515. If you can locate an O'Reilly order form, you can
also fax to 1-707-829-0104. See http://www.ora.com/ on
the Web.
What follows is a list of the books that the FAQ authors
found personally useful. Your mileage may (but, we hope,
probably won't) vary.
Recommended books on (or muchly on) Perl follow; those
marked with a star may be ordered from O'Reilly.
References
    *Programming Perl
        by Larry Wall, Tom Christiansen, and Randal L. Schwartz
    *Perl 5 Desktop Reference
        by Johan Vromans
Tutorials
    *Learning Perl [2nd edition]
        by Randal L. Schwartz and Tom Christiansen
        with foreword by Larry Wall
    *Learning Perl on Win32 Systems
        by Randal L. Schwartz, Erik Olson, and Tom Christiansen,
        with foreword by Larry Wall
    Perl: The Programmer's Companion
        by Nigel Chapman
    Cross-Platform Perl
        by Eric F. Johnson
    MacPerl: Power and Ease
        by Vicki Brown and Chris Nandor, foreword by Matthias Neeracher
Task-Oriented
    *The Perl Cookbook
        by Tom Christiansen and Nathan Torkington
        with foreword by Larry Wall
    Perl5 Interactive Course [2nd edition]
        by Jon Orwant
    *Advanced Perl Programming
        by Sriram Srinivasan
    Effective Perl Programming
        by Joseph Hall
Special Topics
    *Mastering Regular Expressions
        by Jeffrey Friedl
    How to Set up and Maintain a World Wide Web Site [2nd edition]
        by Lincoln Stein
ASKER
Thanks Ozo -- it sounds as if I'm a lama, rather than a camel, at this stage.
johnny "that's pronounced 'lay-muh', right?" 99
perldoc
will also give you access to much of the information in the camel,
as well as answers to many common questions.
while( <INFILE> ){ }
is actually a magic shorthand for
while( defined($_=<INFILE>) ){ }
this is covered in
perldoc perlop
under I/O Operators
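To see why the defined() test is part of the shorthand, here is a small sketch with the loop written out in full. The in-memory "file" and the names $text, $in and @seen are made up for the example. The last line of the data is a bare 0, which is a false value in Perl, yet the loop still processes it, because it only stops when the read returns undef at end of file.

```perl
use strict;
use warnings;

# In-memory "file" whose last line is a bare 0 with no trailing newline.
my $text = "first\nsecond\n0";
open(my $in, '<', \$text) or die "can't open in-memory file: $!";

my @seen;
while (defined(my $line = <$in>)) {   # spelled-out form of while (<$in>)
    chomp $line;                      # remove the trailing newline, if any
    push @seen, $line;
}
close $in;

print scalar(@seen), " lines read\n";   # prints "3 lines read"
```

So the loop means "fetch the next line, and keep going as long as there was one", not "keep going as long as the line is true".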
ASKER CERTIFIED SOLUTION
>> do my stuff
>> }
You're absolutely correct. As long as there are lines left in the file, the next one is put in $_ and the condition is true, which tells the while() statement to run its body once more. When end of file is reached, the read returns undef, which counts as false, and the while() loop stops.
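srand; # Seed the random number generator
$lineNumber = int(rand 100) + 1; # Get a number between 1 and 100
for ($i = 1; $i <= $lineNumber; $i++) {
    # Outside a while() condition, a bare <INFILE>; does not fill $_,
    # so assign the line explicitly.
    $_ = <INFILE>;
}
$thatLine = $_;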
will do the trick for you. I am not using a while() here. Instead I use a for loop.
It starts by setting $i to 1. Then it runs as long as $i is lower than or equal to (<=) $lineNumber. Every time the loop has run, $i is incremented by one (++).
Then, when the lines have been read, the last one that was read ($_) is put in $thatLine.
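The same "get line X" step can also be written with while(), using the special variable $. (the number of the line just read) and stopping as soon as the wanted line turns up. This is just a sketch; the in-memory "file" and the lexical names $text, $in, $lineNumber and $thatLine are stand-ins so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a 100-line file.
my $text = join '', map { "line $_\n" } 1 .. 100;
open(my $in, '<', \$text) or die "can't open in-memory file: $!";

my $lineNumber = int(rand 100) + 1;   # a number between 1 and 100
my $thatLine;
while (<$in>) {
    next unless $. == $lineNumber;    # skip lines until we reach line X
    $thatLine = $_;
    last;                             # no need to read the rest of the file
}
close $in;

print "Line $lineNumber: $thatLine";
```

Either way the earlier lines still have to be read and thrown away, because a text file has no index saying where line X starts.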
>> Generate a random number Y
>> Repeat Y number of times
>> Generate a random number X
>> Get line X of the file
>> Stop repeating
Correct.
>> DO I have to open the file every time? Do I have to do that while(<INFILE>) thing every time or just once?
Not really. You can read the lines into an array once and reuse it, which means you won't have to re-read the file each time:
srand;
$times = int(rand 10) + 1; # Between 1 and 10
open (INFILE, "<file.txt") or die "open file: $!";
$lineCount = 0;
while (<INFILE>) {
    $lines[$lineCount++] = $_; # Add line to array and increment $lineCount
}
close INFILE;
for ($i = 0; $i < $times; $i++) {
    # Array indexes start at 0, so pick one between 0 and $lineCount - 1
    $randomedLines[$i] = $lines[int(rand $lineCount)];
}
The chosen lines are now in @randomedLines.
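A shorter way to write the same thing, as a sketch only: in list context <INFILE> hands back every remaining line at once, so the whole file can go into an array in a single statement, and rand @lines picks a valid index whatever the length of the file. The in-memory "file" and the names $text, $in and @randomLines are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for file.txt.
my $text = join '', map { "line $_\n" } 1 .. 100;
open(my $in, '<', \$text) or die "can't open in-memory file: $!";
my @lines = <$in>;               # list context: read all lines in one go
close $in;

my $times = int(rand 10) + 1;    # between 1 and 10
my @randomLines;
for (1 .. $times) {
    # rand @lines gives a number from 0 up to (but not including) the
    # number of lines; int() turns it into a valid array index.
    push @randomLines, $lines[ int rand @lines ];
}

print @randomLines;
```

The file is opened and read once; after that every random pick is just an array lookup. The same line can come up more than once.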
Martin