mjcoyne


Newbie -- Why won't this work (sorting contents of files)

I have a directory containing about 250 text files, each with a hundred or so lines of data separated by tabs.  The first field is a time field in epoch seconds.  I'm trying to sort the contents (lines) of all these files, one file at a time, and output the sorted results to another file with the same name as the original file, but with a different extension.  I tried this:

#!/usr/local/bin/perl

my $dirname = "/var/www/html/folding/data/members";
my $new = "/var/www/html/folding/data/members/temp.tmp";

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";

while (defined($teamfile = readdir(DIR))) {
     next if $teamfile =~ /^\.\.?$/;
        open(OLD, "<$dirname/$teamfile") or die "can't open OLD: $!";
        open (NEW, ">$new") or die "can't open NEW: $!";
        select (NEW);
        my(@lines) = <OLD>;
        @lines = sort(@lines);
        my($line);
        foreach $line (@lines) {

        print NEW "$line";

   }

my $newname =  "/var/www/html/folding/data/sorted/" . $teamfile . ".sort";
system("mv $new $newname");
close(NEW);
close(OLD);

}
closedir(DIR);


But it failed.  For an original file named, say, orgfile.his, I would get orgfile.his.sort, orgfile.his.sort.sort, orgfile.his.sort.sort.sort, etc. (all files are full size, and apparently contain what I'm looking for).  There were hundreds of these files (.sort.sort.sort....).

I'm *very* new to Perl, and I can't spot the error.  It's probably glaring to one of you guys and gals.  Someone willing to help a blind man?

Thanks
ASKER CERTIFIED SOLUTION
wilcoxon
Tintin

I think you'd have to agree that it is much simpler to write it as:

#!/usr/local/bin/perl
use strict;

my $dirname = "/var/www/html/folding/data/members";

foreach my $teamfile (<$dirname/*.his>) {
      open FILE, $teamfile or die "can't open $teamfile: $!\n";
      open SORTED, ">$teamfile.sort" or die "Can not open $teamfile.sort $!\n";
      print SORTED sort <FILE>;
      close FILE;
      close SORTED;
}
Yep.  That is much simpler.  I wasn't thinking in terms of re-writing it - I was just looking at fixing his problem.  Personally, I'd re-write your code as below.  I'm a big believer in not using glob patterns (I had a script fail due to hitting the glob limit).

#!/usr/local/bin/perl

use strict;
use warnings;

my $dirname = "/var/www/html/folding/data/members";

opendir DIR, $dirname or die $!;
while (defined(my $teamfile = readdir DIR)) {
     next unless $teamfile =~ /\.his$/; # or whatever pattern you want to use to check the file
     my $path = "$dirname/$teamfile"; # readdir returns bare names, so prepend the directory
     open FILE, $path or die "can't open $path: $!\n";
     open SORTED, ">$path.sort" or die "Can not open $path.sort $!\n";
     print SORTED sort <FILE>;
     close FILE;
     close SORTED;
}
closedir DIR;
Perl's glob is fine from 5.6.1 onwards.  
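
One thing worth checking in both rewrites: sort with no comparison block compares lines as strings, not numbers. Since the question says the first field is epoch seconds, that only gives chronological order while every timestamp has the same number of digits (epoch time went from 9 to 10 digits in September 2001). Below is a minimal sketch of a numeric sort on the first tab-separated field. The sub name sort_by_epoch and the sample lines are made up for illustration, and the code assumes every line starts with a numeric field (otherwise "use warnings" will complain).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sort lines numerically by their first tab-separated field
# (assumed to be epoch seconds, as described in the question).
sub sort_by_epoch {
    my @lines = @_;
    # Extract the key once per line, sort on it numerically, then unwrap.
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ (split /\t/, $_, 2)[0], $_ ] } @lines;
}

# Hypothetical sample data; real lines would come from <FILE>.
my @sample = ("1000000000\tb\n", "999999999\ta\n", "1000000001\tc\n");
print sort_by_epoch(@sample);
```

In either loop above, this would replace "print SORTED sort <FILE>;" with "print SORTED sort_by_epoch(<FILE>);". A plain sort on the sample data would put the 999999999 line last, because the string "9..." compares greater than "1...".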
mjcoyne

I wish I could split the points among you guys...  I accepted wilcoxon's first answer among three correct answers 'cause he did correctly answer the question I asked...:)

See?  I told you I was a newbie...  Both examples of reworked code are obviously much simpler and cleaner ways of accomplishing what I was trying to do...  And, of course, they show the benefit of experience over newbieness...

Thanks to both of you!
You can split points by requesting it in the Community Support section.
Are you sure, Tintin?  I believe perl's globbing has limits even in 5.8 - admittedly the limit is in the 1000s.  This came up recently on the exmh-users list: somebody wrote some mail threading code that used globbing, and it failed for several people (one I remember had 11000 files in the directory).
The big difference in globbing from 5.6.1 onwards is that it is done internally rather than being limited by the limits of the csh.

I haven't seen anything to suggest that there is a limit in 5.8.x, but I guess it is possible with huge numbers of files.
I just tried it on over 32000 files with no problem
Odd.  I didn't check it (I don't have a directory with that many files) but I'm pretty sure the person on the exmh-users list specifically mentioned using perl 5.6.1.  Possibly working differently on different platforms?