Solved

Newbie --  Why won't this work (sorting contents of files)

Posted on 2003-04-01
10
Medium Priority
150 Views
Last Modified: 2012-05-04
I have a directory containing about 250 text files, each with a hundred or so lines containing data separated by tabs.  The first field is a time field in epoch seconds.  I'm trying to sort the contents (lines) of all these files, one file at a time, and output the sorted results to another file, with the same name as the original file, but with a different extension.  I tried this:

#!/usr/local/bin/perl

my $dirname = "/var/www/html/folding/data/members";
my $new = "/var/www/html/folding/data/members/temp.tmp";

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";

while (defined($teamfile = readdir(DIR))) {
     next if $teamfile =~ /^\.\.?$/;
        open(OLD, "<$dirname/$teamfile") or die "can't open OLD: $!";
        open (NEW, ">$new") or die "can't open NEW: $!";
        select (NEW);
        my(@lines) = <OLD>;
        @lines = sort(@lines);
        my($line);
        foreach $line (@lines) {

        print NEW "$line";

   }

my $newname =  "/var/www/html/folding/data/sorted/" . $teamfile . ".sort";
system("mv $new $newname");
close(NEW);
close(OLD);

}
closedir(DIR);


But it failed.  For an original file named, say, orgfile.his, I would get orgfile.his.sort, orgfile.his.sort.sort, orgfile.his.sort.sort.sort, etc. (all files are full size, and contain apparently what I'm looking for).  There were hundreds of these files (.sort.sort.sort....).

I'm *very* new to Perl, and I can't spot the error.  It's probably glaring to one of you guys and gals.  Someone willing to help a blind man?

Thanks
Question by:mjcoyne
10 Comments
 
LVL 26

Accepted Solution

by:
wilcoxon earned 150 total points
ID: 8248303
Change this line:

next if $teamfile =~ /^\.\.?$/;

to this:

next if ($teamfile =~ /^\.\.?$/ or $teamfile =~ /\.sort$/);

The problem is that readdir is picking up your new files as they are written, and the loop is not skipping files that already end in .sort.  Each .sort file therefore gets sorted again and written out with one more .sort on the end.
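To see the effect of the extra test in isolation, here is a minimal sketch. The `should_skip` helper and the sample file names are hypothetical, made up for illustration; only the two patterns come from the answer above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns true
# for directory entries that the sorting loop should ignore.
sub should_skip {
    my ($name) = @_;
    return 1 if $name =~ /^\.\.?$/;   # the "." and ".." entries
    return 1 if $name =~ /\.sort$/;   # output written by an earlier pass
    return 0;
}

# Made-up names, as readdir might return them once output files exist.
my @entries = ('.', '..', 'orgfile.his', 'orgfile.his.sort', 'other.his');
my @to_sort = grep { !should_skip($_) } @entries;
print "$_\n" for @to_sort;    # prints orgfile.his and other.his
```

Another option, under the same assumptions, would be to read the whole listing into an array before the loop (`my @files = readdir(DIR);`), so that files created later are never seen by the loop at all.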
0
 
LVL 48

Expert Comment

by:Tintin
ID: 8250100
I think you'd have to agree that it is much simpler to write it as:

#!/usr/local/bin/perl
use strict;

my $dirname = "/var/www/html/folding/data/members";

# The glob matches only *.his, so the .sort output is never picked up again
foreach my $teamfile (<$dirname/*.his>) {
      open FILE, $teamfile or die "can't open $teamfile: $!\n";
      open SORTED, ">$teamfile.sort" or die "can't open $teamfile.sort: $!\n";
      print SORTED sort <FILE>;
      close FILE;
      close SORTED;
}
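One thing to keep in mind: a bare `sort` compares whole lines as strings. That gives the right order only while every epoch value has the same number of digits. If that cannot be guaranteed, a numeric comparison on the first tab-separated field might look like the sketch below. The `sort_by_epoch` helper and the sample lines are hypothetical and not taken from the thread.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort tab-separated lines numerically by their
# first field. Each line is paired with its key once, the pairs are
# sorted on the key, and the original lines are returned in order.
sub sort_by_epoch {
    my @lines = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ (split /\t/, $_, 2)[0], $_ ] } @lines;
}

# Made-up sample data: the 9-digit stamp is the earliest value, but a
# plain string sort would place it last.
my @lines = (
    "1049155200\tlate\n",
    "999999999\tearly\n",
    "1000000000\tmiddle\n",
);
print sort_by_epoch(@lines);
```

In the loop above this would replace `sort <FILE>` with `sort_by_epoch(<FILE>)`, assuming every line really does start with a numeric field.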
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 8250156
Yep.  That is much simpler.  I wasn't thinking in terms of re-writing it - I was just looking at fixing his problem.  Personally, I'd re-write your code as below.  I'm a big believer in not using glob patterns (I had a script fail due to hitting the glob limit).

#!/usr/local/bin/perl

use strict;
use warnings;

my $dirname = "/var/www/html/folding/data/members";

opendir DIR, $dirname or die "can't opendir $dirname: $!\n";
# Assign explicitly: a bare readdir in while() only sets $_ from Perl 5.12 on
while (defined(my $teamfile = readdir DIR)) {
     next unless $teamfile =~ /\.his$/; # or whatever pattern you want to use to check the file
     my $path = "$dirname/$teamfile"; # readdir returns bare names, so prepend the directory
     open FILE, $path or die "can't open $path: $!\n";
     open SORTED, ">$path.sort" or die "can't open $path.sort: $!\n";
     print SORTED sort <FILE>;
     close FILE;
     close SORTED;
}
closedir DIR;
0

 
LVL 48

Expert Comment

by:Tintin
ID: 8250260
Perl's glob is fine from 5.6.1 onwards.  
0
 
LVL 17

Author Comment

by:mjcoyne
ID: 8250300
I wish I could split the points among you guys...  I accepted wilcoxon's first answer among three correct answers 'cause he did correctly answer the question I asked...:)

See?  I told you I was a newbie...  Both examples of reworked code are obviously much simpler and cleaner ways of accomplishing what I was trying to do...  And, of course, they show the benefit of experience over newbieness...

Thanks to both of you!
0
 
LVL 48

Expert Comment

by:Tintin
ID: 8250317
You can split points by requesting it in the Community Support section.
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 8254538
Are you sure, Tintin?  I believe perl's globbing has limits even in 5.8 - admittedly the limit is in the 1000s.  This was brought up recently on the exmh-users list: somebody wrote some mail threading code that used globbing, and it failed for several people (one I remember had 11000 files in the directory).
0
 
LVL 48

Expert Comment

by:Tintin
ID: 8257029
The big difference in globbing from 5.6.1 onwards is that it is done internally rather than by calling out to csh, so the limits of csh no longer apply.

I haven't seen anything to suggest that there is a limit in 5.8.x, but I guess it is possible with huge numbers of files.
0
 
LVL 84

Expert Comment

by:ozo
ID: 8257352
I just tried it on over 32000 files with no problem
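A check like this could be reproduced with a sketch along the following lines. The `count_glob_matches` helper, the file naming, and the count of 2000 are all assumptions made for illustration; this is not the test that was actually run. It also assumes the temporary directory path contains no whitespace, since `glob` splits its pattern on whitespace.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create $n empty files in a scratch directory,
# then report how many names the glob operator returns for them.
sub count_glob_matches {
    my ($n) = @_;
    my $dir = tempdir(CLEANUP => 1);    # removed when the program exits
    for my $i (1 .. $n) {
        open my $fh, '>', "$dir/file$i.his"
            or die "can't create file$i.his: $!\n";
        close $fh;
    }
    my @found = glob("$dir/*.his");
    return scalar @found;
}

my $n = 2000;    # arbitrary; raise it to probe larger directories
print "created $n files, glob returned ", count_glob_matches($n), "\n";
```

If the two numbers differ on a given platform or Perl version, that would point to a glob limit there.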
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 8257527
Odd.  I didn't check it (I don't have a directory with that many files) but I'm pretty sure the person on the exmh-users list specifically mentioned using perl 5.6.1.  Possibly working differently on different platforms?
0
