avoiding repeats

Posted on 2009-05-14
Last Modified: 2012-05-07
I have a file that looks like this:

Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would like to skip lines that are repeated. For example, these two lines count as repeats:
Name:Bill;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

because they have the same Name; the rest doesn't matter.
I have a routine that reads the file line by line and splits each line twice, first on semicolons and then on colons. From that it builds an array of hashes. How can I avoid repeating the same name with the following code? Thanks!
sub read {
    my $input = shift;
    open(FILE, $input);
    my @names;
    while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+//; $_ } split( ';', $_ );
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

Question by:cucugirl

Author Comment

by:cucugirl
ID: 24387001
How can I avoid pushing the same name into the array of hashes more than once? Thanks!


Accepted Solution

by:berseken (earned 2000 total points)
ID: 24388627
From an efficiency point of view, the best approach is probably to sort the file outside Perl so that it arrives sorted by name. Then you only need to keep track of the name on the previous line and ignore the current line if it has the same name.
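
The sorted-input idea could be sketched roughly as follows. This is only an illustration: the helper name skip_adjacent_repeats is made up, the sorting is done with Perl's sort purely to keep the example self-contained (the suggestion above is to sort outside Perl), and the name is assumed to be everything between "Name:" and the first semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper: expects lines already sorted by name and returns
# only the first line of each run of lines that share a name.
sub skip_adjacent_repeats {
    my @sorted_lines = @_;
    my @kept;
    my $previous_name;
    for my $line (@sorted_lines) {
        my ($name) = $line =~ /^Name:\s*([^;]*?)\s*;/;
        next unless defined $name;    # not a record line
        next if defined $previous_name && $name eq $previous_name;
        push @kept, $line;
        $previous_name = $name;
    }
    return @kept;
}

my @raw = (
    'Name:Bill;Location:Miami;Age:27;',
    'Name:Claudette; Location:Detroit;Age:50;',
    'Name:Dave;Location:Florence;Age:25;',
    'Name:Thomas;Location:Miami;Age:27;',
    'Name:Bill;Location:Chicago;Age:47;',
);
my @sorted = sort @raw;    # stand-in for sorting the file outside Perl
print "$_\n" for skip_adjacent_repeats(@sorted);
```

Note that with this approach the record that survives for a repeated name depends on how the lines were sorted, not on their order in the original file.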

If you can't do that, you will probably have to keep a hash of previously seen names, as implemented here:

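sub read {
    my $input = shift;
    open(FILE, $input) or die "Cannot open $input: $!";
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;                    # skip comments and blank lines first
        my ($name) = /^\s*Name:\s*([^;]*?)\s*;/;  # name is everything up to the first ';'
        next unless defined $name;                # not a record line
        next if exists $seen{$name};              # this name was already pushed
        $seen{$name} = 1;
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split( ';', $_ );
        %hash = ();                               # %hash is still an undeclared global; clear it per line
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}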

Expert Comment

by:berseken
ID: 24388727
Also, you should probably declare %hash inside the read subroutine. Otherwise it is a global that is never cleared, so it persists between lines and calls, and fields from one record can carry over into the next.
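
A minimal sketch of what that could look like under "use strict", with the hash declared fresh for every line. The sub name read_unique_names and the variable names are made up for the example, and it assumes the input is either a filename or a reference to a string holding the file contents (Perl's open accepts both).

```perl
use strict;
use warnings;

# Hypothetical rewrite of the routine: keeps the first record for each name.
sub read_unique_names {
    my ($input) = @_;
    open(my $fh, '<', $input) or die "Cannot open input: $!";
    my (@names, %seen);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;       # skip comments and blank lines
        my %record;                           # fresh hash for every line
        for my $part (split /;/, $line) {
            (my $element = $part) =~ s/^\s+|\s+$//g;   # trim both ends
            next unless length $element;
            my ($entry, $value) = split /:/, $element, 2;
            $record{$entry} = $value;
        }
        next unless defined $record{Name};
        next if $seen{ $record{Name} }++;     # this name was already pushed
        push @names, \%record;
    }
    close $fh;
    return @names;
}

# Usage with in-memory data instead of a real file:
my $data = join "\n",
    'Name:Bill;Location:Miami;Age:27;',
    'Name:Claudette; Location:Detroit;Age:50;',
    'Name:Dave;Location:Florence;Age:25;',
    'Name:Thomas;Location:Miami;Age:27;',
    'Name:Bill;Location:Chicago;Age:47;';
my @people = read_unique_names(\$data);
print "$_->{Name} ($_->{Location})\n" for @people;
```

Because %record is a new lexical on each pass through the loop, every element of @names points at its own hash and nothing leaks from one line to the next.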

Author Comment

by:cucugirl
ID: 24388862
I tried implementing the changes, but it only prints the first line in the file, and I'm sure my list probably has just 2 repeats right now. Do you think there's a bug somewhere?

Expert Comment

by:berseken
ID: 24389307
I don't know; this works fine for me with all the lines put in /tmp/blah.

I did run into an issue with calling the function 'read', so it is named read1 here:
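use Data::Dumper;

sub read1 {
    my $input = shift;
    open(FILE, $input) or die "Cannot open $input: $!";
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;                    # skip comments and blank lines first
        my ($name) = /^\s*Name:\s*([^;]*?)\s*;/;  # name is everything up to the first ';'
        next unless defined $name;                # not a record line
        next if exists $seen{$name};              # this name was already pushed
        $seen{$name} = 1;
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split( ';', $_ );
        %hash = ();                               # %hash is an undeclared global; clear it per line
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

my @thing = read1("/tmp/blah");

print Dumper(\@thing);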
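
The issue with naming the function 'read' most likely comes from Perl's built-in read function: a plain call such as read("/tmp/blah") is parsed as the built-in, which needs more arguments. The small sketch below (the sub body is a made-up placeholder) shows two ways of reaching a user-defined sub with that name; renaming it, as in read1 above, is the simpler fix.

```perl
use strict;
use warnings;

# A user-defined sub that happens to share its name with the built-in read().
sub read { return "my read got: $_[0]" }

# read("/tmp/blah");   # parsed as the built-in read, so this would not compile

my $via_ampersand = &read("/tmp/blah");        # & forces the user-defined sub
my $via_package   = main::read("/tmp/blah");   # so does a fully qualified name
print "$via_ampersand\n$via_package\n";
```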

Author Comment

by:cucugirl
ID: 24389433
where did you declare %hash?

Author Comment

by:cucugirl
ID: 24406539
Hi, for another part of my code I need to push only the last occurrence, not the first one. For this input:
Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would push
Name:Bill;Location:Chicago;Age:47;
rather than
Name:Bill;Location:Miami;Age:27;
Does anybody know how to do this with the same routine I had in the beginning? Thanks!
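
One possible way to keep the last occurrence instead is to remember where each name was first stored in the array and overwrite that slot whenever the name shows up again. This is only a sketch under assumptions: the names read_last_per_name and %index_of are invented for the example, and the input is taken as a filename or a reference to a string with the file contents.

```perl
use strict;
use warnings;

# Hypothetical variant: keeps the LAST record seen for each name,
# in the order in which the names first appeared.
sub read_last_per_name {
    my ($input) = @_;
    open(my $fh, '<', $input) or die "Cannot open input: $!";
    my (@names, %index_of);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;       # skip comments and blank lines
        my %record;
        for my $part (split /;/, $line) {
            (my $element = $part) =~ s/^\s+|\s+$//g;   # trim both ends
            next unless length $element;
            my ($entry, $value) = split /:/, $element, 2;
            $record{$entry} = $value;
        }
        my $name = $record{Name};
        next unless defined $name;
        if (exists $index_of{$name}) {
            $names[ $index_of{$name} ] = \%record;   # replace the earlier record
        }
        else {
            $index_of{$name} = scalar @names;        # slot the new record will occupy
            push @names, \%record;
        }
    }
    close $fh;
    return @names;
}

my $sample = join "\n",
    'Name:Bill;Location:Miami;Age:27;',
    'Name:Claudette; Location:Detroit;Age:50;',
    'Name:Dave;Location:Florence;Age:25;',
    'Name:Thomas;Location:Miami;Age:27;',
    'Name:Bill;Location:Chicago;Age:47;';
my @latest = read_last_per_name(\$sample);
print "$_->{Name} ($_->{Location}, $_->{Age})\n" for @latest;
```

With the sample data, the Bill entry that ends up in the array is the Chicago one, while the other names are unaffected.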