Solved

avoiding repeats

Posted on 2009-05-14
Medium Priority
181 Views
Last Modified: 2012-05-07
I have a file that looks like this:

Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

And I would like to skip lines that repeat a Name. For example, these two lines are repeats of each other:
Name:Bill;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

They have the same Name; the rest of the fields don't matter.
I have a routine that reads the file line by line and splits each line twice, first on semicolons and then on colons. From that it builds an array of hashes. How can I avoid repeating the same name with the following code? Thanks!
sub read {
    my $input = shift;
    open(FILE, $input);
    my @names;
    while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+//; $_ } split( ';', $_ );
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

Question by:cucugirl
7 Comments
 

Author Comment

by:cucugirl
ID: 24387001
How can I avoid pushing the same name into the array of hashes more than once? Thanks!

 
LVL 1

Accepted Solution

by:berseken (earned 2000 total points)
ID: 24388627
From an efficiency point of view, the best approach would probably be to sort the file outside Perl so that it arrives sorted by name. Then you only need to remember the name from the previous line and ignore the current line when its name is the same.
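A minimal sketch of that approach (my own illustration, not part of the original answer), assuming GNU sort and a data file named data.txt; sorting on the second ':'-separated field works because that field's text begins with the name:

use strict;
use warnings;

# Run as:  sort -t: -k2,2 data.txt | perl dedup.pl
# (-t: splits fields on ':'; field 2 starts with the name, e.g. "Bill;Location")

my $prev = '';
while (my $line = <STDIN>) {
    my ($name) = $line =~ /^Name:([^;]*);/;   # extract the Name value
    next unless defined $name;                # no Name field: skip the line
    next if $name eq $prev;                   # same name as previous line: skip
    $prev = $name;
    print $line;
}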

If you can't do that, you will probably have to keep a hash of previously seen names, as implemented here:

sub read {
    my $input = shift;
    open(FILE, $input);
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+//; $_ } split( ';', $_ );
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});    # skip names we've already seen
        $seen{$name} = 1;
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}
 
LVL 1

Expert Comment

by:berseken
ID: 24388727
Also, you should declare %hash with my inside the read subroutine, ideally inside the while loop so each line starts with a fresh hash. As written it is a global, so it keeps growing across calls and will consume memory, and fields from earlier lines can leak into later records.
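For illustration, here is one way that could look (the read1 name, the open error check, and use strict are my additions, not code from the thread); declaring %hash inside the while loop gives every line its own hash:

use strict;
use warnings;

sub read1 {
    my $input = shift;
    open(FILE, $input) or die "cannot open $input: $!";
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;                    # skip comments and blank lines
        my ($name) = /^Name:([^;]*);/;
        next if defined $name && $seen{$name}++;  # skip names already seen
        my %hash;                                 # fresh hash for every line
        for my $element ( split /;/ ) {
            $element =~ s/^\s+//;                 # trim leading whitespace
            $element =~ s/\s+$//;                 # trim trailing whitespace
            my ($entry, $value) = split /:/, $element;
            $hash{$entry} = $value;
        }
        push @names, {%hash};
    }
    close(FILE);
    return @names;
}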

 

Author Comment

by:cucugirl
ID: 24388862
I tried implementing the changes, but it only prints the first line in the file, and I'm sure my list has just two repeats right now. Do you think there's a bug somewhere?
 
LVL 1

Expert Comment

by:berseken
ID: 24389307
I don't know; this works fine for me. I put all the sample lines in /tmp/blah and dump the result.

I did run into an issue with calling the function 'read' (it clashes with Perl's built-in read), so I renamed it read1:
use Data::Dumper;

sub read1 {
    my $input = shift;
    open(FILE, $input);
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+//; $_ } split( ';', $_ );
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});
        $seen{$name} = 1;
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
            my ($entry, $value) = split( ':', $element );
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

my @thing = read1("/tmp/blah");

print Dumper(\@thing);

 

Author Comment

by:cucugirl
ID: 24389433
Where did you declare %hash?
 

Author Comment

by:cucugirl
ID: 24406539
Hi, for another part of my code I need to push only the last occurrence of a name, not the first one. Given:
Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would push
Name:Bill;Location:Chicago;Age:47;
rather than
Name:Bill;Location:Miami;Age:27;
Does anybody know how to do this with the same routine I had in the beginning? Thanks!
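No answer was posted to this follow-up. One possible sketch (my own, not from the thread): index the parsed records by name so that a later line simply overwrites an earlier one, then return the records in first-seen order:

use strict;
use warnings;

sub read_last {
    my $input = shift;
    open(FILE, $input) or die "cannot open $input: $!";
    my %by_name;   # name => record; a later line overwrites an earlier one
    my @order;     # names in order of first appearance
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;
        my %hash;
        for my $element ( split /;/ ) {
            $element =~ s/^\s+//;
            $element =~ s/\s+$//;
            my ($entry, $value) = split /:/, $element;
            $hash{$entry} = $value;
        }
        my $name = $hash{Name};
        next unless defined $name;
        push @order, $name unless exists $by_name{$name};
        $by_name{$name} = {%hash};    # last occurrence wins
    }
    close(FILE);
    return map { $by_name{$_} } @order;
}

For the sample file this would return records for Bill, Claudette, Dave, and Thomas, with Bill's record being the Chicago one.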
