Solved

avoiding repeats

Posted on 2009-05-14
179 Views
Last Modified: 2012-05-07
I have a file that looks like this:

Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would like to skip lines whose Name repeats an earlier line's; for example, these two lines count as repeats:
Name:Bill;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

because they have the same Name; the rest of the fields don't matter.
I have a routine that reads the file line by line and splits it twice, first on semicolons and then on colons, and builds an array of hashes from the result. How can I avoid repeating the same name with the following code? Thanks!
sub read {
   my $input = shift;
   open(FILE, $input);
   my @names;
   while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split( ';', $_ );
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
               my ($entry, $value) = split( ':', $element );
               $hash{$entry} = $value;
        }
        push(@names, {%hash});
   }
   close(FILE);
   return @names;
}


Question by:cucugirl
7 Comments
 

Author Comment

by:cucugirl
ID: 24387001
How can I avoid pushing the same name into the array of hashes more than once? Thanks!

 
LVL 1

Accepted Solution

by:
berseken earned 500 total points
ID: 24388627
The most efficient approach would probably be to sort the file outside Perl, so that it comes in sorted by name. Then you just keep track of the name on the previous line, and if the current line has the same name, you ignore it.

If you can't do that, you will probably have to keep a hash of previously seen names, as implemented here:

sub read {
   my $input = shift;
   open(FILE, $input);
   my @names;
   my %seen;
   while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split( ';', $_ );
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});
        $seen{$name} = 1;
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
               my ($entry, $value) = split( ':', $element );
               $hash{$entry} = $value;
        }
        push(@names, {%hash});
   }
   close(FILE);
   return @names;
}
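The sort-first idea can also be sketched in Perl itself, sorting the lines in memory instead of with an external sort (a sketch, not code from the thread; `dedupe_sorted` is a hypothetical helper name, and it assumes every line starts with "Name:...;" as in the sample data):

```perl
use strict;
use warnings;

# Order the lines by Name, then skip any line whose Name matches the
# previous line's Name. Only the Name field decides what is a duplicate.
sub dedupe_sorted {
    my @lines = @_;
    my $name_of = sub { ( $_[0] =~ /^Name:([^;]*)/ )[0] // '' };
    my @kept;
    my $prev = '';
    for my $line ( sort { $name_of->($a) cmp $name_of->($b) } @lines ) {
        my $name = $name_of->($line);
        next if $name eq $prev;    # same name as previous sorted line: skip
        $prev = $name;
        push @kept, $line;
    }
    return @kept;
}
```

Note this loses the original file order (the output comes back sorted by name), which is why sorting outside Perl and streaming line by line is the memory-friendly variant.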
 
LVL 1

Expert Comment

by:berseken
ID: 24388727
Also, you should probably declare %hash inside the read subroutine, or it is going to keep growing and consume all your memory.
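A minimal sketch of that fix (standalone, with hypothetical inline sample lines rather than the question's file handling): declaring %hash with `my` inside the loop gives every record a fresh, empty hash instead of one that inherits the previous line's keys.

```perl
use strict;
use warnings;

my @names;
for my $line ( 'Name:Bill;Location:Miami;Age:27;', 'Name:Dave;Age:25;' ) {
    my %hash;    # fresh hash per line, so no keys leak between records
    for my $element ( split /;/, $line ) {
        my ( $entry, $value ) = split /:/, $element;
        $hash{$entry} = $value;
    }
    push @names, {%hash};
}
# Without the `my %hash;` inside the loop, Dave's record would still carry
# Bill's Location key from the previous iteration.
```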

 

Author Comment

by:cucugirl
ID: 24388862
I tried implementing the changes, but it only prints the first line in the file, and I'm sure my list has just two repeats right now. Do you think there's a bug somewhere?
 
LVL 1

Expert Comment

by:berseken
ID: 24389307
I don't know; this works fine for me, and it dumps all the lines from /tmp/blah.

I did run into an issue with calling the function 'read' (it collides with Perl's built-in read), so I renamed it to read1:
use Data::Dumper;
 
sub read1{
   my $input = shift;
   open(FILE, $input);
   my @names;
   my %seen;
   while (<FILE>) {
        chomp;
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split( ';', $_ );
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});
        $seen{$name} = 1;
        next if /^\s*(?:#|$)/;
        for my $element (@lines) {
               my ($entry,$value) = split( ':', $element);
               $hash{$entry} = $value;
      }
     push(@names, {%hash});
  }
  close(FILE);
  return @names;
}
 
 
my @thing = read1("/tmp/blah");
 
print Dumper(\@thing);

 

Author Comment

by:cucugirl
ID: 24389433
Where did you declare %hash?
 

Author Comment

by:cucugirl
ID: 24406539
Hi, for another part of my code I need to push only the last occurrence, not the first one. Given:
Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would push
Name:Bill;Location:Chicago;Age:47; rather than
Name:Bill;Location:Miami;Age:27;
Does anybody know how to do this with the same routine I had in the beginning? Thanks!
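One way to keep the last occurrence instead of the first (a sketch, not from the thread; `read_keep_last` is a hypothetical name): store each parsed record in a hash keyed on Name, so a later line with the same Name simply overwrites the earlier one, while a separate array remembers first-seen order.

```perl
use strict;
use warnings;

# Parse the file as before, but key the finished records on Name so that
# later lines overwrite earlier ones; @order preserves first-seen order.
sub read_keep_last {
    my ($input) = @_;
    open( my $fh, '<', $input ) or die "Cannot open $input: $!";
    my %by_name;
    my @order;
    while (<$fh>) {
        chomp;
        next if /^\s*(?:#|$)/;
        my %record;
        for my $element ( split /;/, $_ ) {
            $element =~ s/^\s+//;
            $element =~ s/\s+$//;
            my ( $entry, $value ) = split /:/, $element;
            $record{$entry} = $value;
        }
        next unless defined $record{Name};
        push @order, $record{Name} unless exists $by_name{ $record{Name} };
        $by_name{ $record{Name} } = {%record};    # last line with this Name wins
    }
    close($fh);
    return map { $by_name{$_} } @order;
}
```

With the sample file, Bill's record comes back as Location:Chicago;Age:47 because his second line overwrote the first.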

