Solved

avoiding repeats

Posted on 2009-05-14
Last Modified: 2012-05-07
I have a file that looks like this:

Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would like to skip repeated lines. For example, these two lines count as repeats:
Name:Bill;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

because they have the same Name; the rest of the fields don't matter.
I have a routine that reads the file line by line and splits each line twice, first on semicolons and then on colons, building an array of hashes. How can I avoid repeating the same name with the following code? Thanks!
sub read {
    my $input = shift;
    open(FILE, $input) or die "Cannot open $input: $!";
    my @names;
    while (<FILE>) {
        chomp;
        # split on ';', then trim leading and trailing whitespace from each field
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split(';', $_);
        next if /^\s*(?:#|$)/;    # skip blank and comment lines
        for my $element (@lines) {
            my ($entry, $value) = split(':', $element);
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

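The two-step split described above can be sketched as a standalone helper. This is a minimal illustration, not the routine from the question: `parse_line` is a hypothetical name, and it limits the second split to the first `:` and trims each key and value, so a field like ` Location:Detroit` (note the space after the semicolon on the Claudette line) still produces the key `Location`.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "Key:Value;Key:Value;" line into a hash ref.
sub parse_line {
    my ($line) = @_;
    my %record;                                    # a fresh hash on every call
    for my $field (split /;/, $line) {             # first split: on ';'
        my ($key, $value) = split /:/, $field, 2;  # second split: on the first ':'
        next unless defined $value;                # skip empty or malformed fields
        s/^\s+|\s+$//g for $key, $value;           # trim whitespace around key and value
        $record{$key} = $value;
    }
    return \%record;
}

my $rec = parse_line("Name:Claudette; Location:Detroit;Age:50;");
print join(", ", map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
# prints: Age=50, Location=Detroit, Name=Claudette
```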

Question by:cucugirl

Author Comment

by:cucugirl
ID: 24387001
How can I avoid pushing the same name into the array of hashes more than once? Thanks!


Accepted Solution

by: berseken (earned 500 total points)
ID: 24388627
From an efficiency point of view, the best option would probably be to sort the file by name outside Perl. Then duplicates arrive on consecutive lines, so you only need to remember the name on the previous line and ignore the current line if it has the same name.
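A minimal sketch of the sorted-input approach, assuming the lines have already been sorted by name with a stable sort (so the two Bill lines keep their original relative order). `dedup_sorted` is a hypothetical helper; it keeps the first line of each run of equal names.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes @_ holds lines already sorted by name, so all
# lines for one name are adjacent. Only the previous kept name is remembered.
sub dedup_sorted {
    my @kept;
    my $prev;                                       # name on the previous kept line
    for my $line (@_) {
        my ($name) = $line =~ /^\s*Name:\s*([^;]*?)\s*;/;
        next unless defined $name;                  # no Name field: skip the line
        next if defined $prev && $name eq $prev;    # same name as previous line
        push @kept, $line;
        $prev = $name;
    }
    return @kept;
}

# Sample lines in the order a stable sort by name would produce.
my @sorted_lines = (
    "Name:Bill;Location:Miami;Age:27;",
    "Name:Bill;Location:Chicago;Age:47;",
    "Name:Claudette; Location:Detroit;Age:50;",
    "Name:Dave;Location:Florence;Age:25;",
    "Name:Thomas;Location:Miami;Age:27;",
);

print "$_\n" for dedup_sorted(@sorted_lines);
```

With the sample list this keeps Bill (Miami), Claudette, Dave and Thomas. Which Bill survives depends on how the external sort orders lines with equal names.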

If you can't do that, you will probably have to keep a hash of previously seen names, as implemented here:

sub read {
    my $input = shift;
    open(FILE, $input) or die "Cannot open $input: $!";
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;    # skip blank and comment lines before the name check
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split(';', $_);
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});    # this name was already pushed
        $seen{$name} = 1;
        for my $element (@lines) {
            my ($entry, $value) = split(':', $element);
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}
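The same seen-hash idea is often written as a one-line `grep` once the lines have been parsed into hashes. A short sketch with hypothetical sample records (in the thread they would come from the parsing loop):

```perl
use strict;
use warnings;

# Hypothetical parsed records; the real ones would come from the file.
my @records = (
    { Name => 'Bill', Location => 'Miami',    Age => 27 },
    { Name => 'Dave', Location => 'Florence', Age => 25 },
    { Name => 'Bill', Location => 'Chicago',  Age => 47 },
);

# $seen{$name}++ returns 0 (false) the first time a name appears, so
# !$seen{...}++ is true only for the first record with that name.
my %seen;
my @unique = grep { !$seen{ $_->{Name} }++ } @records;

print "$_->{Name} ($_->{Location})\n" for @unique;
# prints:
# Bill (Miami)
# Dave (Florence)
```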
 

Expert Comment

by:berseken
ID: 24388727
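Also, you should declare %hash with my inside the read subroutine (inside the while loop, so each line starts with an empty hash). As written it is a package global: it persists between calls, and any field from an earlier line that a later line lacks carries over into the later record.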
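To illustrate the point about `%hash`, here is a small hypothetical demo. It assumes a second line that lacks a `Location` field (the sample file in the question always has all three fields), and compares one hash shared across iterations with a fresh `my %fresh` per iteration. `parse_all` is a hypothetical name.

```perl
use strict;
use warnings;
use 5.010;    # for the // operator

# Hypothetical input: the second line has no Location field.
my @lines = ("Name:Bill;Location:Miami;Age:27;", "Name:Dave;Age:25;");

sub parse_all {
    my ($fresh_each_time, @input) = @_;
    my @records;
    my %shared;                          # one hash for the whole call
    for my $line (@input) {
        my %fresh;                       # new, empty hash on every iteration
        my $h = $fresh_each_time ? \%fresh : \%shared;
        for my $field (split /;/, $line) {
            my ($k, $v) = split /:/, $field, 2;
            $h->{$k} = $v;
        }
        push @records, { %$h };          # copy the current contents
    }
    return @records;
}

my @leaky = parse_all(0, @lines);
my @clean = parse_all(1, @lines);
print "shared hash: Dave's Location = ", ($leaky[1]{Location} // 'none'), "\n";
print "fresh hash:  Dave's Location = ", ($clean[1]{Location} // 'none'), "\n";
```

With the shared hash, Dave's record inherits Bill's Location of Miami; with a fresh hash it has none.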

Author Comment

by:cucugirl
ID: 24388862
I tried implementing the changes, but it only prints the first line in the file, and I'm sure my list has only about 2 repeated names right now. Do you think there's a bug somewhere?
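One possible cause, offered as an assumption since the real file isn't shown: the name regex `/^Name:(\w*);/` only matches when `Name:` starts at column 0 and the name consists of word characters. When it fails, `$name` is `undef`, which is used as the hash key `""`, so the first such line is kept and every later one is skipped as a repeat. The sketch below reproduces that with hypothetical lines that have a leading space, and compares a more tolerant pattern. `kept_strict` and `kept_tolerant` are hypothetical names.

```perl
use strict;
use warnings;

my @lines = (                                # hypothetical: leading spaces
    " Name:Bill;Location:Miami;Age:27;",
    " Name:Dave;Location:Florence;Age:25;",
    " Name:Thomas;Location:Miami;Age:27;",
);

# Dedup as in the posted code: the name regex is anchored at column 0.
sub kept_strict {
    no warnings 'uninitialized';             # reproduce the undef-key behaviour quietly
    my (%seen, @kept);
    for (@_) {
        my ($name) = /^Name:(\w*);/;         # undef when the pattern fails
        next if exists $seen{$name};         # undef is used as the key ""
        $seen{$name} = 1;
        push @kept, $_;
    }
    return @kept;
}

# Tolerant variant: allow leading whitespace and any characters except ';'
# in the name, and skip lines with no Name field instead of treating them as "".
sub kept_tolerant {
    my (%seen, @kept);
    for (@_) {
        my ($name) = /^\s*Name:\s*([^;]*?)\s*;/;
        next unless defined $name;
        next if $seen{$name}++;
        push @kept, $_;
    }
    return @kept;
}

print scalar(kept_strict(@lines)), " line(s) kept by the original pattern\n";
print scalar(kept_tolerant(@lines)), " line(s) kept by the tolerant pattern\n";
```

The original pattern keeps 1 line and the tolerant one keeps 3. A name containing a space (for example `Mary Ann`) would also defeat `\w*;`.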

Expert Comment

by:berseken
ID: 24389307
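Don't know; this works fine for me with all the lines from your example saved in /tmp/blah.

I did run into an issue with calling the function 'read': Perl's builtin read() takes precedence over a sub with the same name, so I renamed it read1: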
use Data::Dumper;

sub read1 {
    my $input = shift;
    open(FILE, $input) or die "Cannot open $input: $!";
    my @names;
    my %seen;
    while (<FILE>) {
        chomp;
        next if /^\s*(?:#|$)/;    # skip blank and comment lines before the name check
        my @lines = map { s/^\s+//; s/\s+$//; $_ } split(';', $_);
        my ($name) = ($_ =~ /^Name:(\w*);/);
        next if (exists $seen{$name});    # this name was already pushed
        $seen{$name} = 1;
        for my $element (@lines) {
            my ($entry, $value) = split(':', $element);
            $hash{$entry} = $value;
        }
        push(@names, {%hash});
    }
    close(FILE);
    return @names;
}

my @thing = read1("/tmp/blah");

print Dumper(\@thing);


Author Comment

by:cucugirl
ID: 24389433
where did you declare %hash?

Author Comment

by:cucugirl
ID: 24406539
Hi, for another part of my code I need to push only the last occurrence of each name, not the first. Given:
Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;

I would push
Name:Bill;Location:Chicago;Age:47; rather than
Name:Bill;Location:Miami;Age:27;
Does anybody know how to do this with the same routine I had in the beginning? Thanks!
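A minimal sketch for keeping the last occurrence, built on the same split-twice parsing. `read_keep_last` and `%index` are hypothetical names, and the sub reads from an already opened filehandle (an in-memory one here) rather than a file name. Each name keeps the position of its first appearance but the fields from its last line.

```perl
use strict;
use warnings;

sub read_keep_last {
    my ($fh) = @_;
    my @names;
    my %index;                                   # name => position in @names
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;          # skip blank and comment lines
        my %hash;                                # fresh record per line
        for my $element (split /;/, $line) {
            $element =~ s/^\s+|\s+$//g;          # trim the field
            my ($entry, $value) = split /:/, $element, 2;
            next unless defined $value;
            $hash{$entry} = $value;
        }
        my $name = $hash{Name};
        next unless defined $name;
        if (exists $index{$name}) {
            $names[ $index{$name} ] = \%hash;    # replace the earlier record
        } else {
            $index{$name} = scalar @names;
            push @names, \%hash;
        }
    }
    return @names;
}

my $data = <<'END';
Name:Bill;Location:Miami;Age:27;
Name:Claudette; Location:Detroit;Age:50;
Name:Dave;Location:Florence;Age:25;
Name:Thomas;Location:Miami;Age:27;
Name:Bill;Location:Chicago;Age:47;
END

open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my @people = read_keep_last($fh);
close($fh);
print "$_->{Name}: $_->{Location}\n" for @people;
```

This prints Bill: Chicago, Claudette: Detroit, Dave: Florence and Thomas: Miami. If you would rather have the records in the order of their last appearance, one alternative is to process the lines in reverse, keep the first occurrence of each name, and reverse the result.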