Solved

Matching 2 files based on reference but script is taking too long to process

Posted on 2011-10-12
Last Modified: 2012-06-21
Hi all,

I wrote a Perl script to match two files on a reference ID, but it is taking too long to process. I have attached a snippet of the code.

This is what I am trying to achieve:

File 1

col 1 | col 2 | col 3 | col 4 | col 5 | col 6| col 7 | col..
1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164

File 2

Col 1 | Col 2 | Col 3 | Col 4
87921AM|87921|AM|75
87921AN|87921|AN|90
87922|87922| |70

Output:

#append column 4 of file 2 to file 1, if column 1 of file 2 matches column 6 of file 1

1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164|75
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164|70
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164|70

Can I please have some advice on how the code can be optimized? I'm really new to Perl, so I can't think of a more efficient way to write it. I added the extra integer key to the hash because I also need to display duplicate lines from file 1. There are about 78000 lines to process and it's taking forever. I would appreciate some help. Thank you

Jason

my $file1= '/imports/1stoutput.txt';
	  open(FILE1,"<$file1") or die "$file1 $!";
	  my @temparray=<FILE1>;
	  
	  
	  
	  my $ii=1;
	  my %so;
	  
	  for (@temparray){
		chomp;
		my @col = split /\|/;
		
		$so{$col[5]}{$ii}="$col[0]|$col[1]|$col[2]|$col[3]|$col[4]|$col[5]|$col[6]|$col[7]|$col[8]|$col[9]|$col[10]";
	
		$ii++;
		
}
	  
	  
	  
	  my $file2= '/imports/potest.txt';
	  open(FILE2,"<$file2") or die "$file2 $!";
	  my @arr1=<FILE2>;
	  
	 for (@arr1){
		chomp;
		my($col1b,$col2b,$col3b,$col4b)=split(/\|/);
		$col1b =~ s/^\s+//;
		$col1b="" if(!$col1b);

  foreach my $id (sort keys %so) {
   
   foreach my $name (keys %{$so{$id}}) { #for each price(id)
		if( $so{$col1b}{$name} ) {
		
	
		print OUTPUT "$so{$col1b}{$name}|$col4b\n";
		
		
		
		
		}
	 
	 }
	 }
}
close OUTPUT;
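
A note on why the snippet above is slow: for every line of file 2 it walks every key of %so and every entry under each key, although only $so{$col1b} can ever match. The following is a minimal sketch of the direct-lookup alternative, not a drop-in replacement. The sub name append_by_key and the in-memory arrays are assumptions made for illustration, and the sample lines are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: join file-1 lines to file-2 values by direct hash lookup.
# Both arguments are array refs of chomped, pipe-delimited lines.
sub append_by_key {
    my ($file1_lines, $file2_lines) = @_;

    # Group file-1 lines by column 6 (index 5); an array keeps duplicates.
    my %so;
    for my $line (@$file1_lines) {
        my $key = (split /\|/, $line)[5];
        next unless defined $key;
        push @{ $so{$key} }, $line;
    }

    my @out;
    for my $line (@$file2_lines) {
        my ($key, $value) = (split /\|/, $line)[0, 3];
        # One hash lookup per file-2 line replaces the two inner loops.
        next unless defined $key && defined $value && exists $so{$key};
        push @out, "$_|$value" for @{ $so{$key} };
    }
    return @out;
}

my @file1 = (
    '1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164',
);
my @file2 = ('87921AM|87921|AM|75', '87921AN|87921|AN|90', '87922|87922| |70');

print "$_\n" for append_by_key(\@file1, \@file2);
```

With this shape the work is roughly one pass over each file, and the counter $ii is no longer needed because the array under each key already holds the duplicates.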

Question by:Jason_Sutiono
4 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 36955047
Is this what you are looking for:

my %so;

my $file1= '1stoutput.txt';
open(FILE1,"<$file1") or die "$file1 $!";
while ( <FILE1> )
{
	chomp;
	my @col = split /\|/;
	my $key = $col[5];
	push @{ $so{$key} }, $_;	# keep every line for this key, including duplicates
}
close FILE1;


my $file2= 'potest.txt';
open(FILE2,"<$file2") or die "$file2 $!";
while ( <FILE2> )
{
	chomp;
	my @col = split /\|/;
	my $key = $col[0];
	if ( exists $so{$key} )
	{
		# print each matching file 1 line with column 4 of file 2 appended
		print join('|', $_, $col[3]) . "\n" for @{ $so{$key} };
	}
}
close FILE2;

 
LVL 28

Assisted Solution

by:FishMonger
FishMonger earned 248 total points
ID: 36956608
#!/usr/bin/perl

use strict;
use warnings;

my (%so, $fh);

my $file1 = '/imports/1stoutput.txt';
open $fh, '<', $file1 or die "can't open '$file1' $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my $key = (split /\|/, $line)[5];
    push @{ $so{$key} }, $line;
}
close $fh;

my $file2 = '/imports/potest.txt';
open $fh, '<', $file2 or die "can't open '$file2' $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my ($key, $num) = (split /\|/, $line)[0,3];
    next unless exists $so{$key};
    
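    # append the value only; the newline is added when printing,
    # so lines without a match in file 2 still end with a newline
    foreach ( @{ $so{$key} } ) {
        $_ .= "|$num";
    }
}
close $fh;

my $file3 = '/imports/final.txt';
open $fh, '>', $file3 or die "can't open '$file3' $!";

foreach my $key ( sort keys %so ) {
    print {$fh} "$_\n" for @{ $so{$key} };
}
close $fh;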

 
LVL 9

Accepted Solution

by:parparov
parparov earned 252 total points
ID: 36959008
This should be faster.
Note that the second file is read first into a hash, and the matching value is appended to each line of the first file on the fly.
The output is also printed in one go, which should be faster than printing each line.

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = shift || die "Usage: $0 file1 file2";
my $f2 = shift || die "Usage: $0 file1 file2";

open(F2, $f2) or die "Can't open $f2: $!\n";
my %ids;
while (<F2>) {
	chomp;	# column 4 is the last field, so strip its newline before storing it
	my @arr = (split(/\|/))[0,3];
	$ids{$arr[0]} = $arr[1];
}
close F2;
my $output = '';
open(F1, $f1) or die "Can't open $f1: $!\n";
while (<F1>) {
	chomp;
	my $id = (split(/\|/))[5];
	$_ .= "|$ids{$id}" if exists $ids{$id};
	$output .= "$_\n";	# restore the newline for matched and unmatched lines alike
}
close F1;
print $output;
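
For larger inputs, a variation on the same idea is to print each line as it is read instead of building one big string. The sketch below is only an illustration under a few assumptions: the sub name stream_join, the lexical filehandles, and the made-up in-memory sample data are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: read the lookup file first, then stream the main file.
# $lookup_fh, $main_fh and $out_fh are already-open filehandles.
sub stream_join {
    my ($lookup_fh, $main_fh, $out_fh) = @_;

    # Map column 1 of the lookup file to its column 4.
    my %ids;
    while (my $line = <$lookup_fh>) {
        chomp $line;
        my ($key, $value) = (split /\|/, $line)[0, 3];
        $ids{$key} = $value if defined $key && defined $value;
    }

    # Append the looked-up value to each main-file line, keeping file order.
    while (my $line = <$main_fh>) {
        chomp $line;
        my $id = (split /\|/, $line)[5];
        $line .= "|$ids{$id}" if defined $id && exists $ids{$id};
        print {$out_fh} "$line\n";    # unmatched lines pass through unchanged
    }
    return;
}

# Demo with in-memory filehandles; the data is made up for illustration.
my $lookup = "A1|x|y|75\nB2|x|y|70\n";
my $main   = "r1|c|c|c|c|A1|tail\nr2|c|c|c|c|ZZ|tail\nr3|c|c|c|c|B2|tail\n";
my $result = '';

open my $lookup_fh, '<', \$lookup or die "can't open in-memory lookup: $!";
open my $main_fh,   '<', \$main   or die "can't open in-memory main: $!";
open my $out_fh,    '>', \$result or die "can't open in-memory output: $!";

stream_join($lookup_fh, $main_fh, $out_fh);
close $_ for $lookup_fh, $main_fh, $out_fh;

print $result;
```

Because the lookup hash is keyed on file 2, duplicate lines in file 1 need no special handling: each one is looked up and printed independently.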

 

Author Closing Comment

by:Jason_Sutiono
ID: 36966871
Thanks guys. Sorry for the late response
