Matching two files on a reference ID, but the script takes too long to process

Hi all,

I wrote a Perl script to match two files on a reference ID, but it is taking too long to run. I have attached a snippet of the code.

This is what I am trying to achieve:

File 1

col 1 | col 2 | col 3 | col 4 | col 5 | col 6| col 7 | col..
1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164

File 2

Col 1 | Col 2 | Col 3 | Col 4
87921AM|87921|AM|75
87921AN|87921|AN|90
87922|87922| |70

Output:

# Append column 4 of file 2 to each line of file 1 where column 1 of file 2 matches column 6 of file 1

1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164|75
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164|70
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164|70

Can I please have some advice on how to optimize the code? I'm really new to Perl, so I can't think of a more efficient way to write it. I added the extra integer key to the hash because I also need to output the duplicate lines in file 1. There are about 78,000 lines to process and it's taking forever. I would appreciate some help. Thank you.

Jason

my $file1 = '/imports/1stoutput.txt';
open(FILE1, "<$file1") or die "$file1 $!";
my @temparray = <FILE1>;

my $ii = 1;
my %so;

for (@temparray) {
	chomp;
	my @col = split /\|/;

	$so{$col[5]}{$ii} = "$col[0]|$col[1]|$col[2]|$col[3]|$col[4]|$col[5]|$col[6]|$col[7]|$col[8]|$col[9]|$col[10]";

	$ii++;
}

my $file2 = '/imports/potest.txt';
open(FILE2, "<$file2") or die "$file2 $!";
my @arr1 = <FILE2>;

for (@arr1) {
	chomp;
	my ($col1b, $col2b, $col3b, $col4b) = split(/\|/);
	$col1b =~ s/^\s+//;
	$col1b = "" if (!$col1b);

	foreach my $id (sort keys %so) {
		foreach my $name (keys %{$so{$id}}) { # for each price (id)
			if ($so{$col1b}{$name}) {
				print OUTPUT "$so{$col1b}{$name}|$col4b\n";
			}
		}
	}
}
close OUTPUT;


Asked by Jason_Sutiono
parparov commented:
This should be faster. Note that the second file is read first to build a lookup hash, and each line of the first file then has its value appended on the fly.
The output is also printed in one go, which can be faster than printing each line.

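#!/usr/bin/perl

use strict;
use warnings;

my $f1 = shift || die "Usage: $0 file1 file2\n";
my $f2 = shift || die "Usage: $0 file1 file2\n";

# Build a lookup table from file 2: column 1 => column 4.
open(my $fh2, '<', $f2) or die "Can't open $f2: $!\n";
my %ids;
while (<$fh2>) {
	chomp;    # otherwise the last column keeps its newline
	my ($id, $value) = (split /\|/)[0,3];
	$ids{$id} = $value if defined $id && defined $value;
}
close $fh2;

# Stream file 1 and append the looked-up value when column 6 matches.
my $output = '';
open(my $fh1, '<', $f1) or die "Can't open $f1: $!\n";
while (<$fh1>) {
	chomp;
	my $id = (split /\|/)[5];
	$_ .= "|$ids{$id}" if defined $id && exists $ids{$id};
	$output .= "$_\n";    # restore the newline removed by chomp
}
close $fh1;
print $output;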
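
To see the lookup approach on the sample data from the question, here is a minimal self-contained sketch. It keeps both files in arrays instead of reading them from disk, and the names @file1, @file2, %value_for and @joined are illustrative only, not taken from the original script.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @file2 = ('87921AM|87921|AM|75', '87921AN|87921|AN|90', '87922|87922| |70');
my @file1 = (
    '1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164',
);

# One pass over file 2 builds the lookup: column 1 => column 4.
my %value_for;
for my $row (@file2) {
    my ($id, $value) = (split /\|/, $row)[0, 3];
    $value_for{$id} = $value;
}

# One pass over file 1: a single hash lookup replaces the nested loops.
my @joined;
for my $row (@file1) {
    my $id = (split /\|/, $row)[5];
    push @joined, exists $value_for{$id} ? "$row|$value_for{$id}" : $row;
}

print "$_\n" for @joined;
```

Each file is read once, so the work grows with the total number of lines rather than with the number of lines in one file multiplied by the number of keys in the other. Duplicate lines in file 1 are kept and stay in their original order, because file 1 is never stored under its key.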

sjklein42 commented:
Is this what you are looking for?

use strict;
use warnings;

my %so;

my $file1 = '1stoutput.txt';
open(my $fh1, '<', $file1) or die "$file1 $!";
while ( <$fh1> )
{
	chomp;
	my @col = split /\|/;
	my $key = $col[5];
	next unless defined $key;
	# Note: a later line with the same key overwrites an earlier one.
	$so{$key} = $_;
}
close $fh1;


my $file2 = 'potest.txt';
open(my $fh2, '<', $file2) or die "$file2 $!";
while ( <$fh2> )
{
	chomp;
	my @col = split /\|/;
	my $key = $col[0];
	if ( defined $key && exists $so{$key} )
	{
		print join('|', $so{$key}, $col[3]) . "\n";
	}
}
close $fh2;
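
Since the question asks to keep duplicate lines from file 1, it is worth checking what happens when a key repeats. Storing one scalar per key keeps only the last line, while pushing onto an array keeps them all. A minimal sketch using two sample rows from the question (the names %last_only and %all_lines are made up for illustration):

```perl
use strict;
use warnings;

# Two sample rows from file 1 that share the key 87922 in column 6.
my @lines = (
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164',
);

my (%last_only, %all_lines);
for my $line (@lines) {
    my $key = (split /\|/, $line)[5];
    $last_only{$key} = $line;             # overwrites the earlier line
    push @{ $all_lines{$key} }, $line;    # keeps every line for the key
}

my $kept = scalar @{ $all_lines{'87922'} };
print "scalar per key keeps 1 line, array per key keeps $kept\n";
```

With the array form, the second loop would need to iterate over @{ $so{$key} } and print one output line per stored entry.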

FishMonger commented:
#!/usr/bin/perl

use strict;
use warnings;

my (%so, $fh);

my $file1 = '/imports/1stoutput.txt';
open $fh, '<', $file1 or die "can't open '$file1' $!";

# Group the lines of file 1 by column 6 so duplicate keys are kept.
while ( my $line = <$fh> ) {
    chomp $line;
    my $key = (split /\|/, $line)[5];
    push @{ $so{$key} }, $line;
}
close $fh;

my $file2 = '/imports/potest.txt';
open $fh, '<', $file2 or die "can't open '$file2' $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my ($key, $num) = (split /\|/, $line)[0,3];
    next unless defined $key && exists $so{$key};
    
    foreach ( @{ $so{$key} } ) {
        $_ .= "|$num";
    }
}
close $fh;

my $file3 = '/imports/final.txt';
open $fh, '>', $file3 or die "can't open '$file3' $!";

# Add the newline here so that unmatched lines are terminated too.
foreach my $key ( sort keys %so ) {
    print {$fh} "$_\n" for @{ $so{$key} };
}
close $fh or die "can't close '$file3' $!";
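
The inner loop over @{ $so{$key} } works because foreach aliases $_ to each array element, so appending to $_ changes the stored line itself rather than a copy. A minimal sketch of that behavior, with made-up data for illustration:

```perl
use strict;
use warnings;

my %so = ( '87922' => [ 'line A', 'line B' ] );

# $_ is an alias for each element, not a copy.
foreach ( @{ $so{'87922'} } ) {
    $_ .= '|70';
}

print join(', ', @{ $so{'87922'} }), "\n";    # prints "line A|70, line B|70"
```

Note that this approach writes the output grouped by sorted key, so the original line order of file 1 is not preserved.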

Jason_Sutiono (author) commented:
Thanks guys. Sorry for the late response.
Question has a verified solution.