Matching 2 files based on reference but script is taking too long to process

Posted on 2011-10-12
Medium Priority
Last Modified: 2012-06-21
Hi all,

I wrote a Perl script to match two files on a reference id, but it is taking too long to process. A snippet of the code is attached below.

Basically, what I am trying to achieve is as follows:

File 1

col 1 | col 2 | col 3 | col 4 | col 5 | col 6| col 7 | col..
1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164

File 2

Col 1 | Col 2 | Col 3 | Col 4
87922|87922| |70


# Append column 4 of file 2 to file 1 if column 1 of file 2 matches column 6 of file 1 (index 5 when counting from zero). Expected output:

1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164|75
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164|70
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164|70

Can I please have some advice on how the code can be optimized? I'm really new to Perl, so I can't think of a more efficient way to write it. I assigned the additional integer to the hash because I also need to display duplicates in file 1. There are about 78,000 lines to process and it's taking forever. I would appreciate some help. Thank you.


my $file1= '/imports/1stoutput.txt';
	  open(FILE1,"<$file1") or die "$file1 $!";
	  my @temparray=<FILE1>;
	  my $ii=1;
	  my %so;
	  for (@temparray){
		my @col = split /\|/;
	  my $file2= '/imports/potest.txt';
	  open(FILE2,"<$file2") or die "$file2 $!";
	  my @arr1=<FILE2>;
	 for (@arr1){
		$col1b =~ s/^\s+//;
		$col1b="" if(!$col1b);

  foreach my $id (sort keys %so) {
   foreach my $name (keys %{$so{$id}}) { #for each price(id)
		if( $so{$col1b}{$name} ) {
		print OUTPUT "$so{$col1b}{$name}|$col4b\n";
close OUTPUT;


Question by:Jason_Sutiono
LVL 16

Expert Comment

ID: 36955047
Is this what you are looking for?

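my %so;

my $file1 = '1stoutput.txt';
open(FILE1, "<$file1") or die "$file1 $!";
while ( <FILE1> ) {
	chomp;                    # drop the newline so the appended column stays on the same line
	my @col = split /\|/;
	my $key = $col[5];        # reference id is the 6th field
	$so{$key} = $_;           # note: a later duplicate id overwrites an earlier one
}
close FILE1;

my $file2 = 'potest.txt';
open(FILE2, "<$file2") or die "$file2 $!";
while ( <FILE2> ) {
	chomp;
	my @col = split /\|/;
	my $key = $col[0];
	if ( exists $so{$key} ) {
		print join('|', $so{$key}, $col[3]) . "\n";
	}
}
close FILE2;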


LVL 28

Assisted Solution

FishMonger earned 248 total points
ID: 36956608

use strict;
use warnings;

my (%so, $fh);

my $file1 = '/imports/1stoutput.txt';
open $fh, '<', $file1 or die "can't open '$file1' $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my $key = (split /\|/, $line)[5];
    push @{ $so{$key} }, $line;    # keep every line for a key, so duplicates survive
}
close $fh;

my $file2 = '/imports/potest.txt';
open $fh, '<', $file2 or die "can't open '$file2' $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my ($key, $num) = (split /\|/, $line)[0,3];
    next unless defined $key && exists $so{$key};
    foreach ( @{ $so{$key} } ) {
        $_ .= "|$num";
    }
}
close $fh;

my $file3 = '/imports/final.txt';
open $fh, '>', $file3 or die "can't open '$file3' $!";

foreach my $key ( sort keys %so ) {
    # the newline is added here so that unmatched lines are terminated too
    print {$fh} "$_\n" for @{ $so{$key} };
}
close $fh;



Accepted Solution

parparov earned 252 total points
ID: 36959008
This should be faster.
Note that the second file is read first into a hash, and each line of the first file then gets its value appended on the fly with a single hash lookup.
The output is also printed in one go, which is faster than printing each line.


use strict;
use warnings;

my $f1 = shift || die "Usage: $0 file1 file2";
my $f2 = shift || die "Usage: $0 file1 file2";

open(F2, $f2) or die "Can't open $f2: $!\n";
my %ids;
while (<F2>) {
	chomp;    # column 4 is the last field, so strip its newline
	my @arr = (split(/\|/))[0,3];
	$ids{$arr[0]} = $arr[1];
}
close F2;
my $output = '';
open(F1, $f1) or die "Can't open $f1: $!\n";
while (<F1>) {
	chomp;    # strip the newline before appending the extra column
	my $id = (split(/\|/))[5];
	$_ .= "|$ids{$id}" if defined $id && exists $ids{$id};
	$output .= "$_\n";
}
close F1;
print $output;
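
For anyone who wants to try the hash-lookup idea without touching the real files, here is a minimal sketch that works on in-memory lines. The subroutine name append_lookup_column and its array-reference arguments are made up for illustration and are not part of any answer in this thread; the sample rows are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): joins in-memory lines instead of files.
# Both arguments are array refs of pipe-delimited lines without trailing newlines.
sub append_lookup_column {
    my ($file1_lines, $file2_lines) = @_;

    # Build the lookup once: file 2 column 1 => file 2 column 4.
    my %ids;
    for my $line (@$file2_lines) {
        # A limit of -1 keeps trailing empty fields.
        my ($key, $num) = (split /\|/, $line, -1)[0, 3];
        next unless defined $key && defined $num;
        $ids{$key} = $num;
    }

    # One pass over file 1; each row costs a single hash lookup.
    my @out;
    for my $line (@$file1_lines) {
        my $id = (split /\|/, $line, -1)[5];    # 6th field is the reference id
        if (defined $id && exists $ids{$id}) {
            push @out, "$line|$ids{$id}";
        } else {
            push @out, $line;                   # unmatched rows pass through unchanged
        }
    }
    return \@out;
}

my @file1 = (
    '1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164',
);
my @file2 = ('87922|87922| |70');

print "$_\n" for @{ append_lookup_column(\@file1, \@file2) };
```

With this sample data the two 87922 rows each get |70 appended, and the 87921AM row is printed unchanged because the sample file 2 has no entry for it. Duplicate ids in file 1 are kept and the original row order is preserved, since the hash is built from file 2 rather than file 1.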



Author Closing Comment

ID: 36966871
Thanks guys. Sorry for the late response
