Matching 2 files based on reference but script is taking too long to process

Posted on 2011-10-12
Last Modified: 2012-06-21
Hi all,

I wrote a Perl script to match two files based on a reference ID, but it is taking too long to process. I have attached a snippet of the code.

Basically, what I am trying to achieve is shown below:

File 1

col 1 | col 2 | col 3 | col 4 | col 5 | col 6| col 7 | col..
1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164

File 2

Col 1 | Col 2 | Col 3 | Col 4
87922|87922| |70


# Append column 4 of file 2 to each line of file 1 if column 1 of file 2 matches column 6 of file 1 (the 87921AM / 87922 field in the sample above)

1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164|75
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164|70
1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164|70

Can I please have some advice on how the code can be optimized? I'm really new to Perl, so I can't think of a more efficient way to write it. I assigned the additional integer to the hash because I also need to display duplicates from file 1. There are about 78000 lines to process and it's taking forever. I would appreciate some help. Thank you.


my $file1= '/imports/1stoutput.txt';
	  open(FILE1,"<$file1") or die "$file1 $!";
	  my @temparray=<FILE1>;
	  my $ii=1;
	  my %so;
	  for (@temparray){
		my @col = split /\|/;
	  my $file2= '/imports/potest.txt';
	  open(FILE2,"<$file2") or die "$file2 $!";
	  my @arr1=<FILE2>;
	 for (@arr1){
		$col1b =~ s/^\s+//;
		$col1b="" if(!$col1b);

  foreach my $id (sort keys %so) {
   foreach my $name (keys %{$so{$id}}) { #for each price(id)
		if( $so{$col1b}{$name} ) {
		print OUTPUT "$so{$col1b}{$name}|$col4b\n";
close OUTPUT;


Question by:Jason_Sutiono
    LVL 16

    Expert Comment

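    Is this what you are looking for?

    my %so;
    my $file1 = '1stoutput.txt';
    open(FILE1, '<', $file1) or die "$file1 $!";
    while ( <FILE1> ) {
        chomp;                      # drop the newline so the value can be appended later
        my @col = split /\|/;
        my $key = $col[5];          # column 6 of file 1 is the reference id
        $so{$key} = $_;             # note: keeps only the last file 1 line per id
    }
    close FILE1;
    my $file2 = 'potest.txt';
    open(FILE2, '<', $file2) or die "$file2 $!";
    while ( <FILE2> ) {
        chomp;
        my @col = split /\|/;
        my $key = $col[0];          # column 1 of file 2 is the reference id
        if ( exists $so{$key} ) {
            print join('|', $so{$key}, $col[3]) . "\n";
        }
    }
    close FILE2;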


    LVL 28

    Assisted Solution

    use strict;
    use warnings;
    my (%so, $fh);
    my $file1 = '/imports/1stoutput.txt';
    open $fh, '<', $file1 or die "can't open '$file1' $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my $key = (split /\|/, $line)[5];
        push @{ $so{$key} }, $line;    # keep every line per id, so duplicates survive
    }
    close $fh;
    my $file2 = '/imports/potest.txt';
    open $fh, '<', $file2 or die "can't open '$file2' $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($key, $num) = (split /\|/, $line)[0,3];
        next unless exists $so{$key};
        foreach ( @{ $so{$key} } ) {
            $_ .= "|$num";             # append column 4 of file 2 to each matching line
        }
    }
    close $fh;
    my $file3 = '/imports/final.txt';
    open $fh, '>', $file3 or die "can't open '$file3' $!";
    foreach my $key ( sort keys %so ) {
        # add the newline here so unmatched lines are terminated too
        print {$fh} "$_\n" for @{ $so{$key} };
    }
    close $fh;


    LVL 9

    Accepted Solution

    This should be faster.
    Note that the second file is read first to build a lookup hash, and each line of the first file then has its value appended on the fly as it is read.
    The output is also printed in one go, which is faster than printing each line.

    use strict;
    use warnings;
    my $f1 = shift || die "Usage: $0 file1 file2\n";
    my $f2 = shift || die "Usage: $0 file1 file2\n";
    open(F2, '<', $f2) or die "Can't open $f2: $!\n";
    my %ids;
    while (<F2>) {
        chomp;                             # column 4 is the last field, so strip its newline
        my @arr = (split(/\|/))[0,3];
        $ids{$arr[0]} = $arr[1];           # id (column 1) => value (column 4)
    }
    close F2;
    my $output = '';
    open(F1, '<', $f1) or die "Can't open $f1: $!\n";
    while (<F1>) {
        chomp;                             # append before the newline, not after it
        my $id = (split(/\|/))[5];
        $_ .= "|$ids{$id}" if defined $id && exists $ids{$id};
        $output .= "$_\n";
    }
    close F1;
    print $output;
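
For anyone who wants to try the approach without touching the real files, here is a minimal self-contained sketch of the same idea: build a hash from file 2 once, then stream file 1 and do one hash lookup per line, instead of rescanning file 2 for every line of file 1. The sample rows are copied from the question; the helper name `join_files`, the variable names, and the use of in-memory filehandles are my own assumptions for illustration, not part of anyone's posted code.

```perl
use strict;
use warnings;

# Sample rows copied from the question; in real use these come from the two files.
my $file1_data = join "\n",
    '1BSH0008655|CEL0001|1BSH0006566|PP|17-MAY-2010|87921AM| -87921AM|00DC3D|97.62|20|17-MAY-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.98|2|29-MAR-2010|SPPNP3164',
    '1BSH0008655|OPT0003|1BSH0006570|PP|29-MAR-2010|87922|GR Whse:RG04|00CWQB|-9344.99|2|29-MAR-2010|SPPNP3164',
    '';
my $file2_data = "87922|87922| |70\n";

# join_files is a hypothetical helper: it takes two open read handles
# (file 1, file 2) and returns the lines of file 1, each with column 4
# of file 2 appended when its column 6 matches column 1 of file 2.
sub join_files {
    my ($fh1, $fh2) = @_;

    # Pass 1: one hash entry per id found in file 2.
    my %value_for;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($id, $value) = (split /\|/, $line, -1)[0, 3];
        next unless defined $value;    # skip rows with fewer than 4 columns
        $value_for{$id} = $value;
    }

    # Pass 2: stream file 1 in its original order; duplicates are kept
    # because every line is emitted, matched or not.
    my @out;
    while (my $line = <$fh1>) {
        chomp $line;
        my $id = (split /\|/, $line, -1)[5];
        $line .= "|$value_for{$id}" if defined $id && exists $value_for{$id};
        push @out, $line;
    }
    return @out;
}

# In-memory filehandles stand in for the real files here.
open my $fh1, '<', \$file1_data or die "in-memory open failed: $!";
open my $fh2, '<', \$file2_data or die "in-memory open failed: $!";
print "$_\n" for join_files($fh1, $fh2);
```

With this sample data the two 87922 rows gain `|70`, and the 87921AM row is printed unchanged because the sample file 2 has no row for that id. Each file is read exactly once, so the work grows with the total number of lines and not with the product of the two file sizes.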



    Author Closing Comment

    Thanks guys. Sorry for the late response
