
Perl speed up looping

Hi

I have a large loop that takes a long time to execute, and I need to speed it up.

Main loop {

open file A

split and breakup the variables

sub program {

open file B
split and breakup the variables
if variable from Main loop = variable from sub do another loop
open file C
if variable from current loop = variable from file B do another loop
open file D (same file as A)
if variable from current loop = variable from file C
get data and store
}
close file D
}
close file C

}
close file B

}

sub program {

similar looping as above but file B and file C are switched.

}

close file A

print data

}

The loops work, but the files can be large.

I changed the foreach loops to while loops, but the speed did not change much.

I'm not sure whether using grep to search the files would be faster than looping over them with while loops.

Thanks,

Mike

FishMonger

We need to see the actual code.

Have you profiled the script to see where it's spending most of its time?

Devel::NYTProf - Powerful fast feature-rich Perl source code profiler

Are you actually defining the subs inside a loop? If so, that's a mistake.
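To show the layout I mean, here is a minimal sketch with the sub declared once at file level and the loop doing nothing but calling it. The sub name parse_record and the sample records are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Declared once, at file level, outside any loop.
sub parse_record {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;      # trim leading/trailing whitespace
    return split /\|/, $line;     # pipe-delimited fields
}

# Hypothetical records; the loop body only calls the sub.
my @lines = ("T1|J1|COND_X", "  T1|J2|COND_Y  ");
my @conditions;
for my $line (@lines) {
    my @fields = parse_record($line);
    push @conditions, $fields[2];
}
print "@conditions\n";
```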

FishMonger

If you're opening and reopening files inside a loop, as your pseudocode implies, then that's one reason it would run slowly.

mikeysmailbox1

ASKER

Hi FishMonger

Here is the sub I am using, and yes, I am opening and closing the files inside the loops.
Should I use something else?
I originally used arrays, but that was not any faster.

sub outconditions {

    my @outparms = @_;

    open SQLDATAEM_OUTCOND, "/tmp/estee.SQLDATAEM_OUTCOND.txt";
    while ($out = <SQLDATAEM_OUTCOND>) {

        $out =~ s/^\s+//;
        $out =~ s/\s+$//;

        ($P_OUTTABID,$P_OUTJOBID,$P_OUTCON,$P_ODATO,$P_SIGN) = split (/\|/,$out);

        if(!$P_OUTTABID) {
        } elsif(($outparms[0] eq $P_OUTTABID) && ($outparms[1] eq $P_OUTJOBID)) {

            #-------------
            # skip header
            #-------------
            if($P_OUTCON =~ /condition/ and $P_ODATO =~ /odate/ and $P_SIGN =~ /and_or/) {
                next;
            }

            push(@OUTLIST,"$P_OUTCON $P_ODATO $P_SIGN");

            open SQLDATAEM_INCOND, "/tmp/estee.SQLDATAEM_INCOND.txt";
            while ($GET_SUCC_INC_LIST = <SQLDATAEM_INCOND>) {

                $GET_SUCC_INC_LIST =~ s/^\s+//;
                $GET_SUCC_INC_LIST =~ s/\s+$//;

                ($GET_SUCC_P_INTABID,$GET_SUCC_P_INJOBID,$GET_SUCC_P_INCON,$GET_SUCC_P_ODATI,$GET_SUCC_P_SIGNI) = split (/\|/,$GET_SUCC_INC_LIST);

                if("${P_OUTCON}" eq "${GET_SUCC_P_INCON}") {

                    open SQLDATAEM_JOBDATA_OUT, "/tmp/estee.SQLDATAEM_JOBDATA.txt";
                    while ($GET_SUCC_GET_JOB = <SQLDATAEM_JOBDATA_OUT>) {

                        ($GET_SUCC_TABLE_ID,$GET_SUCC_JOB_ID,$GET_SUCC_SCHED_TABLE,$GET_SUCC_PARENT_TABLE,$GET_SUCC_APPLICATION,$GET_SUCC_GROUP_NAME,$GET_SUCC_MEMNAME,$GET_SUCC_JOB_NAME) = split(/\|/,$GET_SUCC_GET_JOB);

                        if($GET_SUCC_P_INTABID eq $GET_SUCC_TABLE_ID and $GET_SUCC_P_INJOBID eq $GET_SUCC_JOB_ID and $P_SIGN ne "\-") {

                            push(@SUCC_LIST,"$GET_SUCC_JOB_NAME");

                        } else {
                            next;
                        }

                    }

                    close(SQLDATAEM_JOBDATA_OUT);
                }

            }

            $GET_SUCC_INC_LIST = ();
            $GET_SUCC_TABLE_ID = ();
            $GET_SUCC_JOB_ID = ();
            $GET_SUCC_SCHED_TABLE = ();
            $GET_SUCC_PARENT_TABLE = ();
            $GET_SUCC_APPLICATION = ();
            $GET_SUCC_GROUP_NAME = ();
            $GET_SUCC_MEMNAME = ();
            $GET_SUCC_JOB_NAME = ();
            $GET_SUCC_P_INTABID = ();
            $GET_SUCC_P_INJOBID = ();
            $GET_SUCC_P_INCON = ();
            $GET_SUCC_P_ODATI = ();
            $GET_SUCC_P_SIGNI = ();
            $GET_SUCC_GET_JOB = ();

            $P_OUTTABID = ();
            $P_OUTJOBID = ();
            $P_OUTCON = ();
            $P_ODATO = ();
            $P_SIGN = ();

            close(SQLDATAEM_INCOND);
        }
    }
    close(SQLDATAEM_OUTCOND);
}

FishMonger

I see lots of problems with that sub. Some are definitely making your script very inefficient, and others are poor coding practices that make the script harder to read and maintain.

In the vast majority of cases the slowest part of a program is its disk I/O, and your script is reopening and re-parsing the same 3 files over and over again. That's very wasteful, and I'm sure it's the main reason why your script is slow.

You haven't given any info about the contents of those files, but you do say they can be large, so let's assume the first file has 1,000 lines (probably a very low estimate) that meet the criteria to reach the point where you open the second file. That second file will be opened and re-parsed 1,000 times. If, on each of those passes, the second file has 1,000 lines that meet the criteria to reach the point where you open the third file, then the third file is reopened and re-parsed 1,000,000 times. Since this sub is being called from within an outer loop, you need to multiply those numbers by the number of outer-loop iterations.

How many billions of times do you want to open and parse that 3rd file?
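One common way to avoid the repeated reads is to parse each file once into a hash keyed on the fields you compare, then do lookups instead of nested file scans. The following is only a minimal sketch under my own assumptions: the helper names build_index and successors and the in-memory sample data are hypothetical, the field positions are taken from the split calls in your sub, and the header skip and @OUTLIST handling are left out for brevity. In the real script you would pass the three /tmp/estee.*.txt paths instead of the string references.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the three pipe-delimited files.
my $jobdata = "T1|J1|sched|parent|app|grp|mem|JOB_A\n"
            . "T1|J2|sched|parent|app|grp|mem|JOB_B\n";
my $incond  = "T1|J2|COND_X|ODAT|and\n";
my $outcond = "T1|J1|COND_X|ODAT|+\n";

# Read a source once (a file name or a reference to a string) and
# index every record under a key computed from its fields.
sub build_index {
    my ($source, $key_for, $value_for) = @_;
    open my $fh, '<', $source or die "Cannot open source: $!";
    my %index;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next unless length $line;
        my @f = split /\|/, $line;
        push @{ $index{ $key_for->(@f) } }, $value_for->(@f);
    }
    close $fh;
    return \%index;
}

# Job names keyed by "table_id|job_id".
my $job_by_id   = build_index(\$jobdata, sub { "$_[0]|$_[1]" }, sub { $_[7] });
# "table_id|job_id" of each job waiting on a condition, keyed by condition.
my $in_by_cond  = build_index(\$incond,  sub { $_[2] }, sub { "$_[0]|$_[1]" });
# [condition, sign] pairs keyed by the "table_id|job_id" that posts them.
my $out_by_job  = build_index(\$outcond, sub { "$_[0]|$_[1]" },
                              sub { [ $_[2], $_[4] ] });

# Successor job names for one job, using hash lookups only.
sub successors {
    my ($tabid, $jobid) = @_;
    my @succ;
    for my $out (@{ $out_by_job->{"$tabid|$jobid"} || [] }) {
        my ($cond, $sign) = @$out;
        next if defined $sign && $sign eq '-';   # same test as $P_SIGN ne "-"
        for my $id (@{ $in_by_cond->{$cond} || [] }) {
            push @succ, @{ $job_by_id->{$id} || [] };
        }
    }
    return @succ;
}

my @succ = successors('T1', 'J1');
print "@succ\n";    # prints "JOB_B"
```

With this layout each file is read exactly once, no matter how many times successors is called from the outer loop; the cost is the memory needed to hold the three indexes.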

Profiling your script with the Devel::NYTProf module I mentioned will give you the real numbers.
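Devel::NYTProf is normally run as perl -d:NYTProf yourscript.pl followed by nytprofhtml to generate the report. If you can't install it right away, a coarse cross-check is to time suspect sections by hand with the core Time::HiRes module. This is not a replacement for the profiler, just a rough sketch; the timed helper and the summing loop are hypothetical stand-ins for your own code.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run a code reference, report elapsed wall-clock seconds on STDERR,
# and return the elapsed time followed by whatever the code returned.
sub timed {
    my ($label, $code) = @_;
    my $start   = Time::HiRes::time();
    my @result  = $code->();
    my $elapsed = Time::HiRes::time() - $start;
    printf STDERR "%-12s %.6f s\n", $label, $elapsed;
    return ($elapsed, @result);
}

# Stand-in workload; wrap a call to your own sub here instead.
my ($elapsed, $sum) = timed('sum_loop', sub {
    my $total = 0;
    $total += $_ for 1 .. 100_000;
    return $total;
});
print "sum = $sum\n";
```

Wrapping the call to outconditions this way before and after a change gives a quick before/after comparison, while the profiler tells you which lines the time is actually spent on.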

ASKER CERTIFIED SOLUTION

FishMonger
