mikeysmailbox1
asked on
Perl speed up looping
Hi
I have a large loop that is taking some time to execute, and I need to increase its speed.
Main loop {
open file A
split and breakup the variables
sub program {
open file B
split and breakup the variables
if variable from Main loop = variable from sub do another loop
open file C
if variable from current loop = variable from file B do another loop
open file D (same file as A)
if variable from current loop = variable from file c
get data and store
}
close file D
}
close file C
}
close file B
}
sub program {
similar looping as above but file B and file C are switched.
}
close file A
print data
}
The loops are working but the files can be large in size
I changed the foreach loops to while loops but not much change in speed.
Not sure if Greps will speed it up to search in the files instead of looping with while loops
Thanks,
Mike
If you're opening/reopening files inside a loop, as your pseudocode implies, then that's one reason it would run slowly.
ASKER
Hi FishMonger
Here is the sub I am using, and yes, I am opening and closing the file.
Should I use something else?
I originally used arrays but it was not any faster.
sub outconditions {
my @outparms = @_;
open SQLDATAEM_OUTCOND, "/tmp/estee.SQLDATAEM_OUTCOND.txt";
while ($out = <SQLDATAEM_OUTCOND>) {
$out =~ s/^\s+//;
$out =~ s/\s+$//;
($P_OUTTABID,$P_OUTJOBID,$P_OUTCON,$P_ODATO,$P_SIGN) = split (/\|/,$out);
if(!$P_OUTTABID) {
} elsif(($outparms[0] eq $P_OUTTABID) && ($outparms[1] eq $P_OUTJOBID)) {
#-------------
# skip header
#-------------
if($P_OUTCON =~ /condition/ and $P_ODATO =~ /odate/ and $P_SIGN =~ /and_or/) {
next;
}
push(@OUTLIST,"$P_OUTCON $P_ODATO $P_SIGN");
open SQLDATAEM_INCOND, "/tmp/estee.SQLDATAEM_INCOND.txt";
while ($GET_SUCC_INC_LIST = <SQLDATAEM_INCOND>) {
$GET_SUCC_INC_LIST =~ s/^\s+//;
$GET_SUCC_INC_LIST =~ s/\s+$//;
($GET_SUCC_P_INTABID,$GET_SUCC_P_INJOBID,$GET_SUCC_P_INCON,$GET_SUCC_P_ODATI,$GET_SUCC_P_SIGNI) = split (/\|/,$GET_SUCC_INC_LIST);
if("${P_OUTCON}" eq "${GET_SUCC_P_INCON}") {
open SQLDATAEM_JOBDATA_OUT, "/tmp/estee.SQLDATAEM_JOBDATA.txt";
while ($GET_SUCC_GET_JOB = <SQLDATAEM_JOBDATA_OUT>) {
($GET_SUCC_TABLE_ID,$GET_SUCC_JOB_ID,$GET_SUCC_SCHED_TABLE,$GET_SUCC_PARENT_TABLE,$GET_SUCC_APPLICATION,$GET_SUCC_GROUP_NAME,$GET_SUCC_MEMNAME,$GET_SUCC_JOB_NAME) = split(/\|/,$GET_SUCC_GET_JOB);
if($GET_SUCC_P_INTABID eq $GET_SUCC_TABLE_ID and $GET_SUCC_P_INJOBID eq $GET_SUCC_JOB_ID and $P_SIGN ne "\-") {
push(@SUCC_LIST,"$GET_SUCC_JOB_NAME");
} else {
next;
}
}
close(SQLDATAEM_JOBDATA_OUT);
}
}
$GET_SUCC_INC_LIST = ();
$GET_SUCC_TABLE_ID = ();
$GET_SUCC_JOB_ID = ();
$GET_SUCC_SCHED_TABLE = ();
$GET_SUCC_PARENT_TABLE = ();
$GET_SUCC_APPLICATION = ();
$GET_SUCC_GROUP_NAME = ();
$GET_SUCC_MEMNAME = ();
$GET_SUCC_JOB_NAME = ();
$GET_SUCC_P_INTABID = ();
$GET_SUCC_P_INJOBID = ();
$GET_SUCC_P_INCON = ();
$GET_SUCC_P_ODATI = ();
$GET_SUCC_P_SIGNI = ();
$GET_SUCC_GET_JOB = ();
$P_OUTTABID = ();
$P_OUTJOBID = ();
$P_OUTCON = ();
$P_ODATO = ();
$P_SIGN = ();
close(SQLDATAEM_INCOND);
}
}
close(SQLDATAEM_OUTCOND);
}
I see lots of problems with that sub. Some definitely are causing your script to be very inefficient and others are poor coding practices that make the script harder to read and maintain.
In the vast majority of cases the slowest part of a program is its disk I/O, and your script is reopening and re-parsing the same 3 files over and over again. That's very wasteful, and I'm sure it is the main reason why your script is slow.
You haven't given any info about the contents of those files, but you do say they can be large, so let's assume the first file has 1,000 lines (probably a very low estimate) that meet the criteria to reach the point where you open the second file. That file will be opened and re-parsed 1,000 times. If that file has 1,000 lines that meet the criteria to reach the point where you open the third file, then the third file is reopened and re-parsed 1,000,000 times. Since this sub is being called from within an outer loop, you need to multiply those numbers by the number of outer iterations.
How many billions of times do you want to open and parse that 3rd file?
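To put rough numbers on that estimate, here is a small sketch of the arithmetic. All three counts are assumptions taken from the example figures above (the outer loop count in particular is made up), not measurements from your files, and `open_counts` is just an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical counts from the estimate above; none of these are measured.
my $outer_calls       = 1_000;  # assumed number of times the sub is called
my $matches_in_first  = 1_000;  # lines in file 1 that lead to opening file 2
my $matches_in_second = 1_000;  # lines per pass of file 2 that lead to opening file 3

# Return how often the second and third files get opened in total.
sub open_counts {
    my ($calls, $m1, $m2) = @_;
    my $second_opens = $calls * $m1;          # once per matching line of file 1
    my $third_opens  = $second_opens * $m2;   # once per matching line of each pass of file 2
    return ($second_opens, $third_opens);
}

my ($second, $third) =
    open_counts($outer_calls, $matches_in_first, $matches_in_second);
print "second file opened $second times\n";
print "third file opened  $third times\n";
```

With those assumed counts the third file is opened a billion times, and every one of those opens is followed by a full read and split of the file.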
Profiling your script with the Devel::NYTProf module I mentioned will give the real stat numbers.
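One common way to avoid the repeated reads is to parse each file once, up front, into hashes keyed on the fields you compare, and then do lookups inside the loop. The following is only a minimal sketch under assumptions: the helper names (`read_rows`, `build_indexes`, `successors`) and the sample rows are made up for illustration, the field positions are taken from the `split` lines in your sub, and in-memory filehandles stand in for the three files in /tmp so the example runs on its own. Unlike your sub, it trims the job data lines, so the job name does not carry a trailing newline.

```perl
use strict;
use warnings;

# Read a pipe-delimited file once; return its rows as array refs.
sub read_rows {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # trim both ends (also drops the newline)
        next unless length $line;
        push @rows, [ split /\|/, $line ];
    }
    return \@rows;
}

# Hypothetical helper: parse each file exactly once into lookup hashes.
sub build_indexes {
    my ($out_fh, $in_fh, $job_fh) = @_;
    my (%out_by_job, %in_by_cond, %names_by_job);

    for my $row (@{ read_rows($out_fh) }) {
        my ($tabid, $jobid, $cond, $odate, $sign) =
            map { defined $_ ? $_ : '' } @$row[0 .. 4];
        next unless $tabid;
        # skip header
        next if $cond =~ /condition/ and $odate =~ /odate/ and $sign =~ /and_or/;
        push @{ $out_by_job{"$tabid|$jobid"} }, [ $cond, $odate, $sign ];
    }
    for my $row (@{ read_rows($in_fh) }) {
        my ($tabid, $jobid, $cond) = map { defined $_ ? $_ : '' } @$row[0 .. 2];
        push @{ $in_by_cond{$cond} }, "$tabid|$jobid";
    }
    for my $row (@{ read_rows($job_fh) }) {
        my ($tabid, $jobid, $name) = map { defined $_ ? $_ : '' } @$row[0, 1, 7];
        push @{ $names_by_job{"$tabid|$jobid"} }, $name;
    }
    return {
        out_by_job   => \%out_by_job,
        in_by_cond   => \%in_by_cond,
        names_by_job => \%names_by_job,
    };
}

# Same matching rules as the nested loops, but using hash lookups only.
sub successors {
    my ($idx, $tabid, $jobid) = @_;
    my (@out_list, @succ_list);
    for my $out (@{ $idx->{out_by_job}{"$tabid|$jobid"} || [] }) {
        my ($cond, $odate, $sign) = @$out;
        push @out_list, "$cond $odate $sign";
        next if $sign eq '-';             # mirrors the $P_SIGN ne "-" test
        for my $in_key (@{ $idx->{in_by_cond}{$cond} || [] }) {
            push @succ_list, @{ $idx->{names_by_job}{$in_key} || [] };
        }
    }
    return (\@out_list, \@succ_list);
}

# Made-up sample data standing in for the three files.
my $outcond = "table_id|job_id|condition|odate|and_or\nT1|J1|COND_A|ODAT|+\n";
my $incond  = "T1|J2|COND_A|ODAT|A\n";
my $jobdata = "T1|J2|sched|parent|app|grp|mem|JOB_TWO\n";

open my $out_fh, '<', \$outcond or die "cannot open in-memory outcond: $!";
open my $in_fh,  '<', \$incond  or die "cannot open in-memory incond: $!";
open my $job_fh, '<', \$jobdata or die "cannot open in-memory jobdata: $!";

my $idx = build_indexes($out_fh, $in_fh, $job_fh);
my ($out_list, $succ_list) = successors($idx, 'T1', 'J1');
print "out:  @$out_list\n";
print "succ: @$succ_list\n";
```

In a real script you would call something like `build_indexes` once, before the main loop, with filehandles opened on the three files, and then call `successors` for each job. Each file is read a single time no matter how many jobs you process. The trade-off is memory: all three files are held in hashes, so this assumes they fit in RAM.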
Have you profiled the script to see where it's spending most of its time?
Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
Are you actually defining the subs inside a loop? If so, that's a mistake.