
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 157

Perl speed up looping


I have a large loop that is taking some time to execute, and I need to increase its speed.

Main loop {
  open file A
  split and break up the variables
  sub program {
    open file B
    split and break up the variables
    if variable from Main loop = variable from sub, do another loop {
      open file C
      if variable from current loop = variable from file B, do another loop {
        open file D (same file as A)
        if variable from current loop = variable from file C
          get data and store
        close file D
      }
      close file C
    }
    close file B
  }
  sub program {
    similar looping as above, but file B and file C are switched
  }
  close file A
  print data
}

The loops work, but the files can be large.

I changed the foreach loops to while loops, but there wasn't much change in speed.

I'm not sure whether using grep to search the files, instead of looping with while loops, would speed things up.


1 Solution
We need to see the actual code.

Have you profiled the script to see where it's spending most of its time?

Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
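For reference, a typical NYTProf session looks like this (this assumes the module is installed from CPAN; the script name is a placeholder, and the output paths are the module's defaults):

```shell
# Run the script under the profiler; this writes ./nytprof.out
perl -d:NYTProf yourscript.pl

# Convert the profile into a browsable HTML report in ./nytprof/
nytprofhtml
```

The HTML report shows time spent per subroutine and per line, which will make the file-reopening hot spots obvious.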

Are you actually defining the subs inside a loop? If so, that's a mistake.
If you're opening/reopening files inside a loop, as your pseudocode implies, then that's one reason it would run slowly.
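As a rough sketch of the "read once, look up many times" approach — the file contents and field layout here are invented for illustration, not taken from the asker's data — each lookup file can be slurped into a hash keyed by the join fields, so the inner "rescan the whole file" step becomes a single hash lookup:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for one of the pipe-delimited lookup files (hypothetical data).
my ($fh, $path) = tempfile();
print {$fh} "T1|J1|cond_a\nT1|J1|cond_b\nT2|J9|cond_c\n";
close $fh;

# Read the file ONCE, grouping rows by the join key "tabid|jobid".
my %incond;
open my $in, '<', $path or die "open $path: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($tabid, $jobid, $cond) = split /\|/, $line;
    next unless defined $cond;
    push @{ $incond{"$tabid|$jobid"} }, $cond;
}
close $in;

# Inside the main loop, an O(1) lookup replaces the inner file scan:
my $rows = $incond{"T1|J1"} || [];
print scalar(@$rows), "\n";    # prints 2
```

The same idea applies to each of the inner files: build the hash once before the main loop, then only the outermost file is read line by line.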
mikeysmailbox1 (Author) commented:
Hi FishMonger

Here is the sub I am using, and yes, I am opening and closing the files.
Should I use something else?
I originally used arrays, but it was not any faster.

sub outconditions {

  my @outparms = @_;

  while ($out = <SQLDATAEM_OUTCOND>) {
    $out =~ s/^\s+//;
    $out =~ s/\s+$//;
    ($P_OUTTABID, $P_OUTJOBID, $P_OUTCON, $P_ODATO, $P_SIGN) = split /\|/, $out;

    next unless $P_OUTTABID;    # skip lines with no table id

    if ($outparms[0] eq $P_OUTTABID && $outparms[1] eq $P_OUTJOBID) {
      # skip header
      if ($P_OUTCON =~ /condition/ and $P_ODATO =~ /odate/ and $P_SIGN =~ /and_or/) {
        push @OUTLIST, "$P_OUTCON   $P_ODATO   $P_SIGN";

        open SQLDATAEM_INCOND, "/tmp/estee.SQLDATAEM_INCOND.txt";
        # read loop inferred from the trim statements that followed in the post
        while ($GET_SUCC_INC_LIST = <SQLDATAEM_INCOND>) {
          $GET_SUCC_INC_LIST =~ s/^\s+//;
          $GET_SUCC_INC_LIST =~ s/\s+$//;
          # (the split of $GET_SUCC_INC_LIST into the GET_SUCC_* fields was
          #  not included in the original post)

          if ("${P_OUTCON}" eq "${GET_SUCC_P_INCON}") {
            open SQLDATAEM_JOBDATA_OUT, "/tmp/estee.SQLDATAEM_JOBDATA.txt";
            while ($GET_SUCC_GET_JOB = <SQLDATAEM_JOBDATA_OUT>) {
              if ($GET_SUCC_P_INTABID eq $GET_SUCC_TABLE_ID
                  and $GET_SUCC_P_INJOBID eq $GET_SUCC_JOB_ID
                  and $P_SIGN ne "-") {
                # (match branch not included in the original post)
              } else {
                # reset the working variables (originally 20 separate "$VAR = ();" lines)
                ($GET_SUCC_INC_LIST, $GET_SUCC_TABLE_ID, $GET_SUCC_JOB_ID,
                 $GET_SUCC_SCHED_TABLE, $GET_SUCC_PARENT_TABLE, $GET_SUCC_APPLICATION,
                 $GET_SUCC_GROUP_NAME, $GET_SUCC_MEMNAME, $GET_SUCC_JOB_NAME,
                 $GET_SUCC_P_INTABID, $GET_SUCC_P_INJOBID, $GET_SUCC_P_INCON,
                 $GET_SUCC_P_ODATI, $GET_SUCC_P_SIGNI, $GET_SUCC_GET_JOB,
                 $P_OUTTABID, $P_OUTJOBID, $P_OUTCON, $P_ODATO, $P_SIGN) = ();
              }
            }
            close SQLDATAEM_JOBDATA_OUT;
          }
        }
        close SQLDATAEM_INCOND;
      }
    }
  }
}
I see lots of problems with that sub. Some are definitely causing your script to be very inefficient, and others are poor coding practices that make the script harder to read and maintain.

In the vast majority of cases the slowest part of a program is its disk I/O, and your script is reopening and re-parsing the same 3 files over and over and over again. That's very wasteful/inefficient, and I'm sure it is the main reason why your script is slow.

You haven't given any info about the contents of those files, but you do say they can be large, so let's assume the first file has 1,000 lines (probably a very low estimate) that meet the criteria to reach the point where you open the second file. That file will be opened and re-parsed 1,000 times. If that file has 1,000 lines that meet the criteria to reach the point where you open the third file, then that file is reopened and re-parsed 1,000,000 times. Since this sub is being called from within an outer loop, you need to multiply those numbers by that factor.

How many billions of times do you want to open and parse that third file?

Profiling your script with the Devel::NYTProf module I mentioned will give the real stat numbers.
Since I don't have any clue about the contents of those files, I can't say whether what I'm about to suggest will help much, but it might.

You could easily and efficiently load those files into a database (temporary or not) and use SQL statements to filter/extract different combinations of data based on your criteria. Doing that should be far more efficient than the nested looping and parsing.
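As a sketch of that idea — the table layout and sample rows here are invented, and it assumes the DBD::SQLite driver is installed — the parsed rows can go into an in-memory SQLite table, and a single SQL query then replaces each nested file scan:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory database; swap :memory: for a file path to persist it.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# One table per input file; columns mirror the pipe-delimited fields.
$dbh->do('CREATE TABLE outcond (tabid TEXT, jobid TEXT, cond TEXT)');

# Load rows once (in the real script, these come from split /\|/ per line).
my $ins = $dbh->prepare('INSERT INTO outcond VALUES (?, ?, ?)');
$ins->execute(@$_) for ['T1', 'J1', 'c1'], ['T1', 'J1', 'c2'], ['T2', 'J2', 'c3'];

# One query replaces a full rescan of the file for every lookup.
my ($n) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM outcond WHERE tabid = ? AND jobid = ?',
    undef, 'T1', 'J1');
print "$n\n";    # prints 2
```

Once the files are large enough to matter, a CREATE INDEX on the lookup columns keeps those queries fast.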
