Perl speed up looping

Posted on 2014-08-28
Last Modified: 2014-12-23

I have a large loop that is taking some time to execute, and I need to increase its speed.

Main loop {
 open file A
 split and breakup the variables
 sub program {
     open file B
       split and breakup the variables
       if variable from Main loop = variable from sub do  another loop
           open file C
               if variable from current loop = variable from file B  do another loop
                   open file D (same file as A)
                     if variable from current loop = variable from file c
                         get data and store
                   close file D
           close file C
     close file B
  sub program {
    similar looping as above, but file B and file C are switched.
  close file A    
  print data

The loops are working, but the files can be large.

I changed the foreach loops to while loops, but the speed did not change much.

I'm not sure whether using grep to search the files would be faster than looping over them with while loops.


Question by:mikeysmailbox1

    Expert Comment

    We need to see the actual code.

    Have you profiled the script to see where it's spending most of its time?

    Devel::NYTProf - Powerful fast feature-rich Perl source code profiler

    Are you actually defining the subs inside a loop?  If so, that's a mistake.

    Expert Comment

    If you're opening and reopening files inside a loop, as your pseudocode implies, then that's one reason it would run slowly.
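One way to avoid reopening a file on every iteration is to parse it once, before the loop starts, and keep the rows in memory. The following is only a minimal sketch under assumptions: `load_rows` is a hypothetical helper name, and the only detail taken from the thread is that the files are pipe-delimited. It also assumes the files fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: read a pipe-delimited file ONCE and return its
# rows as an array reference of array references.
sub load_rows {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @rows;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
        next unless length $line;     # skip blank lines
        push @rows, [ split /\|/, $line ];
    }
    close $fh;
    return \@rows;
}

# Intended use: call load_rows() once per file before the main loop,
# then iterate over the returned rows instead of reopening the file.
#   my $outcond = load_rows('/tmp/estee.SQLDATAEM_OUTCOND.txt');
#   for my $row (@$outcond) { ... }
```

The lexical filehandle and three-argument `open` with an error check are a design choice here, so that a missing file fails loudly instead of silently yielding an empty loop.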

    Author Comment

    Hi FishMonger

    Here is the sub I am using, and yes, I am opening and closing the file.
    Should I use something else?
    I originally used arrays but it was not any faster.

    sub outconditions {

      my @outparms = @_;
      open SQLDATAEM_OUTCOND, "/tmp/estee.SQLDATAEM_OUTCOND.txt";
      while ($out = <SQLDATAEM_OUTCOND>) {
        $out =~ s/^\s+//;
        $out =~ s/\s+$//;
        ($P_OUTTABID,$P_OUTJOBID,$P_OUTCON,$P_ODATO,$P_SIGN) = split (/\|/,$out);
        if(!$P_OUTTABID) {
        } elsif(($outparms[0] eq $P_OUTTABID) && ($outparms[1] eq $P_OUTJOBID)) {
          # skip header
          if($P_OUTCON =~ /condition/ and $P_ODATO =~ /odate/ and $P_SIGN =~ /and_or/) {
          push(@OUTLIST,"$P_OUTCON   $P_ODATO   $P_SIGN");
          open SQLDATAEM_INCOND, "/tmp/estee.SQLDATAEM_INCOND.txt";
          while ($GET_SUCC_INC_LIST = <SQLDATAEM_INCOND>) {
                  $GET_SUCC_INC_LIST =~ s/^\s+//;
              $GET_SUCC_INC_LIST =~ s/\s+$//;
               if("${P_OUTCON}" eq "${GET_SUCC_P_INCON}") {
                 open SQLDATAEM_JOBDATA_OUT, "/tmp/estee.SQLDATAEM_JOBDATA.txt";
                 while ($GET_SUCC_GET_JOB = <SQLDATAEM_JOBDATA_OUT>) {
                       if($GET_SUCC_P_INTABID eq $GET_SUCC_TABLE_ID and $GET_SUCC_P_INJOBID eq $GET_SUCC_JOB_ID and $P_SIGN ne "\-") {
                       } else {
               $GET_SUCC_INC_LIST = ();
               $GET_SUCC_TABLE_ID = ();
               $GET_SUCC_JOB_ID = ();
               $GET_SUCC_SCHED_TABLE = ();
               $GET_SUCC_PARENT_TABLE = ();
               $GET_SUCC_APPLICATION = ();
               $GET_SUCC_GROUP_NAME = ();
               $GET_SUCC_MEMNAME = ();
               $GET_SUCC_JOB_NAME = ();
               $GET_SUCC_P_INTABID = ();
               $GET_SUCC_P_INJOBID = ();
               $GET_SUCC_P_INCON = ();
               $GET_SUCC_P_ODATI = ();
               $GET_SUCC_P_SIGNI = ();
               $GET_SUCC_GET_JOB = ();
           $P_OUTTABID = ();
           $P_OUTJOBID = ();
           $P_OUTCON = ();
           $P_ODATO = ();
           $P_SIGN = ();

    Expert Comment

    I see lots of problems with that sub.  Some definitely are causing your script to be very inefficient and others are poor coding practices that make the script harder to read and maintain.

    In the vast majority of cases the slowest part of a program is its disk I/O, and your script is reopening and re-parsing the same 3 files over and over again.  That's very wasteful, and I'm sure it is the main reason why your script is slow.

    You haven't given any info about the contents of those files, but you do say they can be large, so let's assume that the first file has 1,000 lines (probably a very low estimate) that meet the criteria to reach the point where you open the second file.  That file will be opened and re-parsed 1,000 times.  If that file has 1,000 lines that meet the criteria to reach the point where you open the third file, then the third file is reopened and re-parsed 1,000 x 1,000 = 1,000,000 times.  Since this sub is being called from within an outer loop, you need to multiply those numbers by the number of iterations of that loop.

    How many billions of times do you want to open and parse that 3rd file?

    Profiling your script with the Devel::NYTProf module I mentioned will give the real stat numbers.
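The repeated passes described above can often be replaced by hash lookups once the rows are in memory. This is a sketch of that general technique, not something taken from the original script: `index_rows` is a hypothetical helper, the sample rows are made up, and the column positions (condition name in column 2) are an assumption based on the variable names in the posted sub.

```perl
use strict;
use warnings;

# Hypothetical helper: build key => [matching rows] for the given
# zero-based column positions. Fields were split on '|', so '|' is a
# safe separator for composite keys.
sub index_rows {
    my ($rows, @key_cols) = @_;
    my %index;
    for my $row (@$rows) {
        my $key = join '|', map { $_ // '' } @{$row}[@key_cols];
        push @{ $index{$key} }, $row;
    }
    return \%index;
}

# Made-up sample data standing in for the parsed files.
my @outcond = ( [ 'T1', 'J1', 'COND_A', 'ODAT', '+' ],
                [ 'T1', 'J2', 'COND_B', 'ODAT', '+' ] );
my @incond  = ( [ 'T2', 'J7', 'COND_A', 'ODAT', 'A' ],
                [ 'T3', 'J9', 'COND_A', 'ODAT', 'A' ] );

# Index the in-conditions by condition name once...
my $in_by_cond = index_rows(\@incond, 2);

# ...then each out-condition finds its matches with a single lookup
# instead of a full pass over the other file.
my @pairs;
for my $out (@outcond) {
    for my $in (@{ $in_by_cond->{ $out->[2] } || [] }) {
        push @pairs, "$out->[1] -> $in->[1]";
    }
}
print "$_\n" for @pairs;
```

With an index like this, the work grows with the number of rows plus the number of matches, rather than with the product of the file sizes.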

    Accepted Solution

    Since I don't have any clue about the contents of those files, I can't say whether what I'm about to suggest will help much, but it might.

    You could easily and efficiently load those files into a database (temporary or not) and use SQL statements to filter and extract different combinations of data based on your criteria.  Doing that should be far more efficient than the nested looping and parsing.
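As a rough illustration of the database idea, here is a sketch using DBI with an in-memory SQLite database. Several things are assumptions rather than facts from the thread: SQLite as the engine (it needs the DBD::SQLite driver from CPAN), the table and column names (guessed from the variable names in the posted sub), the `load_table` helper, and the sample rows. In a real script the rows would come from reading each file once.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database (assumption: DBD::SQLite is installed).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Assumed layout: five pipe-delimited fields per row in both files.
for my $table (qw(outcond incond)) {
    $dbh->do("CREATE TABLE $table (tabid TEXT, jobid TEXT, cond TEXT, odate TEXT, sign TEXT)");
}
$dbh->do('CREATE INDEX incond_cond ON incond (cond)');

# Hypothetical helper: bulk-insert rows inside one transaction.
# $table is trusted caller input, not user data.
sub load_table {
    my ($dbh, $table, $rows) = @_;
    my $sth = $dbh->prepare("INSERT INTO $table VALUES (?, ?, ?, ?, ?)");
    $dbh->begin_work;
    $sth->execute(@$_) for @$rows;
    $dbh->commit;
}

# Made-up sample rows standing in for the parsed files.
load_table($dbh, 'outcond', [ [qw(T1 J1 COND_A ODAT +)], [qw(T1 J2 COND_B ODAT -)] ]);
load_table($dbh, 'incond',  [ [qw(T2 J7 COND_A ODAT A)], [qw(T3 J9 COND_A ODAT A)] ]);

# One join replaces the nested file loops for a given table/job pair.
my $matches = $dbh->selectall_arrayref(q{
    SELECT o.jobid, i.tabid, i.jobid
    FROM outcond o
    JOIN incond i ON i.cond = o.cond
    WHERE o.tabid = ? AND o.jobid = ? AND o.sign <> '-'
    ORDER BY i.tabid, i.jobid
}, undef, 'T1', 'J1');

print join(' ', @$_), "\n" for @$matches;
```

The index on the join column and the single transaction around the inserts are the two choices most likely to matter for large files; whether this beats in-memory hashes depends on the data, so it is worth profiling both.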
