NevSoFly

Need to speed up slow executing scripts

The attached scripts work, but they take a long time to process.  They basically filter the records of a log file for a certain string and write the records that contain that string to a new file.  I know execution time is directly affected by the size of the original log file, but is there any way to speed this up?  I have been told that some methods of reading and/or writing a file are faster than others.  Is the method that I am using the most efficient?  If not, how do I improve it?

The files that I am filtering are between 20 and 80 MB, and processing takes up to 50 seconds.
import os, time

from datetime import datetime, timedelta


def SpecErrLog(File, dt, err, Duration):
    source_file = open(File,"r")
    
    try:
        file1 = File + " " + dt.replace(":","_") + " [" +  err + "]" + str(Duration)
        #file1 = "temptry.txt"
        dest_file = open(file1, 'w')
        Mach, Dt = File.split()
        
        BeginDay = datetime.strptime("00:00:00.000", '%H:%M:%S.%f')
        EndDay = datetime.strptime("23:59:59.999", '%H:%M:%S.%f')
        dt = datetime.strptime(dt, '%H:%M:%S.%f')
        
        for line in source_file:
            arr=line.split(",")
            LineTimeStamp = datetime.strptime(arr[4].strip(), '%H:%M:%S.%f') #timeStamp of sourcefile.
            upperLimit= dt + timedelta(minutes=Duration)
            lowerLimit= dt - timedelta(minutes=Duration)
            if lowerLimit > BeginDay and upperLimit < EndDay: #if all records accurr within the same day.
                if lowerLimit < LineTimeStamp < upperLimit:
                    dest_file.write(line)
    finally:
        # close is a method; without the parentheses the files are never closed.
        source_file.close()
        dest_file.close()
        print("finished")
        
if __name__ == "__main__":

    dt = "09:52:15.710"
    err = "54300"
    SpecErrLog("H108 01-24-2011", dt, err, 30)

Dave Baldwin

Here's a page about Python compilers: http://effbot.org/zone/python-compile.htm  That might speed it up.
ASKER CERTIFIED SOLUTION
-Richard-
In fact, on second thought, I'm not convinced your logic is exactly correct.  You're throwing out lines where the lower limit might be less than the beginning of the day or the upper limit might be greater than the end of the day.  I think what you want to do is change the lower limit calculation so that if the lower limit comes out as less than the start of the day, you make it the start of the day; and make an analogous change with the upper limit and the end of the day.  That would allow you to eliminate the BeginDay and EndDay comparison entirely, as well as eliminating a bug.  I don't think your way would work properly if the initial "dt" parameter is very close to the beginning or the end of the day.
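The clamping described above could look something like the sketch below. The names clamped_window, BEGIN_DAY and END_DAY are made up for illustration; the time format is the one used in the original script.

```python
from datetime import datetime, timedelta

TIME_FORMAT = '%H:%M:%S.%f'
# Time-only strings parse onto the same default date, so they compare directly.
BEGIN_DAY = datetime.strptime("00:00:00.000", TIME_FORMAT)
END_DAY = datetime.strptime("23:59:59.999", TIME_FORMAT)


def clamped_window(dt_text, duration_minutes):
    """Return (lower, upper) limits around dt_text, clamped to a single day.

    Hypothetical helper: it is not part of the original script.
    """
    center = datetime.strptime(dt_text, TIME_FORMAT)
    delta = timedelta(minutes=duration_minutes)
    # If the window spills past either end of the day, pull it back in.
    lower = max(center - delta, BEGIN_DAY)
    upper = min(center + delta, END_DAY)
    return lower, upper
```

With limits clamped like this, a "dt" close to midnight still produces a usable window. The original comparison is strict (<), so a record stamped exactly at the clamped limit would still be excluded unless the test is changed to <=.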
NevSoFly

ASKER

Richard,

I tried your first suggestions and saved about 7 seconds, but I am having a hard time trying to convert the timestamp to seconds.  I get an out of range error.  I have looked on the web but can't find another way to convert a datetime to seconds (integer).

I am trying to use your second suggestion, but I think I may have to rethink my approach.  The reason is that the log files that I am pulling this data from only contain info for 1 day.  If the lower limit falls before BeginDay then I will need to open the previous day's logs and search them.

So for now I think I will only work with one day and not test for BeginDay or EndDay.
If you do know of a way to convert lowerLimit and upperLimit to seconds, I'm all ears.
Working with only one day will again improve your efficiency because you will move one more slow conditional check from within to outside the loop.  That should gain you several more seconds.  

Additionally, I missed two more loop invariants!   The calculation of lowerLimit and upperLimit will give the same result every time through the loop too.  Those lines can be moved prior to the loop, which should gain you even more time.
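As a rough sketch of what the loop looks like once the invariants are moved out: the function name filter_lines is made up here, and the only thing assumed about the log format is what the original script already relies on, namely that the timestamp is the fifth comma-separated field.

```python
from datetime import datetime, timedelta

TIME_FORMAT = '%H:%M:%S.%f'


def filter_lines(lines, dt_text, duration_minutes):
    """Yield the lines whose timestamp falls inside the window around dt_text.

    Hypothetical helper; `lines` can be an open file or any iterable of strings.
    """
    # Loop invariants: these never change, so compute them once, before the loop.
    center = datetime.strptime(dt_text, TIME_FORMAT)
    delta = timedelta(minutes=duration_minutes)
    lower_limit = center - delta
    upper_limit = center + delta

    for line in lines:
        # Only the per-line work stays inside the loop.
        stamp = datetime.strptime(line.split(",")[4].strip(), TIME_FORMAT)
        if lower_limit < stamp < upper_limit:
            yield line
```

It could be used with the file handles from the original script, for example dest_file.writelines(filter_lines(source_file, dt, Duration)). This version works with a single day only and does no BeginDay or EndDay check.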

My suggestion about using seconds was probably my worst idea.  Using seconds will make the comparison faster, but the additional computation involved in doing the conversion might destroy the benefit or even make it worse.  I think you can safely forget about it.
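For reference, if the conversion is still wanted, one way to do it is to subtract the start of the day and ask the resulting timedelta for its length in seconds. This is only a sketch: the name seconds_since_midnight is made up, and timedelta.total_seconds() needs Python 2.7 or later. It stays within datetime arithmetic rather than going through an epoch-based function; the thread does not say which call produced the out of range error, so whether this avoids it is an assumption.

```python
from datetime import datetime

TIME_FORMAT = '%H:%M:%S.%f'
BEGIN_DAY = datetime.strptime("00:00:00.000", TIME_FORMAT)


def seconds_since_midnight(stamp):
    """Convert a time-only datetime (parsed with TIME_FORMAT) to seconds.

    Hypothetical helper: subtracting two datetimes gives a timedelta,
    and total_seconds() returns its length as a float.
    """
    return (stamp - BEGIN_DAY).total_seconds()
```

On older interpreters the same value can be built by hand from the timedelta attributes: td.days * 86400 + td.seconds + td.microseconds / 1e6.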

Once you do all the other things we discussed, you'll have a nice tight program and it will be running about as fast as it can.   80 megabytes is not a small file and it will take some time under the best of circumstances.   If it gets down to the 30-second range I'd say you were doing pretty well.
thank you