• Status: Solved
  • Priority: Medium
  • Security: Public

Python Script That Scans a Folder, Changes Text in Files, and Rewrites Them

Hi,
I'm having some problems writing a Python script that does a few simple things.

It scans folders for files. It needs to know (I do not know how :) ) that a file is not currently being written to (meaning the file is complete).

It then looks for a date pattern such as:
28/04/11 10:33:26
and changes it to:
04/28/11 10:33:26

dd/mm/yy -> mm/dd/yy

After that is done, the file is deleted and the result is written to another directory.
Any code snippets would do :)

Thanks
Asked:
m0tek
1 Solution
 
käµfm³d 👽Commented:
My initial research into the "file is not currently being written to" aspect of your question seems to indicate that this is rather difficult (or unreliable) to do. A true Python guru may be able to inform you otherwise.

As for the regular expression to change the date format, the following should be adequate:
import re

dates = re.compile("(\d\d?/)(\d\d?/)(\d\d \d\d?:\d\d?:\d\d?)")
dates.sub("\\2\\1\\3", "28/04/11 10:33:26")

 
peprCommented:
To add to what kaufmed suggested: there are at least two kinds of solution for switching the day and month. One is a pure text-based operation (for example regular expressions, as shown by kaufmed). The other converts the text to a datetime object and converts it back to a string using a different format template.

In the regular expression, \d means one digit and \d? means one digit or none. In this case \d\d? can be replaced by \d+ (one or more digits), which is not the same but would also work. It is probably a good idea to use the raw-string prefix for all re patterns, even though it is not always necessary. At the least, you avoid doubling the backslash, which appears very often in regex patterns.

Here is the \d+ alternative to kaufmed's \d\d? (it is only an alternative; kaufmed's solution is perfect):

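import re

# Raw strings avoid doubling the backslashes in the pattern and the replacement.
dates = re.compile(r"(\d+/)(\d+/)(\d\d \d+:\d+:\d+)")
# Swap the first two groups (day and month); prints 04/28/11 10:33:26
print(dates.sub(r"\2\1\3", "28/04/11 10:33:26"))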
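
For comparison, the datetime-based approach mentioned above could look roughly like the sketch below. It assumes the timestamp always has the exact form dd/mm/yy HH:MM:SS; swap_day_month is just a name made up for this illustration. One side effect of this route is that strptime raises ValueError for impossible dates, which the pure regex approach would let through.

```python
from datetime import datetime

def swap_day_month(stamp):
    """Convert 'dd/mm/yy HH:MM:SS' to 'mm/dd/yy HH:MM:SS' (hypothetical helper)."""
    # Parse with day first, then format with month first.
    parsed = datetime.strptime(stamp, "%d/%m/%y %H:%M:%S")
    return parsed.strftime("%m/%d/%y %H:%M:%S")

print(swap_day_month("28/04/11 10:33:26"))  # prints 04/28/11 10:33:26
```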



Concerning the detection of a file that is not being written to: what you actually want is to open the file for exclusive write access. How to do that depends on the OS you use (UNIX-based vs. Windows). Have a look at the ActiveState Python Recipe http://code.activestate.com/recipes/65203/, which shows how to lock an opened file. (I suggest using only the fragment of the recipe that you need.)
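
As a rough illustration of the locking idea on UNIX-like systems only, the sketch below uses fcntl.flock from the standard library (this is not the code from the recipe, and try_exclusive_lock is a name made up here). Note that these locks are advisory: the check only tells you something if the program writing the log also takes a lock on the file, which is an assumption about your setup.

```python
import fcntl

def try_exclusive_lock(f):
    """Try to take a non-blocking exclusive advisory lock on the open file f.

    Returns True if the lock was acquired, False if another open file
    already holds a conflicting lock. UNIX-only (hypothetical helper).
    """
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except (IOError, OSError):
        # The lock is held elsewhere, so the file is probably still in use.
        return False
```

The lock is released when the file is closed, so keep the file object open for as long as you work on the file.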
 
peprCommented:
Do you want to fix the content of the lines inside the files?  Are the files some kind of log files?  Can you attach a few lines of such a file?

Why do you want to access the file exclusively?  Is there any special reason for that?
 
m0tekAuthor Commented:
Pepr - Yes

Log File Line :

28/04/11 10:33:26 SYS1 DONE GIBRISH18F 80 NONE UPDATE   ALTER    *08*-97     ? MQST.SECRET.NAVI.DATA_ S000192
 
peprCommented:
Try the following code, which switches the day and month and copies the rest of each line to the output file.  The input and output filenames must be decided in advance.  You should specify more precisely what to read and where to write...

fnameIn = 'data.log'
fnameOut = 'data.out'

fin = open(fnameIn)
fout = open(fnameOut, 'w')

for line in fin:
    # Write the characters for the month first. They are on index from 3 (included)
    # to index 5 (excluded).
    fout.write(line[3:5])
    
    # Write the slash separator.
    fout.write('/')
    
    # Write the day characters--from index zero (included) to index 2 (excl.)
    # The zero can be left out.
    fout.write(line[:2])
    
    # Write the rest of the line (from index 5 to the end)
    fout.write(line[5:])
    
fout.close()
fin.close()

 
m0tekAuthor Commented:
Hi, looks good.
My only concerns are:
1. The file name is constantly changing. Is there a way to use a wildcard, such as *.log?
2. Is there a way to do a for loop with a hash of the file, so I know it is not being written to?

i.e.

for
check hash for file 1
after 5 seconds rehash
if hash1 = hash2, continue; if not, rehash and do the loop again until the hashes match

Thanks
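
The rehash loop described above could be sketched roughly as follows. This is only an illustration: file_digest and wait_until_stable are made-up names, SHA-256 is an arbitrary choice of hash, and max_checks is an added safety limit that is not part of the original idea. Keep in mind that an unchanged hash only shows the file did not change during the delay; a writer that pauses for longer than that would still fool the check.

```python
import hashlib
import time

def file_digest(path):
    """Return the SHA-256 hex digest of the file's current contents."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so large log files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()

def wait_until_stable(path, delay=5, max_checks=None):
    """Re-hash the file every `delay` seconds until two hashes in a row match.

    Returns True once the file looks stable, or False if max_checks
    comparisons were made without a match.
    """
    previous = file_digest(path)
    checks = 0
    while True:
        time.sleep(delay)
        current = file_digest(path)
        if current == previous:
            return True
        previous = current
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return False
```

A call such as wait_until_stable('data.log') would then go in front of the processing of each file.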
 
peprCommented:
I do not know what you mean by hash.  For wildcards, there is the standard module named "glob" for file globbing.  Why do you want to be sure that the file is not being written to?  Some code a bit later...
 
peprCommented:
The core code from above was moved into the extractLog() function.  The function is called in a loop, once for each input file.  The input file names are retrieved via glob.glob('*.log').

b.py
import glob

def extractLog(fnameIn, fout):
    """Extracts the log lines from the file of the fnameIn into the open fout."""
    
    # Open the processed input file.
    fin = open(fnameIn)   
    
    # Process all the lines of the input file and write them to the fout.
    for line in fin:
        # Write the characters for the month first. They are on index from 3 (included)
        # to index 5 (excluded).
        fout.write(line[3:5])
        
        # Write the slash separator.
        fout.write('/')
        
        # Write the day characters--from index zero (included) to index 2 (excl.)
        # The zero can be left out.
        fout.write(line[:2])
        
        # Write the rest of the line (from index 5 to the end)
        fout.write(line[5:])
        
        # If the line did not contain the \n, write also the newline.
        if line[-1] != '\n':
            fout.write('\n')

    # Close the processed input file.
    fin.close()       



# Collect all the log records to the single output file (should not contain
# the .log extension because of the later used globbing).
fout = open('data.out', 'w')

for fnameIn in glob.glob('*.log'):  # loop through all log files in the working directory
    extractLog(fnameIn, fout)       # extract the lines from fnameIn to fout

fout.close()

 
peprCommented:
The same code with less talkative comments (a more real-life look) and more compact formatting...

c.py
import glob

def extractLog(fnameIn, fout):
    """Extracts the log lines from the file of the fnameIn into the open fout.
    
    Processes all the lines of the input file and writes them to fout.
    Switches the day and the month in the timestamp of each log record and
    appends a newline if needed."""
    
    fin = open(fnameIn)        # the processed input file
    for line in fin:
        fout.write(line[3:5])  # month first
        fout.write('/')        # separator
        fout.write(line[:2])   # day
        fout.write(line[5:])   # the rest of the line
        if line[-1] != '\n':   # newline if needed
            fout.write('\n')
    fin.close()                # close the input file


# Collect all the log records to the single output file (should not contain
# the .log extension because of the later used globbing).
fout = open('data.out', 'w')

for fnameIn in glob.glob('*.log'):  # loop through all log files in the working directory
    extractLog(fnameIn, fout)       # extract the lines from fnameIn to fout

fout.close()
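
The original question also asked for the converted file to end up in another directory and for the source file to be deleted. A possible variation along those lines is sketched below. The function names and the one-output-file-per-input layout are assumptions made for this illustration, not something specified in the thread. It uses a regex anchored at the start of the line instead of fixed slicing, so lines that do not begin with a timestamp are copied unchanged.

```python
import glob
import os
import re

# dd/mm/yy at the start of a line, followed by a space and HH:MM:SS.
DATE_RE = re.compile(r"^(\d\d)/(\d\d)/(\d\d \d\d:\d\d:\d\d)")

def convert_file(src_path, dst_dir):
    """Write a converted copy of src_path into dst_dir, then delete src_path."""
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        for line in fin:
            # Swap day and month; non-matching lines pass through unchanged.
            fout.write(DATE_RE.sub(r"\2/\1/\3", line))
    os.remove(src_path)  # only reached if the copy was written without error
    return dst_path

def convert_all(src_dir, dst_dir, pattern='*.log'):
    """Convert every file in src_dir that matches the wildcard pattern."""
    paths = sorted(glob.glob(os.path.join(src_dir, pattern)))
    return [convert_file(path, dst_dir) for path in paths]
```

The destination directory must already exist and should differ from the source directory, otherwise the output would overwrite the input while it is being read.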

 
DhaestCommented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.