Solved

Perl script to find new files and compress those new files

Posted on 2016-09-14
Last Modified: 2016-09-26
Hi - My other software dumps files into one folder (example: folderA).
Now I want a script that regularly checks this folder (maybe once every 15 minutes) to find new files, compress them, and copy them to a different folder with a timestamp.

Example: FolderA contains the below files
1) testfile1.txt
2) testfile2.txt

FolderB already contains the old files
1) testfile1_09152016_121066.txt
2) testfile2_09152016_121066.txt
3) testfile1_09142016_121066.txt
4) testfile2_09142016_121066.txt

Now I want to check whether testfile1.txt is a new file by comparing it with the latest file in folderB,
for example comparing testfile1.txt with testfile1_09152016_121066.txt. If it is different, I want to rename testfile1.txt with a timestamp, compress it, and copy it to folderB.
The files are very large: 1 GB minimum and 4 GB maximum,
so I can't compare the actual content of the files.

So can someone help me identify new files and compress them?

Thanks,
Question by:shragi
 
LVL 6

Expert Comment

by:DPatel
ID: 41799368
You can use something like this:

# Report files in PATH_SRC that were modified within 120 seconds of now.
PATH_SRC="/home/celvas/Documents/Imp_Task/"
PATH_DST="/home/celvas/Downloads/zeeshan/"

cd "$PATH_SRC" || exit 1
TODAY=$(date -d "$(date +%F)" +%s)      # midnight today, in epoch seconds
TODAY_TIME=$(date +%s)                  # current time, in epoch seconds

for f in *                              # a glob is safer than parsing `ls`
do
        [ -f "$f" ] || continue         # skip directories and other non-regular entries
        MOD_DATE=$(stat -c %y "$f")     # human-readable modification time
        MOD_DATE=${MOD_DATE% *}         # drop the trailing time zone offset
        MOD_DATE1=$(date -d "$MOD_DATE" +%s)

        DIFF_IN_DATE=$(( MOD_DATE1 - TODAY ))
        DIFF_IN_DATE1=$(( MOD_DATE1 - TODAY_TIME ))
        if [[ ($DIFF_IN_DATE -ge -120) && ($DIFF_IN_DATE1 -le 120) && ($DIFF_IN_DATE1 -ge -120) ]]
        then
                echo "File modified within 2 minutes of now = $f"
                echo "MOD_DATE: $MOD_DATE"
                #mv "$PATH_SRC/$f" "$PATH_DST/$f"
        fi
done
 
LVL 6

Expert Comment

by:DPatel
ID: 41799371
And then

tar --newer "$(date -d '7 days ago' +%d-%b)" -zcf thisweek.tgz .
 
LVL 28

Expert Comment

by:FishMonger
ID: 41799822
Based on your description the files in folderB are compressed, but the ones in folderA aren't and won't be until after the comparison so how do you want to handle that difference when comparing?

With such large files, the best way to make a comparison would be to generate an md5 checksum of each and compare those checksums.  Both files would need to be in the same state (i.e., either compressed or not) when generating the checksum.
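
The checksum comparison described above could be sketched in shell roughly like this. It is only a sketch under some assumptions: the function names md5_of and same_content are made up for illustration, GNU md5sum and gzip are assumed to be installed, and compressed copies are assumed to end in .gz. Streaming a .gz file through gzip -dc is one way to get both files into the same state before hashing.

```shell
# Print the MD5 checksum of a file's uncompressed content.
# Files ending in .gz are streamed through gzip -dc first, so a compressed
# copy and a plain source file are hashed in the same state.
# (Hypothetical helper; returns 2 when the file cannot be read.)
md5_of() {
    [ -r "$1" ] || return 2
    case "$1" in
        *.gz) gzip -dc -- "$1" | md5sum | cut -d' ' -f1 ;;
        *)    md5sum < "$1" | cut -d' ' -f1 ;;
    esac
}

# Succeed (exit status 0) when both files have the same uncompressed content.
# Returns 1 when the checksums differ and 2 when a file cannot be read.
same_content() {
    local a b
    a=$(md5_of "$1") || return 2
    b=$(md5_of "$2") || return 2
    [ "$a" = "$b" ]
}
```

Both tools read the data as a stream, so memory use should stay small even for 4 GB files, although each comparison still has to read both files completely.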
 

Author Comment

by:shragi
ID: 41799933
Hi FishMonger - if both files need to be in the same state to generate the checksum, then we can add a third folder for the compressed files.
FolderA - Drop zone where you can find new files
FolderB - Contains old files with timestamp
FolderC - Contains compressed files of FolderB.

So how do we do the checksum? I mean, how do I write the script for that?
 
LVL 28

Accepted Solution

by:
FishMonger earned 500 total points
ID: 41799972
Another factor which you need to take into account is whether or not the file in folderA is still being written when you want to compare it with the file(s) in folderB.  You don't want to try to compress and copy it while it's still being written.
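
One simple guard against picking up a file that is still being written is to skip anything that was modified very recently. A rough sketch, assuming GNU stat and assuming the writer keeps updating the file's modification time while it writes; the name is_settled and the 60-second default are arbitrary choices for illustration:

```shell
# Succeed when the file was last modified at least min_age seconds ago
# (default 60), as a hint that the writer has probably finished.
# (Hypothetical helper; returns 2 when the path is not a regular file.)
is_settled() {
    local file min_age now mtime
    file=$1
    min_age=${2:-60}
    [ -f "$file" ] || return 2
    now=$(date +%s)
    mtime=$(stat -c %Y -- "$file") || return 2   # GNU stat: mtime in epoch seconds
    [ $(( now - mtime )) -ge "$min_age" ]
}
```

This is a heuristic, not a guarantee: a writer that pauses for longer than min_age would slip through. If the producing software can write to a temporary name and rename the file when it is done, that would be more reliable.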

Why keep multiple copies of the same file, especially given their size?  It's not really clear, but it sounds like you're looking at having at least 2, maybe even 3, copies of each file (one in each folder).

A quick Google search will give you a number of example resources on how to generate the checksum.  Here's one

This one goes into a little more detail.
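
Once a checksum comparison shows that a file has changed, the remaining step from the question (rename with a timestamp, compress, copy) might look something like the sketch below. The function name archive_with_timestamp is hypothetical, the MMDDYYYY_HHMMSS stamp is a guess at the naming pattern shown in the question, and the .gz suffix is an assumption, since the thread does not say which compression format is wanted.

```shell
# Compress src into dest_dir as <name>_<MMDDYYYY>_<HHMMSS>.<ext>.gz and
# print the path of the new archive. The source file is left untouched.
# (Hypothetical helper; returns 2 on bad arguments, 1 if compression fails.)
archive_with_timestamp() {
    local src dest_dir base name stamp target
    src=$1
    dest_dir=$2
    [ -f "$src" ] && [ -d "$dest_dir" ] || return 2
    base=$(basename -- "$src")
    name=${base%.*}                       # file name without its last extension
    stamp=$(date +%m%d%Y_%H%M%S)
    if [ "$name" = "$base" ]; then        # no extension present
        target="$dest_dir/${base}_${stamp}.gz"
    else
        target="$dest_dir/${name}_${stamp}.${base##*.}.gz"
    fi
    # Write to a temporary name first so a half-written archive is never
    # mistaken for a finished one.
    gzip -c -- "$src" > "$target.part" || { rm -f -- "$target.part"; return 1; }
    mv -- "$target.part" "$target" || return 1
    printf '%s\n' "$target"
}
```

Called as archive_with_timestamp folderA/testfile1.txt folderB, it would leave a file such as testfile1_<stamp>.txt.gz in folderB. A cron entry that runs the whole check every 15 minutes would cover the "watch the folder" part of the question.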
Question has a verified solution.
