Solved

Perl script to find new files and compress those new files

Posted on 2016-09-14
Medium Priority
181 Views
Last Modified: 2016-09-26
Hi - Other software dumps files into one folder (example: folderA).
Now I want a script that watches this folder regularly (maybe once every 15 minutes) to find new files, compress them, and copy them to a different folder with a timestamp.

Example: FolderA contains the below files
1) testfile1.txt
2) testfile2.txt

FolderB already contains the old files
1) testfile1_09152016_121066.txt
2) testfile2_09152016_121066.txt
3) testfile1_09142016_121066.txt
4) testfile2_09142016_121066.txt

Now I want to check whether testfile1.txt is a new file by comparing it with the latest matching file in folderB (for example, comparing testfile1.txt with testfile1_09152016_121066.txt). If it is different, rename testfile1.txt with a timestamp and copy it to folderB after compressing it.
The files are big: 1 GB minimum and 4 GB maximum, so I can't compare the actual content of the files.

So can someone help me figure out how to identify new files and compress them?

Thanks,
Question by:shragi
5 Comments
 
LVL 7

Expert Comment

by:D Patel
ID: 41799368
You can use something like this:

#!/bin/bash
# Flag files in the source folder that were modified within the last
# two minutes, and (optionally) move them to the destination folder.
PATH_SRC="/home/celvas/Documents/Imp_Task/"
PATH_DST="/home/celvas/Downloads/zeeshan/"

cd "$PATH_SRC" || exit 1
NOW=$(date +%s)                        # current time in epoch seconds

for f in *; do
        [ -f "$f" ] || continue        # skip anything that isn't a regular file
        MOD_EPOCH=$(stat -c %Y "$f")   # modification time in epoch seconds
        AGE=$(( NOW - MOD_EPOCH ))     # how long ago the file changed

        if (( AGE >= -120 && AGE <= 120 )); then
                echo "Recently modified file: $f"
                echo "MOD_DATE: $(stat -c %y "$f")"
                # mv "$f" "$PATH_DST/$f"
        fi
done
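
Note that this only flags files whose modification time falls within two minutes of the run, so the script needs to be scheduled at least that often; for the 15-minute interval mentioned in the question, the 120-second window would need to be widened to match.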
 
LVL 7

Expert Comment

by:D Patel
ID: 41799371
And then, to archive everything changed in the last week:

tar --newer "$(date -d '7 days ago' +%F)" -zcf thisweek.tgz .
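
Keep in mind that --newer keys off file timestamps rather than contents, so a file rewritten with identical data still gets archived; for a content-based test, see the checksum approach below.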
 
LVL 28

Expert Comment

by:FishMonger
ID: 41799822
Based on your description, the files in folderB are compressed, but the ones in folderA aren't and won't be until after the comparison, so how do you want to handle that difference when comparing?

With such large files, the best way to make a comparison would be to generate an MD5 checksum of each and compare those checksums.  Both files would need to be in the same state (i.e., either compressed or not) when generating the checksum.
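
A minimal sketch of that comparison in Perl (the paths are placeholders, and it assumes both files are in the same uncompressed state):

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# MD5 of a file, read in chunks by addfile(), so even 1-4 GB files
# use constant memory.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Placeholder paths -- substitute the real folderA/folderB locations.
my $new_sum = md5_of_file('/path/to/folderA/testfile1.txt');
my $old_sum = md5_of_file('/path/to/folderB/testfile1_09152016_121066.txt');

print $new_sum eq $old_sum
    ? "unchanged - skip it\n"
    : "new file - compress and copy\n";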
 

Author Comment

by:shragi
ID: 41799933
Hi FishMonger - if both files need to be in the same state to generate the checksum, then we can add a third folder for the compressed files.
FolderA - Drop zone where the new files land
FolderB - Contains old files with timestamp
FolderC - Contains compressed copies of FolderB's files

So how do we do the checksum? I mean, how do we write the script for that?
 
LVL 28

Accepted Solution

by:FishMonger (earned 2000 total points)
ID: 41799972
Another factor you need to take into account is whether or not the file in folderA is still being written when you want to compare it with the file(s) in folderB.  You don't want to try to compress and copy it while it's still being written.

Why keep multiple copies of the same file, especially given their size?  It's not really clear, but it sounds like you're looking at having at least 2, maybe even 3, copies of each file (one in each folder).

A quick Google search will give you a number of example resources on how to generate the checksum.  Here's one

This one goes into a little more detail.
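
In Perl, that "still being written" check might look like this (a minimal sketch; the 30-second settle window is an assumption you'd tune to how your other software writes):

#!/usr/bin/perl
use strict;
use warnings;

# A file whose size and mtime are unchanged across a short delay is
# assumed to be fully written. The 30-second window is a guess; adjust
# it to how quickly the producing software writes.
sub is_stable {
    my ($path, $settle) = @_;
    $settle //= 30;
    my ($size1, $mtime1) = (stat $path)[7, 9];
    return 0 unless defined $size1;
    sleep $settle;
    my ($size2, $mtime2) = (stat $path)[7, 9];
    return defined $size2 && $size1 == $size2 && $mtime1 == $mtime2;
}

my $file = '/path/to/folderA/testfile1.txt';    # placeholder path
print is_stable($file)
    ? "complete - safe to compress\n"
    : "still being written - check again later\n";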
