Solved

Perl script to find new files and compress those new files

Posted on 2016-09-14
Last Modified: 2016-09-26
Hi - My other software dumps files into one folder (example: folderA).
Now I want a script that regularly checks this folder (maybe once every 15 minutes) to find new files, compress them, and copy them to a different folder with a timestamp.

Example: FolderA contains the below files
1) testfile1.txt
2) testfile2.txt

FolderB already contains the old files
1) testfile1_09152016_121066.txt
2) testfile2_09152016_121066.txt
3) testfile1_09142016_121066.txt
4) testfile2_09142016_121066.txt

Now I want to check whether testfile1.txt is a new file by comparing it with the latest file in folderB,
for example comparing testfile1.txt with testfile1_09152016_121066.txt. If it is different, rename testfile1.txt with a timestamp, compress it, and copy it to folderB.
The files are very large: 1 GB minimum and 4 GB maximum,
so I can't compare the actual content of the files.

So can someone help me identify new files and compress them?

Thanks,
Question by:shragi

Expert Comment

by:D Patel
ID: 41799368
You can use something like this:

PATH_SRC="/home/celvas/Documents/Imp_Task/"
PATH_DST="/home/celvas/Downloads/zeeshan/"

cd "$PATH_SRC" || exit 1
TODAY=$(date -d "$(date +%F)" +%s)      # epoch seconds at midnight today
TODAY_TIME=$(date +%s)                  # epoch seconds right now

# Loop over a glob instead of parsing `ls`, so names with spaces survive
for f in *
do
        [ -f "$f" ] || continue         # skip directories and other non-files
        MOD_DATE=$(stat -c %y "$f")     # human-readable modification time
        MOD_DATE=${MOD_DATE% *}         # strip the trailing timezone offset
        MOD_DATE1=$(stat -c %Y "$f")    # modification time as epoch seconds

        DIFF_IN_DATE=$(( MOD_DATE1 - TODAY ))
        DIFF_IN_DATE1=$(( MOD_DATE1 - TODAY_TIME ))
        # Select files modified today and within 120 seconds of the current time
        if [[ $DIFF_IN_DATE -ge -120 && $DIFF_IN_DATE1 -le 120 && $DIFF_IN_DATE1 -ge -120 ]]
        then
                echo "File modified in the last 2 minutes = $f"
                echo "MOD_DATE: $MOD_DATE"

                #mv "$PATH_SRC/$f" "$PATH_DST/$f"
        fi
done

Expert Comment

by:D Patel
ID: 41799371
And then

tar --newer="$(date -d '7 days ago' +%F)" -zcf thisweek.tgz .

Expert Comment

by:FishMonger
ID: 41799822
Based on your description the files in folderB are compressed, but the ones in folderA aren't and won't be until after the comparison so how do you want to handle that difference when comparing?

With such large files, the best way to make a comparison would be to generate an md5 checksum of each and compare those checksums.  Both files would need to be in the same state (i.e., either compressed or not) when generating the checksum.
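
As a rough illustration of that idea, here is a minimal bash sketch. It assumes GNU `md5sum` and `gzip` are installed, and the function names (`md5_of_plain`, `md5_of_gz`, `same_content`) are made up for this example. Rather than compressing the new file first, it decompresses the archived copy to a pipe, so both checksums are taken over uncompressed content and no extra copy lands on disk.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not from any library.

# Print the MD5 hex digest of a plain (uncompressed) file.
md5_of_plain() {
    md5sum < "$1" | cut -d' ' -f1
}

# Print the MD5 hex digest of the uncompressed content of a .gz file.
# The data is streamed through a pipe, so nothing is written to disk.
md5_of_gz() {
    gzip -dc -- "$1" | md5sum | cut -d' ' -f1
}

# Return 0 (true) when the plain file and the .gz file hold the same content.
same_content() {
    [ "$(md5_of_plain "$1")" = "$(md5_of_gz "$2")" ]
}
```

It could be used as `same_content folderA/testfile1.txt folderB/some_archive.gz && echo "unchanged"`, where the archive name is a placeholder. Both files still have to be read in full, so expect this to take a while on multi-gigabyte files.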

Author Comment

by:shragi
ID: 41799933
Hi FishMonger - if both files need to be in the same state to generate the checksum, then we can add a third folder for the compressed files.
FolderA - Drop zone where you can find new files
FolderB - Contains old files with timestamp
FolderC - Contains compressed files of FolderB.

So how do we do the checksum? I mean, how do I write the script for that?

Accepted Solution

by:
FishMonger earned 500 total points
ID: 41799972
Another factor which you need to take into account is whether or not the file in folderA is still being written when you want to compare it with the file(s) in folderB.  You don't want to try to compress and copy it while it's still being written.
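
One possible way to guard against that, sketched in bash: sample the file's size and modification time twice and only proceed if nothing changed in between. This assumes GNU `stat` (the `-c` option already used earlier in this thread); the function name `is_stable` and the 30-second default pause are arbitrary choices for illustration. A writer that stalls for longer than the pause would fool this check, so treat it as a heuristic.

```shell
#!/usr/bin/env bash
# Sketch only: is_stable is a hypothetical helper name.

# Return 0 when the file's size and mtime are unchanged across a pause.
# Usage: is_stable FILE [PAUSE_SECONDS]   (the pause defaults to 30)
is_stable() {
    local file=$1 pause=${2:-30}
    local before after
    before=$(stat -c '%s %Y' -- "$file") || return 1   # size and mtime now
    sleep "$pause"
    after=$(stat -c '%s %Y' -- "$file") || return 1    # size and mtime later
    [ "$before" = "$after" ]
}
```

For example, `is_stable folderA/testfile1.txt 60 || echo "still being written, skip for now"`.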

Why keep multiple copies of the same file, especially given their size?  It's not really clear, but it sounds like you're looking at having at least 2, maybe even 3, copies of each file (one in each folder).

A quick Google search will give you a number of example resources on how to generate the checksum.  Here's one

This one goes into a little more detail.
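
Putting the pieces of this thread together, a minimal bash sketch might look like the following. It is not taken from the linked resources. It assumes GNU `md5sum`, `gzip`, and `date`; the function name `archive_if_new` is hypothetical; and the archive name `<stem>_<MMDDYYYY>_<HHMMSS>.<ext>.gz` is only a guess at the naming scheme in the question, with `.gz` appended. It keeps compressed copies only, and it assumes file names have an extension and that no other file shares the same `<stem>_` prefix.

```shell
#!/usr/bin/env bash
# Sketch only: archive_if_new is a hypothetical helper name.
# Usage: archive_if_new SRC_FILE DEST_DIR

archive_if_new() {
    local src=$1 dst_dir=$2
    local name stem ext candidate stamp latest=""
    name=$(basename -- "$src")
    stem=${name%.*}     # e.g. testfile1
    ext=${name##*.}     # e.g. txt

    # Find the most recently modified archive of this file, if any.
    for candidate in "$dst_dir/${stem}"_*."${ext}.gz"; do
        [ -e "$candidate" ] || continue
        if [ -z "$latest" ] || [ "$candidate" -nt "$latest" ]; then
            latest=$candidate
        fi
    done

    # Skip the copy when the newest archive already holds the same content.
    if [ -n "$latest" ]; then
        local new_sum old_sum
        new_sum=$(md5sum < "$src" | cut -d' ' -f1)
        old_sum=$(gzip -dc -- "$latest" | md5sum | cut -d' ' -f1)
        if [ "$new_sum" = "$old_sum" ]; then
            echo "unchanged: $name"
            return 0
        fi
    fi

    stamp=$(date +%m%d%Y_%H%M%S)
    local tmp_out="$dst_dir/.${stem}_${stamp}.${ext}.gz.tmp"
    # Compress under a hidden temporary name, then rename, so a half-written
    # archive is never picked up as the "latest" one on a later run.
    gzip -c -- "$src" > "$tmp_out" &&
        mv -- "$tmp_out" "$dst_dir/${stem}_${stamp}.${ext}.gz" &&
        echo "archived: ${stem}_${stamp}.${ext}.gz"
}
```

A wrapper script could call `archive_if_new` for each file in folderA, and a cron schedule such as `*/15 * * * *` could run that wrapper every 15 minutes. The source file is left in place here; whether to delete it afterwards depends on how the producing software behaves.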