  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 831

script for checking md5 checksums

Hello,

I need to write a Perl script that compares md5sum checksums between two filesystems. The script should take the source and destination folders as command-line arguments and, when run, report whether the checksum of each file in the source folder matches the checksum of the same file in the destination folder. If a file in the source folder isn't present in the destination, it should report that as an error too. I came up with some logic and wrote the Perl code, but for some reason it isn't working. Can you look at it and tell me where I am going wrong, or give me new code?

It reports this error:

Syntax: # do_checksum.pl  <SOURCE folder>  <DESTINATION FOLDER>

#perl do_checksum.pl /smbnas/oralsb40/oralsb11 /smbnas/oralsb40/oralsb11_restore
awk: cmd. line:1: fatal: file `/smbnas/oralsb40/oralsb11_restore' is a directory



EXAMPLES OF MD5CHECKSUM OUTPUT:

[root@]# md5sum nassync.sh
cbc234736d28b3841a6013e968bd0706  nassync.sh
[root@]# md5sum nassync.sh | awk -F" " '{print $1}'
cbc234736d28b3841a6013e968bd0706
[root@oralsb11-new opt]#

cat do_checksum.pl
#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[1];
my $destdir = $ARGV[2];

system("ls -l | awk \-F\" \" \'\{print \$9\}\' $srcdir > SRCFILE");
system("ls -l | awk \-F\" \" \'\{print \$9\}\' $destdir > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile\n";

foreach $i(@srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$i]);
 if ( "$srcfile[$i]" eq "$destfile[$i]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$i]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $i is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $i is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }

}

Asked by: ashsysad
 
mccracky Commented:
Does it have to be a perl script you write?  rsync basically already does that.  You can use the -c option to only do the comparison on the checksum and you can use the "dry run" option to not actually transfer files, but it should do what you need.
 
ashsysad (Author) Commented:
I'm fine with any solution. Please let me know how to do it. But since I started working in Perl, I guess it would be better to complete it.
 
arnold Commented:
Do the two locations have an identical structure, such that filea in the source, if it exists, will also be in the destination?

The problem is your use of foreach $i (@srcfile).
$i is set to each value, not to the index.
@srcfile=qw(a b c);
You are treating $i as though it holds the indexes 0, 1, 2 from the example above, but the foreach you are using actually sets $i to "a", "b", "c".
You could use a while loop instead:
$i=0;
while ($i<=$#srcfile) {
    # work with $srcfile[$i] here
    $i++;
}
 
ashsysad (Author) Commented:
@Arnold, I am still facing a problem. Please check my attached code.

Script result upon execution:
Please note that the source and destination file names aren't captured inside the while loop.

# ls -l test1 test2
test1:
total 0
-rw-r--r-- 1 root root 0 Mar  1 15:05 1
-rw-r--r-- 1 root root 0 Mar  1 15:05 2
-rw-r--r-- 1 root root 0 Mar  1 15:05 a
-rw-r--r-- 1 root root 0 Mar  1 15:05 b
-rw-r--r-- 1 root root 0 Mar  1 15:05 c
-rw-r--r-- 1 root root 0 Mar  1 15:05 d
-rw-r--r-- 1 root root 0 Mar  1 15:05 e
-rw-r--r-- 1 root root 0 Mar  1 15:05 f
-rw-r--r-- 1 root root 0 Mar  1 15:05 g

test2:
total 12
-rw-r--r-- 1 root root   0 Mar  1 15:05 1
-rw-r--r-- 1 root root   0 Mar  1 15:05 4
-rw-r--r-- 1 root root   0 Mar  1 15:05 a
-rw-r--r-- 1 root root   0 Mar  1 15:05 b
-rw-r--r-- 1 root root   0 Mar  1 15:05 d
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE_SORTED
-rw-r--r-- 1 root root   0 Mar  1 15:05 e
-rw-r--r-- 1 root root   0 Mar  1 15:05 n
-rw------- 1 root root 211 Mar  1 14:56 nassync_ihot.log

# perl do_checksum.pl test1 test2

 1
 2
 a
 b
 c
 d
 e
 f
 g

 1
 4
 a
 b
 d
 DESTFILE
 DESTFILE_SORTED
 e
 n
 nassync_ihot.log
Source file is
Destination file is

#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[0];
my $destdir = $ARGV[1];

system("ls -l $srcdir | awk \-F\" \" \'\{print \$9\}\' > SRCFILE");
system("ls -l $destdir | awk \-F\" \" \'\{print \$9\}\' > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile";
print "@destfile";

my $j=0;
my $i=0;
my $k=0;

while($k<=$#srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$j]);
 print "Source file is $srcfile[$i]\n";
 print "Destination file is $destfile[$j] \n";
 if ( "$srcfile[$i]" eq "$destfile[$j]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$j]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $srcfile[$i] is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $srcfile[$i] is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }
$j = $j + 1;
}

 
ashsysad (Author) Commented:
@mccracky, You mentioned a solution using rsync. Could you please brief me on it?
 
mccracky Commented:
It depends on what you want to do.

I see two solutions.

First, you can simply write a bash script. The basic steps would be:

1. cd $srcdir
2. find ./ -type f -exec md5sum {} \; > /tmp/srcsums.txt
3. cd $dstdir
4. md5sum -c < /tmp/srcsums.txt | grep -v " OK$" > /tmp/sums_not_equal.txt
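
The four steps above can be sketched as a single script. This is only a sketch: the function name compare_trees and the use of mktemp instead of fixed files under /tmp are assumptions of mine, not part of the steps as posted, and it assumes GNU md5sum.

```shell
#!/usr/bin/env bash
# Sketch of the four steps above as one script. The function name and the
# temporary file handling are illustrative assumptions, not from the thread.

compare_trees() {
    local srcdir=$1 dstdir=$2 sums
    sums=$(mktemp) || return 1

    # Steps 1-2: record a checksum for every regular file under the source,
    # using paths relative to the source root.
    (cd "$srcdir" && find . -type f -exec md5sum {} + > "$sums") \
        || { rm -f "$sums"; return 1; }

    # Steps 3-4: verify those checksums from inside the destination and keep
    # only the lines that are not OK (mismatched and missing files).
    # grep exits 1 when every file is OK; that is not an error here.
    (cd "$dstdir" && LC_ALL=C md5sum -c "$sums" 2>/dev/null | grep -v ': OK$') || true

    rm -f "$sums"
}

# Run the comparison only when the script is given its two arguments.
if [ "$#" -eq 2 ]; then
    compare_trees "$1" "$2"
fi
```

Saved under a name such as compare_md5.sh (hypothetical), it would be run as: bash compare_md5.sh /path/to/source /path/to/destination. No output means every source file was found in the destination with a matching checksum.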

With rsync, it would basically be the output of the following command (assuming again that you want the whole tree):

rsync -rcnv $srcdir/ $destdir
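
For use inside a script, the same rsync call can be wrapped in a function. In this sketch -r recurses, -c compares by checksum instead of size and modification time, and -n makes it a dry run so nothing is copied. I have swapped -v for --out-format='%n' so that only the affected file names are printed; that swap and the function name list_differences are my assumptions, not part of the command as posted.

```shell
#!/usr/bin/env bash
# List the files rsync would transfer because they are missing from the
# destination or differ from it by checksum. Nothing is copied.
list_differences() {
    local srcdir=$1 destdir=$2
    # -r recurse, -c compare by checksum, -n dry run,
    # --out-format='%n' print only the name of each affected entry
    rsync -rcn --out-format='%n' "$srcdir"/ "$destdir"
}

# Run the listing only when the script is given its two arguments.
if [ "$#" -eq 2 ]; then
    list_differences "$1" "$2"
fi
```

Note that rsync does not distinguish between "missing" and "different" in this output; both simply appear as files it would transfer.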



 
arnold Commented:
First, you should not store the source/destination file lists in the same folder you are collecting the data from.

All the files in your example that are suitable for comparison are zero length.
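
Zero-length files make a poor test because every empty file has the same MD5 digest (d41d8cd98f00b204e9800998ecf8427e), so a comparison between them can never report a difference. A quick check, using throwaway file names that are purely illustrative:

```shell
#!/usr/bin/env bash
# Two empty files always produce the same MD5 digest, so they cannot
# exercise the "differs" branch of a comparison script.
workdir=$(mktemp -d)
: > "$workdir/empty1"
: > "$workdir/empty2"

sum1=$(md5sum < "$workdir/empty1" | awk '{print $1}')
sum2=$(md5sum < "$workdir/empty2" | awk '{print $1}')
echo "$sum1"
echo "$sum2"

# Give one file some content and its digest changes.
echo "some data" > "$workdir/empty2"
sum3=$(md5sum < "$workdir/empty2" | awk '{print $1}')
echo "$sum3"

rm -rf "$workdir"
```

Putting a few bytes of different content into one pair of test files is enough to confirm that the script reports mismatches.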

The Digest::MD5 module lets Perl compute the same digests as md5sum without calling an external command.
cksum is an alternative.

Are you familiar with hashes?

This is incomplete and untested as I have to run.
See if it helps you.
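use strict;
use warnings;

my ($sourcedir, $destinationdir) = @ARGV;
die "Usage: $0 <source folder> <destination folder>\n" unless defined $destinationdir;

# Map each file name in the source folder to its md5sum output
open (SRCFILE, "ls $sourcedir |") || die "Unable to list contents of $sourcedir: $!\n";
my %sourcefilehash;
while (<SRCFILE>) {
    chomp;
    next unless -f "$sourcedir/$_";    # skip subdirectories
    $sourcefilehash{$_} = `cat "$sourcedir/$_" | md5sum`;
}
close (SRCFILE);

# Do the same for the destination folder
open (DSTFILE, "ls $destinationdir |") || die "Unable to list contents of $destinationdir: $!\n";
my %destinationfilehash;
while (<DSTFILE>) {
    chomp;
    next unless -f "$destinationdir/$_";    # skip subdirectories
    $destinationfilehash{$_} = `cat "$destinationdir/$_" | md5sum`;
}
close (DSTFILE);

foreach my $filename (sort keys %sourcefilehash) {
    if ( exists $destinationfilehash{$filename} ) {    # check if destination has this file
        # compare the md5sum results for the file (strings, so eq rather than ==)
        if ( $sourcefilehash{$filename} eq $destinationfilehash{$filename} ) {
            print "MD5 value for file $filename is the same on source and destination\n";
        }
        else {
            print "MD5 value for file $filename differs between source and destination\n";
        }
    }
    else {
        print "Source file $filename is not present in destination folder\n";
    }
}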

 
ashsysad (Author) Commented:
The solution you gave looks simple and straightforward, and it worked for me. Thanks a lot!
 
ashsysad (Author) Commented:
@Arnold, Thanks for taking the time to help me.
Question has a verified solution.
