Solved

script for checking md5 checksums

Posted on 2011-03-01
Last Modified: 2012-05-11
Hello,

I need to write a Perl script that compares md5sum checksums between two filesystems. The script should take the source and destination folders as command-line arguments and report whether the checksum of each file in the source folder matches that of the same file in the destination folder. If a file in the source folder is not present in the destination, it should report that as an error too. I came up with some logic and wrote the Perl code, but for some reason it isn't working. Can you look at it and tell me where I am going wrong, or give me new code?

It reports this error:

Syntax: # do_checksum.pl  <SOURCE folder>  <DESTINATION FOLDER>

#perl do_checksum.pl /smbnas/oralsb40/oralsb11 /smbnas/oralsb40/oralsb11_restore
awk: cmd. line:1: fatal: file `/smbnas/oralsb40/oralsb11_restore' is a directory



EXAMPLES OF MD5SUM OUTPUT:

[root@]# md5sum nassync.sh
cbc234736d28b3841a6013e968bd0706  nassync.sh
[root@]# md5sum nassync.sh | awk -F" " '{print $1}'
cbc234736d28b3841a6013e968bd0706
[root@oralsb11-new opt]#

cat do_checksum.pl
#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[1];
my $destdir = $ARGV[2];

system("ls -l | awk \-F\" \" \'\{print \$9\}\' $srcdir > SRCFILE");
system("ls -l | awk \-F\" \" \'\{print \$9\}\' $destdir > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile\n";

foreach $i(@srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$i]);
 if ( "$srcfile[$i]" eq "$destfile[$i]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$i]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $i is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $i is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }

}

Question by:ashsysad
9 Comments
 
LVL 12

Expert Comment

by:mccracky
ID: 35011485
Does it have to be a Perl script? rsync basically does this already. You can use the -c option to compare files by checksum and the "dry run" option (-n) so that no files are actually transferred; that should do what you need.
 

Author Comment

by:ashsysad
ID: 35011515
I'm fine with any solution. Please let me know how to do it. But since I have already started on the Perl script, I guess it would be better to complete it.
 
LVL 79

Expert Comment

by:arnold
ID: 35011683
Do the two locations have an identical structure, such that filea in the source, if it exists, will be at the same path in the destination?

The problem is your use of foreach $i (@srcfile): $i is set to each element's value, not its index. Given
@srcfile = qw(a b c);
you are treating $i as though it holds the indexes 0, 1, 2, but the foreach you are using actually sets $i to "a", "b", "c" in turn.
You could use a while loop instead:
$i = 0;
while ($i <= $#srcfile) {
    # work with $srcfile[$i] here
    $i++;
}

 

Author Comment

by:ashsysad
ID: 35012275
@Arnold, I am still facing a problem. Please check my attached code.

Script result upon execution:
Please note that in the while loop the source and destination file names aren't captured.

# ls -l test1 test2
test1:
total 0
-rw-r--r-- 1 root root 0 Mar  1 15:05 1
-rw-r--r-- 1 root root 0 Mar  1 15:05 2
-rw-r--r-- 1 root root 0 Mar  1 15:05 a
-rw-r--r-- 1 root root 0 Mar  1 15:05 b
-rw-r--r-- 1 root root 0 Mar  1 15:05 c
-rw-r--r-- 1 root root 0 Mar  1 15:05 d
-rw-r--r-- 1 root root 0 Mar  1 15:05 e
-rw-r--r-- 1 root root 0 Mar  1 15:05 f
-rw-r--r-- 1 root root 0 Mar  1 15:05 g

test2:
total 12
-rw-r--r-- 1 root root   0 Mar  1 15:05 1
-rw-r--r-- 1 root root   0 Mar  1 15:05 4
-rw-r--r-- 1 root root   0 Mar  1 15:05 a
-rw-r--r-- 1 root root   0 Mar  1 15:05 b
-rw-r--r-- 1 root root   0 Mar  1 15:05 d
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE_SORTED
-rw-r--r-- 1 root root   0 Mar  1 15:05 e
-rw-r--r-- 1 root root   0 Mar  1 15:05 n
-rw------- 1 root root 211 Mar  1 14:56 nassync_ihot.log

# perl do_checksum.pl test1 test2

 1
 2
 a
 b
 c
 d
 e
 f
 g

 1
 4
 a
 b
 d
 DESTFILE
 DESTFILE_SORTED
 e
 n
 nassync_ihot.log
Source file is
Destination file is

#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[0];
my $destdir = $ARGV[1];

system("ls -l $srcdir | awk \-F\" \" \'\{print \$9\}\' > SRCFILE");
system("ls -l $destdir | awk \-F\" \" \'\{print \$9\}\' > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile";
print "@destfile";

my $j=0;
my $i=0;
my $k=0;

while($k<=$#srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$j]);
 print "Source file is $srcfile[$i]\n";
 print "Destination file is $destfile[$j] \n";
 if ( "$srcfile[$i]" eq "$destfile[$j]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$j]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $srcfile[$i] is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $srcfile[$i] is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }
$j = $j + 1;
}

 

Author Comment

by:ashsysad
ID: 35012359
@mccracky, you mentioned a solution using rsync. Could you please explain it briefly?
 
LVL 12

Accepted Solution

by:
mccracky earned 500 total points
ID: 35012445
Depends on what you want to do.

I see two solutions.

One: you can simply write a bash script. The basic steps would be:

1. cd $srcdir
2. find ./ -type f -exec md5sum {} \; > /tmp/srcsums.txt
3. cd $destdir
4. md5sum -c < /tmp/srcsums.txt | grep -v " OK$" > /tmp/sums_not_equal.txt

With rsync, it would basically be the output of the following (assuming again that you want the whole tree):

rsync -rcnv $srcdir/ $destdir
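
A minimal sketch of the four md5sum steps as one shell function, assuming GNU md5sum and mktemp are available. The function name compare_by_manifest and the temporary manifest file are illustrative choices, not part of the answer above.

```shell
#!/bin/sh
# Sketch: verify that every file under SRC has an identical copy under DST.
# compare_by_manifest is a hypothetical helper name, not from the thread.
compare_by_manifest() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ -d "$dst" ] || return 2
    manifest=$(mktemp) || return 2

    # Steps 1-2: record a checksum for every regular file under the source,
    # with paths relative to the source root.
    (cd "$src" && find . -type f -exec md5sum {} +) > "$manifest"

    # Steps 3-4: re-check those checksums against the destination tree and
    # print only the lines that are not "OK" (mismatched or missing files).
    # LC_ALL=C keeps md5sum's status words untranslated so grep can match them.
    if (cd "$dst" && LC_ALL=C md5sum -c "$manifest" 2>/dev/null | grep -v ': OK$'); then
        status=1    # grep printed something: at least one file failed
    else
        status=0    # nothing left after filtering: every file matched
    fi

    rm -f "$manifest"
    return "$status"
}
```

Calling compare_by_manifest "$srcdir" "$destdir" prints one line per problem file and returns non-zero if any were found. Files that exist only in the destination are not reported, because the manifest is built from the source side.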



 
LVL 79

Expert Comment

by:arnold
ID: 35012620
First, you should not store the source/destination file lists in the same folder you are collecting the data from.

All the files in your example that are suitable for comparison are zero length.

There is a module, Digest::MD5, as well as a Perl script that performs the md5sum equivalent.
cksum is an alternative.

Are you familiar with hashes?

This is incomplete and untested, as I have to run.
See if it helps you.
# $sourcedir and $destinationdir are assumed to be set already, e.g. from @ARGV.
open (SRCFILE, "ls $sourcedir |") || die "Unable to list contents of $sourcedir: $!\n";
my %sourcefilehash;
while (<SRCFILE>) {
    chomp;
    # key: file name; value: md5sum output for that file's contents
    $sourcefilehash{$_} = `cat "$sourcedir/$_" | md5sum`;
}
close (SRCFILE);
open (DSTFILE, "ls $destinationdir |") || die "Unable to list contents of $destinationdir: $!\n";
my %destinationfilehash;
while (<DSTFILE>) {
    chomp;
    $destinationfilehash{$_} = `cat "$destinationdir/$_" | md5sum`;
}
close (DSTFILE);

foreach my $filename (sort keys %sourcefilehash) {
    if ( exists $destinationfilehash{$filename} ) { # check if destination has this file
        if ( $sourcefilehash{$filename} eq $destinationfilehash{$filename} ) { # string-compare the md5sum results
            # do what you need: the two md5sums are the same
        }
        else {
            # do what you need, as the md5sum results do not match
        }
    }
    else {
        # the file does not exist at the destination
    }
}
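
The same three outcomes (same, differs, missing) can also be sketched in plain shell. This swaps the hash lookup above for a direct per-file check, so it is a different technique rather than a translation of the Perl. It assumes md5sum is installed; the names compare_by_name and md5_of are hypothetical.

```shell
#!/bin/sh
# Sketch: report each source file as same, different, or missing in DST.
# compare_by_name and md5_of are hypothetical helper names.

# Print only the checksum field of md5sum's "<hash>  <name>" output.
md5_of() {
    md5sum < "$1" | cut -d ' ' -f 1
}

compare_by_name() {
    src=$1
    dst=$2
    # Top-level regular files only, like the flat listings in the thread.
    # Hidden files (names starting with a dot) are not matched by the glob.
    for path in "$src"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        if [ ! -f "$dst/$name" ]; then
            echo "MISSING $name"
        elif [ "$(md5_of "$path")" = "$(md5_of "$dst/$name")" ]; then
            echo "SAME $name"
        else
            echo "DIFFERS $name"
        fi
    done
}
```

Run as compare_by_name test1 test2. Note that zero-length files always produce the same checksum, so test data needs some content before a DIFFERS result can appear.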

 

Author Closing Comment

by:ashsysad
ID: 35014796
The solution you gave looks simple and straightforward, and it worked for me. Thanks a lot!
 

Author Comment

by:ashsysad
ID: 35014801
@Arnold, thanks for taking the time to help me.