Solved

script for checking md5 checksums

Posted on 2011-03-01
Last Modified: 2012-05-11
Hello,

I need to write a Perl script that compares md5sum checksums between two filesystems. The script should take the source and destination folders as command-line arguments and report whether the checksum of each file in the source folder matches that of the corresponding file in the destination folder. If a file in the source folder isn't present in the destination, it should report that as an error too. I worked out the logic and wrote some Perl code, but for some reason it isn't working. Can you look at it and tell me where I'm going wrong, or give me new code?

It reports this error:

Syntax: # do_checksum.pl  <SOURCE folder>  <DESTINATION FOLDER>

#perl do_checksum.pl /smbnas/oralsb40/oralsb11 /smbnas/oralsb40/oralsb11_restore
awk: cmd. line:1: fatal: file `/smbnas/oralsb40/oralsb11_restore' is a directory



EXAMPLES OF MD5CHECKSUM OUTPUT:

[root@]# md5sum nassync.sh
cbc234736d28b3841a6013e968bd0706  nassync.sh
[root@]# md5sum nassync.sh | awk -F" " '{print $1}'
cbc234736d28b3841a6013e968bd0706
[root@oralsb11-new opt]#

cat do_checksum.pl
#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[1];
my $destdir = $ARGV[2];

system("ls -l | awk \-F\" \" \'\{print \$9\}\' $srcdir > SRCFILE");
system("ls -l | awk \-F\" \" \'\{print \$9\}\' $destdir > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile\n";

foreach $i(@srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$i]);
 if ( "$srcfile[$i]" eq "$destfile[$i]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$i]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $i is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $i is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }

}

Question by:ashsysad
9 Comments
 
LVL 12

Expert Comment

by:mccracky
ID: 35011485
Does it have to be a Perl script that you write? rsync basically already does that. You can use the -c option to compare files by checksum, and the "dry run" option (-n) so that no files are actually transferred. That should do what you need.
 

Author Comment

by:ashsysad
ID: 35011515
I'm fine with any solution. Please let me know how to do it. But since I started working on it in Perl, I guess it would be better to complete it.
 
LVL 78

Expert Comment

by:arnold
ID: 35011683
Do the two locations have an identical structure, such that a file "filea" in the source, if it exists at all in the destination, will be at the same relative path?

The problem is your use of foreach $i (@srcfile): $i is set to each value, not to the index.
@srcfile = qw(a b c);
You are treating $i as though it will hold the indexes 0, 1, 2 from the example above, but the foreach you are using actually returns "a", "b", "c" for $i.
You could use a while loop instead:
$i = 0;
while ($i <= $#srcfile) {
    # use $srcfile[$i] here
    $i++;
}

 

Author Comment

by:ashsysad
ID: 35012275
@Arnold, I'm still facing a problem. Please check my attached code.

Script result upon execution:
Please note that the source and destination file names aren't captured in the while loop.

# ls -l test1 test2
test1:
total 0
-rw-r--r-- 1 root root 0 Mar  1 15:05 1
-rw-r--r-- 1 root root 0 Mar  1 15:05 2
-rw-r--r-- 1 root root 0 Mar  1 15:05 a
-rw-r--r-- 1 root root 0 Mar  1 15:05 b
-rw-r--r-- 1 root root 0 Mar  1 15:05 c
-rw-r--r-- 1 root root 0 Mar  1 15:05 d
-rw-r--r-- 1 root root 0 Mar  1 15:05 e
-rw-r--r-- 1 root root 0 Mar  1 15:05 f
-rw-r--r-- 1 root root 0 Mar  1 15:05 g

test2:
total 12
-rw-r--r-- 1 root root   0 Mar  1 15:05 1
-rw-r--r-- 1 root root   0 Mar  1 15:05 4
-rw-r--r-- 1 root root   0 Mar  1 15:05 a
-rw-r--r-- 1 root root   0 Mar  1 15:05 b
-rw-r--r-- 1 root root   0 Mar  1 15:05 d
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE
-rw-r--r-- 1 root root 307 Mar  1 14:56 DESTFILE_SORTED
-rw-r--r-- 1 root root   0 Mar  1 15:05 e
-rw-r--r-- 1 root root   0 Mar  1 15:05 n
-rw------- 1 root root 211 Mar  1 14:56 nassync_ihot.log

# perl do_checksum.pl test1 test2

 1
 2
 a
 b
 c
 d
 e
 f
 g

 1
 4
 a
 b
 d
 DESTFILE
 DESTFILE_SORTED
 e
 n
 nassync_ihot.log
Source file is
Destination file is

#!/usr/bin/perl
# Description: This script is for checking the Checksum values between files in 2 different filesystems


my $srcdir = $ARGV[0];
my $destdir = $ARGV[1];

system("ls -l $srcdir | awk \-F\" \" \'\{print \$9\}\' > SRCFILE");
system("ls -l $destdir | awk \-F\" \" \'\{print \$9\}\' > DESTFILE");

system("sort SRCFILE > SRCFILE_SORTED");
system("sort DESTFILE > DESTFILE_SORTED");

my @srcfile = `cat SRCFILE_SORTED`;
my @destfile = `cat DESTFILE_SORTED`;
print "@srcfile";
print "@destfile";

my $j=0;
my $i=0;
my $k=0;

while($k<=$#srcfile)
{
 chomp($srcfile[$i]);
 chomp($destfile[$j]);
 print "Source file is $srcfile[$i]\n";
 print "Destination file is $destfile[$j] \n";
 if ( "$srcfile[$i]" eq "$destfile[$j]" )
 {
        my $md5src = `md5sum $srcfile[$i]| awk \-F\" \" \'\{print \$1\}\'`;
        my $md5dest = `md5sum $destfile[$j]| awk \-F\" \" \'\{print $1\}\'`;

        if ( "$md5src" eq "$md5dest" )
        { print "MD5 value for File: $srcfile[$i] is same on Source and destination\n"; }
        else
        { print "MD5 value for File: $srcfile[$i] is differs with Source and destination\n"; }
 }
else
 {
        print "Source file $srcfile[$i] is not present in destination folder\n";
 }
$j = $j + 1;
}

 

Author Comment

by:ashsysad
ID: 35012359
@mccracky, you mentioned a solution using rsync. Could you please explain it briefly?
 
LVL 12

Accepted Solution

by:
mccracky earned 500 total points
ID: 35012445
Depends what you want to do.

I see two solutions.

One is a simple bash script. The basic steps would be:

1. cd $srcdir
2. find ./ -type f -exec md5sum {} \; > /tmp/srcsums.txt
3. cd $destdir
4. md5sum -c < /tmp/srcsums.txt | grep -v " OK$" > /tmp/sums_not_equal.txt

With rsync, it would basically be the output of the following (assuming again that you want the whole tree):

rsync -rcnv $srcdir/ $destdir
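
The four md5sum steps listed earlier in this answer could be wrapped into one function, roughly as sketched here. This is only a sketch under assumptions: the function name compare_trees is made up for illustration, it assumes GNU md5sum (whose -c mode prints "FAILED" for a mismatch and "FAILED open or read" for a file it cannot open), and it uses mktemp rather than a fixed file under /tmp.

```shell
#!/usr/bin/env bash
# compare_trees SRC DEST  (hypothetical helper name, not from the thread)
# Prints one line per file under SRC that is missing from DEST or whose
# MD5 differs there. Prints nothing when every source file matches.
compare_trees() {
    local src=$1 dest=$2 sums
    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "usage: compare_trees SRC_DIR DEST_DIR" >&2
        return 2
    fi
    sums=$(mktemp) || return 1

    # Steps 1 and 2: record checksums using paths relative to SRC.
    (cd "$src" && find . -type f -exec md5sum {} + > "$sums")

    # Steps 3 and 4: re-check the same relative paths under DEST and
    # drop the lines that matched. LC_ALL=C keeps the messages in
    # English so the "OK" filter is reliable. grep exits 1 when every
    # file matched, which is not an error here, hence "|| true".
    (cd "$dest" && LC_ALL=C md5sum -c "$sums" 2>/dev/null | grep -v ': OK$') || true

    rm -f "$sums"
}

# Example call, using the directories from the question:
# compare_trees /smbnas/oralsb40/oralsb11 /smbnas/oralsb40/oralsb11_restore
```

Note that this only walks the source tree, so a file that exists only in the destination is not reported. The rsync dry run shares that limitation unless --delete is added.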



 
LVL 78

Expert Comment

by:arnold
ID: 35012620
First, you should not store the source/destination file lists in the same folder where you are collecting the data.

All the files in your example that are suitable for comparison are zero length.
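
To see why zero-length files make a poor test, here is a small sketch (the function name and file names are made up for illustration). Every empty file has the same MD5, d41d8cd98f00b204e9800998ecf8427e, so a comparison of empty files always reports a match, whether or not the script works.

```shell
#!/usr/bin/env bash
# empty_md5_demo (hypothetical name): create two empty files and one
# non-empty file in a scratch directory and print their MD5 sums.
empty_md5_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    : > "$dir/a"                  # empty file
    : > "$dir/b"                  # another empty file
    printf 'data\n' > "$dir/c"    # file with some content
    # a and b print the same hash; c prints a different one.
    (cd "$dir" && md5sum a b c)
    rm -rf "$dir"
}

empty_md5_demo
```

Putting some different content into a few of the test files gives the script something real to compare.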

There is a Perl module, Digest::MD5, that can compute the same checksums as md5sum.
cksum is an alternative.

Are you familiar with hashes?

This is incomplete and untested, as I have to run.
See if it helps you.
# Assumes $sourcedir and $destinationdir have already been set, e.g. from @ARGV.
open (SRCFILE, "ls $sourcedir |") || die "Unable to list contents of $sourcedir: $!\n";
my %sourcefilehash;
while (<SRCFILE>) {
    chomp;
    # key: file name, value: md5sum output for that file's contents
    $sourcefilehash{$_} = `cat "$sourcedir/$_" | md5sum`;
}
close (SRCFILE);
open (DSTFILE, "ls $destinationdir |") || die "Unable to list contents of $destinationdir: $!\n";
my %destinationfilehash;
while (<DSTFILE>) {
    chomp;
    $destinationfilehash{$_} = `cat "$destinationdir/$_" | md5sum`;
}
close (DSTFILE);

foreach my $filename (sort keys %sourcefilehash) {
    if ( exists $destinationfilehash{$filename} ) { # check if destination has this file
        if ( $sourcefilehash{$filename} eq $destinationfilehash{$filename} ) { # compare the md5sum results for the file
            # do what you need; the two md5sums are the same
        }
        else {
            # do what you need, as the md5sum results do not match
        }
    }
    else {
        # the file does not exist at the destination
    }
}

 

Author Closing Comment

by:ashsysad
ID: 35014796
Your solution looks simple and straightforward, and it worked for me. Thanks a lot!
 

Author Comment

by:ashsysad
ID: 35014801
@Arnold, thanks for taking the time to help me.


Question has a verified solution.
