Avatar of sud
sud

PERL compare directory files

Hello Experts,
I have three directories: DATA, TEMP and PREVIOUS.
1. In the first step I copy everything from DATA to the TEMP directory, which I am able to do.

2. In the second step a data import program updates the DATA directory; again, I am able to do this.

3. In the third step I compare the contents of the TEMP directory to the DATA directory.
   For the files that were updated in the DATA directory, I copy the old versions from TEMP to the PREVIOUS
   directory.

Can you help me with the third step?

Thank you, I appreciate your help.
Avatar of Jason Minton
Jason Minton

You'll want to loop through the files in DATA and, for each one, compare it to the same file in TEMP, using the Perl Algorithm::Diff module to compare them.

CPAN DIFF LINK:
http://search.cpan.org/~tyemq/Algorithm-Diff-1.1902/lib/Algorithm/Diff.pm

If you find a diff, then do the move as needed.
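
If all you need is a changed/unchanged answer rather than a line-by-line diff, a lighter option is the core File::Compare module instead of Algorithm::Diff. The following is only a sketch of that swapped-in approach; the sub name contents_differ is made up for illustration:

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: returns 1 if the two files have different contents,
# 0 if they are identical, and dies if either file cannot be read.
sub contents_differ {
    my ($first, $second) = @_;
    my $result = compare($first, $second);   # 0 = equal, 1 = different, -1 = error
    die "Could not compare $first and $second: $!\n" if $result < 0;
    return $result;
}
```

You would call it once per file name found in DATA, passing the DATA path and the matching TEMP path.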
Avatar of Adam314
Adam314

For step 3, can you use the LastModified property of the file to determine which files have changed, or do you need to do a compare of the actual file contents?
Avatar of sud

ASKER

Is there an easier way?
The files in the DATA directory will differ from the files in the TEMP directory in their attributes, so I don't want to compare the whole file.

I think if I can somehow compare the attributes of the files in the two directories, it will be easier to decide which files were updated, and then I can move those files to the PREVIOUS directory.

Any suggestions?
Avatar of sud

ASKER

hi Adam,
    i do not need to compare the actual file.
In that case you can simply 'stat' the file and look at the size or the last-modified time.

To get the last mod time of a file:
my $mtime = (stat($file))[9];

Here is a complete list of what stat will return:
  0 dev      device number of filesystem
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only)
  7 size     total size of file, in bytes
  8 atime    last access time in seconds since the epoch
  9 mtime    last modify time in seconds since the epoch
 10 ctime    inode change time (NOT creation time!) in seconds since the epoch
 11 blksize  preferred block size for file system I/O
 12 blocks   actual number of blocks allocated

You can use one of those or a combination to compare the files.
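
As a rough sketch of such a combination, the snippet below compares size (index 7) and mtime (index 9) together. The sub names file_signature and attributes_differ are made up for illustration, and treating a file that cannot be stat'ed as "different" is an assumption you may want to change:

```perl
use strict;
use warnings;

# Hypothetical helper: "size:mtime" for a file, or undef if stat fails.
sub file_signature {
    my ($path) = @_;
    my @st = stat($path) or return;
    return "$st[7]:$st[9]";
}

# True (1) when the two files differ in size or last-modified time,
# or when either of them cannot be stat'ed; false (0) otherwise.
sub attributes_differ {
    my ($first, $second) = @_;
    my $sig_a = file_signature($first);
    my $sig_b = file_signature($second);
    return 1 unless defined($sig_a) && defined($sig_b);
    return $sig_a ne $sig_b ? 1 : 0;
}
```

Checking that the stat result is defined before comparing also avoids "uninitialized value" warnings when a file is missing on one side.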
Avatar of sud

ASKER

OK
That's why I was asking.  It will be faster to just look at last modified time.

use File::Find;

# Do this as part of step 1, or between step 1 and step 2
my %OrigMtime;
find(
    sub {
        return unless -f $File::Find::name;
        $OrigMtime{$File::Find::name} = (stat($File::Find::name))[9];
    },
    $DataDirectory);

#Do step 2 here

#This is step 3
find(
    sub {
        return unless -f $File::Find::name;
        return if $OrigMtime{$File::Find::name} == (stat($File::Find::name))[9];
        print "File $File::Find::name has been changed.\n";
        #Do whatever you need here when a file has been changed
    },
    $DataDirectory);
Avatar of sud

ASKER

Thanks Adam,
           
Avatar of sud

ASKER

Hi Adam,

Following is the code I am using to compare the two directories; I am trying to print the names of the files that have been updated in the A:/temp/Data directory. I am getting the error "Use of uninitialized value in numeric eq (==) at newfile.pl line 37."

  chdir "A:/temp/newcompare"
   or die "Can't chdir to A:/temp/newcompare$!\n" ;
 
  my $File;
    my $DataDirectory="A:/temp/newcompare/";

  my %OrigMtime;
  find(
      sub{
         return unless -f $File::Find::name;
          $OrigMtime{$File::Find::name}=(stat($File::Find::name))[9];
      },
      $DataDirectory
);

  chdir "A:/temp//Data"
   or die "Can't chdir to A:/temp/Data$!\n" ;
 
$DataDirectory="M:/temp/Sajin/GS-US-174-0106/Data/";
   find(
     sub{
       return unless -f $File::Find::name;
        return if $OrigMtime{$File::Find::name}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";

  },
   $DataDirectory);
       Any suggestion ?

Thanks
Avatar of ozo
Perhaps you wanted
    my $DataDirectory="A:/temp/newcompare/";

  my %OrigMtime;
  find(
      sub{
         return unless -f $File::Find::name;
          $OrigMtime{$_}=(stat($File::Find::name))[9];
      },
      $DataDirectory
);

  chdir "A:/temp//Data"
   or die "Can't chdir to A:/temp/Data$!\n" ;
 
$DataDirectory="M:/temp/Sajin/GS-US-174-0106/Data/";
   find(
     sub{
       return unless -f $File::Find::name;
        return if $OrigMtime{$_}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";

  },
   $DataDirectory);
Avatar of sud

ASKER

OZO,
  following is my modified code, but I still get the same error:

chdir "A:/temp/newcompare"
   or die "Can't chdir to A:/temp/newcompare$!\n" ;

  my %OrigMtime;
  my $DataDirectory="A:/temp/newcompare/";

   find(
      sub{
         return unless -f $File::Find::name;
          $OrigMtime{$_}=(stat($File::Find::name))[9];
      },
      $DataDirectory);


  chdir "A:/temp/Data"
   or die "Can't chdir to A:/tempData$!\n" ;
 
$DataDirectory="A:/temp/Data/";
   

  find(
     sub{
       return unless -f $File::Find::name;
        return if $OrigMtime{$_}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";

  },
   $DataDirectory);
       
Are there files in A:/temp/Data/ that aren't in A:/temp/newcompare/?
What do you want to do with them?
find(
    sub{
        return unless -f $File::Find::name;
        print "New file $_",return unless exists($OrigMtime{$_});
        return if $OrigMtime{$_}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";
  },
   $DataDirectory);
Avatar of sud

ASKER

At present, all the files in A:/temp/Data/ are also in A:/temp/newcompare/, except that the newcompare directory has the old files while the Data directory has the new or recent files. So the files in the Data directory have today's date and time stamp.

I want to copy files from the A:/temp/newcompare directory to the A:/temp/Previous directory, but only the files that were updated in the A:/temp/Data directory.
 
Avatar of sud

ASKER

Adam,
I get the message "Use of uninitialized value in numeric eq (==) at newfile.pl line 40."
Is return if $OrigMtime{$_}==(stat($File::Find::name))[9]; line 40?
What were $_ and $File::Find::name when you got that error?
what files are in A:/temp/newcompare/
Avatar of sud

ASKER


Is return if $OrigMtime{$_}==(stat($File::Find::name))[9]; line 40?   YES
What were $_ and $File::Find::name when you got that error?   It happens for all the files in the directory.

I just tried using one text file, abc.txt, and I still get the same error twice.
You get the error twice with one file in A:/temp/newcompare/ and A:/temp/Data/?
Is line 40 executing twice on the same file?
is $OrigMtime{$_}=(stat($File::Find::name))[9]; executing?
Maybe there was an error when you copied the code... here it is again:



my %OrigMtime;

my $DataDirectory="A:/temp/newcompare";
find(
    sub{
        return unless -f $File::Find::name;
        $OrigMtime{$_}=(stat($File::Find::name))[9];
    },
    $DataDirectory);


$DataDirectory="A:/temp/Data";
find(
    sub{
        return unless -f $File::Find::name;
        print "New file $_",return unless exists($OrigMtime{$_});
        return if $OrigMtime{$_}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";
  },
   $DataDirectory);
Avatar of sud

ASKER

Right OZO, I am getting the error twice with the one file, abc.txt.

I think line 40 is executing twice.
Avatar of sud

ASKER

Thanks Adam, I copied your code but I still get the same message.
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo

replace the second find call with this:

find(
    sub{
        return unless -f $File::Find::name;
        unless(exists($OrigMtime{$_})) {
            print "New file $_\n";
            return;
        }
        return if $OrigMtime{$_}==(stat($File::Find::name))[9];
        print "File $File::Find::name has been changed. \n";
  },
   $DataDirectory);
Is it $DataDirectory = "A:/temp/Data" or $DataDirectory="M:/temp/Sajin/GS-US-174-0106/Data/";?
print "New file $_",return unless exists($OrigMtime{$_});
should be
(print "New file $_"),return unless exists($OrigMtime{$_});
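
The reason for the parentheses: without them, return is parsed as one of print's arguments, so it runs while the argument list is still being built and the sub exits before anything is printed. A small sketch of the difference follows; the subs report_buggy and report_fixed and the filehandle argument are made up for illustration:

```perl
use strict;
use warnings;

# Buggy form: "return" is part of print's argument list, so the sub
# exits before print ever runs and the message is never written.
sub report_buggy {
    my ($fh, $name, $seen) = @_;
    print {$fh} "New file $name\n", return unless exists $seen->{$name};
    print {$fh} "Known file $name\n";
}

# Fixed form: the parentheses end print's argument list, so the message
# is written first and only then does the sub return.
sub report_fixed {
    my ($fh, $name, $seen) = @_;
    (print {$fh} "New file $name\n"), return unless exists $seen->{$name};
    print {$fh} "Known file $name\n";
}
```

The block form with unless(...) { print ...; return; } shown earlier does the same thing as the fixed version and is easier to read.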
Avatar of sud

ASKER

Is the code working on your side?

Avatar of sud

ASKER

Data directory is
my $DataDirectory="A:/temp/newcompare/";
&
 $DataDirectory = "A:/temp/Data" ;
Avatar of sud

ASKER

OZO,
I changed to (print "New file $_"),return unless exists($OrigMtime{$_});
but it  did not help, still same message

Did the
   print "OrigMtime{$_}=(stat($File::Find::name))[9]\n";
and
  print "OrigMtime{$_}==(stat($File::Find::name))[9]\n";
say anything?
Avatar of sud

ASKER

I placed 2 files, test1.txt and test2.txt, in the Data directory and 1 file in the newcompare directory.

Then I get 1 message, "New file test2.txt", but I also get two messages: "Use of uninitialized value in numeric eq (==) at newfile.pl line 40."
Avatar of sud

ASKER

I get the following messages:
OrigMtime{test.txt}=(stat(A:/temp/newcompare/test.txt))[9]
Use of uninitialized value in numeric eq (==) at newfile.pl line 41.
Use of uninitialized value in numeric eq (==) at newfile.pl line 41.
New file test2.txt
SOLUTION
Avatar of sud

ASKER

well, your code works if

I change "==" to "="

Also $File::Find::name gives the name of the file with complete path.
Avatar of sud

ASKER

Can I write
copy($OrigMtime{$_} , "A:/temp/Previous/");  ???
then
copy($File::Find::name, "A:/temp/newcompare") --> this one works.

I want to copy the original file (the one in the newcompare directory) to the Previous directory, and then copy the changed file (the one in the Data directory) to the newcompare directory.
 
Changing == to = is changing numeric compare to assignment.
When the stat is done, the full path is needed, which is why $File::Find::name is used.
When looking at the files, only the name is needed, because the path is different, and is expected to be different.

$OrigMtime{$_} will have the last modified time, not the name of the file, so you wouldn't want to copy that.

Do you want to sync these two directories?  There are utilities that can help with that if that's what you want to do.
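
If you do want to script it yourself, here is one possible sketch of the copy step. It keys both directories by path relative to the directory root, so the differing directory prefix does not matter, and copies the old version of each changed file to the archive directory. The subs snapshot_mtimes and archive_changed are made up for illustration. It assumes the old copies still carry their original last-modified times; File::Copy's copy() does not preserve them, so if that is how the old copies were made, record the times before the import runs instead.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: map each file's path (relative to $root)
# to its last-modified time.
sub snapshot_mtimes {
    my ($root) = @_;
    my %mtime;
    find({
        no_chdir => 1,   # keep the working directory, so full paths stay valid
        wanted   => sub {
            return unless -f $File::Find::name;
            my $rel = File::Spec->abs2rel($File::Find::name, $root);
            $mtime{$rel} = (stat($File::Find::name))[9];
        },
    }, $root);
    return \%mtime;
}

# Hypothetical helper: copy the old version of every changed file into
# $prev_dir and return the sorted relative paths that were copied.
sub archive_changed {
    my ($old_dir, $new_dir, $prev_dir) = @_;
    my $old = snapshot_mtimes($old_dir);
    my $new = snapshot_mtimes($new_dir);
    my @changed;
    for my $rel (sort keys %$new) {
        next unless exists $old->{$rel};         # new file: nothing to archive
        next if $old->{$rel} == $new->{$rel};    # same mtime: unchanged
        my $dest = File::Spec->catfile($prev_dir, $rel);
        make_path(dirname($dest));               # create subdirectories if needed
        copy(File::Spec->catfile($old_dir, $rel), $dest)
            or die "Copy of $rel failed: $!\n";
        push @changed, $rel;
    }
    return @changed;
}
```

With the directories from this thread, the call would look like archive_changed("A:/temp/newcompare", "A:/temp/Data", "A:/temp/Previous"). Copying the changed files from Data back into newcompare afterwards would be a second loop over the returned list.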
Avatar of sud

ASKER

Thanks,
        I was able to copy the files using different logic. Your code and suggestions have been helpful.