Locating Duplicate Filenames

On our Sun Solaris server, I need a Perl program to find all duplicate filenames on a file system. I want to compare only the characters up to the first "." (dot) in each name, because we may have files with the same name where one of them is compressed (.Z) in another directory. I also need to be able to exclude some directories from the search, since I know we would have duplicate filenames there.
j_k Asked:
alexbik Commented:
Hi,

This doesn't seem a 'real' question to me, it's more like 'could anybody write a script for me'. Be more specific please..

Alex

j_k (Author) Commented:
Honestly, yes, I was looking for a script to do this! If not, can you provide some assistance in developing it myself? Thanks.

alexbik Commented:
Hi,

I think you should put the names of all files in a list, like:

@files=`find <arguments>`

With this, you can specify the dir's you want to search in. @files will contain a list with all files under the specified directory. Now you can make a loop, and process all files one by one:

for $path (@files) {
    .... code ...
}

In the 'code' part, you have to get the filename from the whole path:
$file=$path; $file=~/.*\/(.+?)/;

after this, $1 will contain the name of the file. With another regexp you can strip everything after the dot:
$beforedot=$file; $beforedot=~/.*(\.)/;

If you have the filename, you can create another loop, which tests the
name you found against all files in the @files variable.

Note that I didn't actually write this script, and I didn't test the regexps, so they may need
some adjustments. It should point you in the right direction, however.
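
As a minimal sketch of the two steps outlined above (get the filename from the path, then keep what comes before the first dot), something like the following could work. It uses the core File::Basename module instead of a hand-written regexp for the first step; stem_of() is a hypothetical helper name and the sample paths are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Return the part of a path's filename before its first dot.
# stem_of() is a hypothetical helper name, not something from this thread.
sub stem_of {
    my ($path) = @_;
    my $file = basename($path);          # "/a/b/plan.dwg.Z" -> "plan.dwg.Z"
    my ($stem) = $file =~ /^([^.]+)/;    # characters up to the first dot
    return $stem;                        # undef if the name starts with a dot
}

# Made-up sample paths standing in for the output of find.
my @files = ('/data/a/plan.dwg', '/data/b/plan.dwg.Z', '/data/c/notes');
for my $path (@files) {
    my $stem = stem_of($path);
    print "$path -> $stem\n" if defined $stem;
}
```

Names that begin with a dot have nothing before the first dot, so the sketch skips them rather than guessing what they should match.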


j_k (Author) Commented:
The expression /.*\/(.+?)/ does not pull out the filename from the whole path. I understand that:
. means any character
* matches zero or more times
but I am not sure what the rest is doing. Can you explain?

alexbik Commented:
Hi,

I did indeed make a mistake with the regexp. The following example works (at least on my Linux box):

#!/usr/bin/perl
@files=`find /`;
for $path (@files) {
        chomp $path;                      # remove the trailing newline from find's output
        $file=$path ; $file=~/.*\/(.*)/;  # capture everything after the last "/"
        print "$1\n";
}

A "." in a regexp indeed means "any character", and the "*" means "repeated as many times as necessary". The "\" escapes the "/", which cannot be used bare, since it is a special character. The following .* should be clear; the () are used to put that part of the matched string in $1.

Alex

ozo Commented:
/ isn't special, you could have said
  $file=~m".*/(.*)";

then to check duplicates,
  print "$1\n" if( $seen{$1}++ );

Or if you wanted just the first characters up to the first "." (dot) in the search,
   $file=~m".*/([^\.]+)";
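
To see how the %seen idiom behaves, here is a small self-contained sketch. The paths are made up, and the "== 1" comparison is a variation on the plain truth test above so that each repeated name is reported only once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up paths; in this thread they would come from find.
my @paths = ('/p/a/site.dwg', '/p/b/site.dwg.Z', '/p/c/other.dwg', '/p/d/site.txt');

my %seen;       # name before the first dot => number of times seen so far
my @repeats;    # names found more than once, in order of discovery
for my $path (@paths) {
    next unless $path =~ m{.*/([^.]+)};
    # Post-increment returns the old count, so "== 1" is true only on
    # the second occurrence and each repeated name is recorded once.
    push @repeats, $1 if $seen{$1}++ == 1;
}
print "$_\n" for @repeats;
```

With a plain `if $seen{$1}++` the name would be reported again on every later occurrence, which may or may not be what is wanted.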
j_k (Author) Commented:
ozo,

print "$1\n" if( $seen{$1}++ );  gives me a warning
Identifier "main::seen" used only once: possible typo

Is this a debugging message I can turn off?

ozo Commented:
If you
  use diagnostics;
it should give you a more complete explanation of how to avoid the warning.

I'd suggest initializing %seen to empty with
  %seen = ();
at the beginning of the program.

Do you need help with excluding directories in Perl, or do you just want to let `find` handle that?

you could also use pfind:
pfind / 'print "$1\n" if m"([^.]+)" && $seen{$1}++ == 1'

j_k (Author) Commented:
ozo,
That was going to be my next question! I am finding out that there are quite a few directories that I want to exclude, and I was attempting to use find to do that. But now I'm thinking I would want find to get all files, then remove records from the list by some search patterns, and then check for duplicates in the modified list. The searching and removing would have to be done on the list of whole path names. Is this where the Perl function grep could come in handy?

ozo Commented:
If you have a recent version of pfind, you might try something like
pfind / 'BEGIN{%xd=map{($_,1)}qw(/excludeme /exclude/me/too)}' '!$xd{$dir}' '/([^.]+)/ && $seen{$1}++ == 1'

or
find / \( -type d \( -name 'excludeme' -o -name 'metoo' \) -prune  \) -o -print | perl -ne 'push @{$seen{$1}},$_ if m".*/([^.]+)"; END{ for( values(%seen)){ print "@{$_}\n" if @{$_} > 1 } }'

which lists all instances of repeated names
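
For the grep approach asked about above, a minimal sketch might look like the following. It filters a list of whole path names before any duplicate check; the directory names are taken from the find example, the sample paths are made up, and @exclude, $skip_re, and @kept are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Directory names to skip; substitute the real ones.
my @exclude = qw(excludeme metoo);

# Build one pattern that matches any excluded name as a whole path component.
my $alt     = join '|', map { quotemeta } @exclude;
my $skip_re = qr{/(?:$alt)/};

# Made-up paths standing in for the output of find.
my @paths = (
    '/proj/a/plan.dwg',
    '/proj/excludeme/plan.dwg',
    '/proj/b/metoo/x.dwg',
    '/proj/metoo2/y.dwg',
);

# grep keeps only the paths that do not pass through an excluded directory.
my @kept = grep { $_ !~ $skip_re } @paths;
print "$_\n" for @kept;
```

Anchoring the names between slashes keeps a directory such as metoo2 from being dropped just because its name starts with metoo. Pruning in find avoids walking the excluded trees at all, so it is likely to be faster on a large file system.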

j_k (Author) Commented:
ozo,
I decided to use find to narrow my search, but I am having trouble with the syntax.
So far, the following works:

find /dir/?????/{dir1,dir2,dir3} -type f -name "*.dwg*" -print

but I also want to exclude the directories "coord" and "area" from printing. I've tried

find /dir/?????/{dir1,dir2,dir3} -type f -name "*.dwg*" -type d \( -name 'coord' -o -name 'area' \) -prune -print

What am I doing wrong?

ozo Commented:
Sorry, I forgot this question was still open.
This seems to be turning into a question for the Unix Programming topic area,
and I don't know if all versions of find handle -prune the same way, but

find /dir/?????/{dir1,dir2,dir3} \( -type d \( -name 'coord' -o -name 'area' \) -prune \) -o -type f -name "*.dwg*" -print

seems to work for me.
 
j_k (Author) Commented:
ozo, Back to perl stuff, In the following code:

while ( <FILES>){
      push @{$seen{$1}}, $_ if m".*/([^.]+)";
      for( values(%seen)){
            print "@{$_}\n" if @{$_} > 1;
      }
}

When there is a match and something to print,  it will print the previous match again until another match is found, then it prints that match again and again until another match is found or EOF.

ozo Commented:
while ( <FILES> ){
    push @{$seen{$1}}, $_ if m".*/([^.]+)";
}
for( values(%seen)){
    print "@{$_}\n" if @{$_} > 1;
}
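
The point of the change is that the report loop runs once, after every path has been collected, rather than once per input line. A self-contained sketch of the same structure follows; the paths are made up, and it stores clean paths and sorts the names, which the version above does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up paths standing in for the lines read from FILES.
my @paths = ('/p/a/site.dwg', '/p/b/site.dwg.Z', '/p/c/other.dwg');

my %seen;    # name before the first dot => array ref of all paths with that name
for my $path (@paths) {
    push @{ $seen{$1} }, $path if $path =~ m{.*/([^.]+)};
}

# Build the report only after the whole list has been processed.
my @report;
for my $stem (sort keys %seen) {
    my @group = @{ $seen{$stem} };
    push @report, "${stem}: @group" if @group > 1;
}
print "$_\n" for @report;
```

When the paths are read from a filehandle, a chomp before the push keeps the embedded newlines out of the report.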

ozo Commented:
And while we're back to perl stuff,
find2perl / \( -type d \( -name 'coord' -o -name 'area' \) -prune \) -o -type f -name "*.dwg*" -print
produces
  sub wanted {
    (
        (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
        -d _ &&
        (
            /^coord$/
            ||
            /^area$/
        ) &&
        ($prune = 1)
    )
    ||
    ($nlink || (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_))) &&
    -f _ &&
    /^.*\.dwg.*$/ &&
    print("$name\n");
  }

see
perldoc File::Find
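
A hand-written File::Find version can be shorter than the generated code. The sketch below is only an illustration: find_files() is a hypothetical helper, it collects every regular file rather than only *.dwg* names, and the temporary directory tree exists purely so the example can run on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Collect regular files under $top, skipping directories whose names are
# keys of %$skip. find_files() is a hypothetical helper name.
sub find_files {
    my ($top, $skip) = @_;
    my @found;
    find(sub {
        if (-d $_ && $skip->{$_}) {
            $File::Find::prune = 1;     # do not descend into this directory
            return;
        }
        push @found, $File::Find::name if -f $_;
    }, $top);
    my @sorted = sort @found;
    return @sorted;
}

# Build a tiny throwaway tree to demonstrate the pruning.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/$_" or die "mkdir: $!" for qw(dir1 coord);
for my $f ('dir1/plan.dwg', 'coord/plan.dwg') {
    open my $fh, '>', "$top/$f" or die "open: $!";
    close $fh;
}

my @files = find_files($top, { coord => 1, area => 1 });
print "$_\n" for @files;
```

Only the file under dir1 is listed, because the coord directory is pruned before File::Find descends into it.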

j_k (Author) Commented:
OK, I'm getting close to the end. Here's what I have now. The last thing I'm stuck on is how to step through the array "@files" in a while loop!


@files=`find . -type f -print`;
@files = grep (!/00000|dwf|x2a/, @files);
@files = grep (/\d{5}\w{3}.*/, @files);

while ( ??????? ){
      push @{$seen{$1}}, $_ if m".*/([^.]+)";
}

for( values(%seen)){
      print "@{$_}\n" if @{$_} > 1;
}

ozo Commented:
foreach( @files ){
   push @{$seen{$1}}, $_ if m".*/([^.]+)";
}

j_k (Author) Commented:
Within Perl, how would I mail the output to the user j_k? Or would it be better to redirect the output of the program by piping it through mail?

Such as:
# find_dup_files.pl | mail j_k

ozo Commented:
That should work.
You could also redirect the output within perl,

open(OUTPUT,"|mail j_k") or die "couldn't start mail: $!";
print OUTPUT "@{$_}\n" or die "couldn't output $!";



j_k (Author) Commented:
That's it! It's all working great! Thanks.
How do I close this thread?

ozo Commented:
You can grade the answer, or, if you're not happy with the answer,
you can reject it and request another.

j_k (Author) Commented:
Very Helpful,
This was a long drawn out question, but ozo was patient and responsive.
Thanks
