Iterate over a directory

Posted on 2005-04-14
Last Modified: 2012-08-14
Windows platform
What I have: a directory called MFG that has 79 folders, and these folders have subfolders.  In the folders and subfolders are files.
What I want: in the end I need one folder for every manufacturer, with one index.txt and all the PDF files.
I need to iterate through a directory structure, list the files in an INDEX.txt in each manufacturer folder, then MOVE the files to the BASE dir and RM the old folders.

The following script iterates through the dir but will not crawl into subdirs.  Second, it does not output to a TXT file, MOVE the files, or delete the old dirs...

this is urgent and technical. - 500 points.

(severely lacking, somewhat useful base of a) script:

my $top = "C:/NETSHARE/PDF/MFG";

chdir ($top) || die "Cannot chdir to $top  ($!)";

foreach my $folder (grep { -d } glob "*") {
    print "Folder $folder =>\n";
    my $dir = "$top/$folder";
    chdir ($dir) || die "Cannot chdir to $dir  ($!)";
    foreach my $file (grep { -f } glob "*") {     # consider files only
        print "\tFile $file\n";
    }
}
Question by:tweaver1973
    LVL 18

    Expert Comment

    can you tell us how we're supposed to tell which file belongs to which manufacturer?

    Author Comment


    The PDF files are specifications for items sold by each manufacturer. So, in MFG/Kohler there are some PDFs in the top folder and there are PDFs in subfolders such as "Sinks" and "Fixtures", i.e. MFG/Kohler/Sinks/foo.pdf.

    I need to put foo.pdf in MFG/Kohler and create a TXT file that reads Kohler \t foo.pdf, like an index. Ultimately, the boss wants an Excel file with two columns, an INDEX of the PDFs in the folder. The first column (something I will create manually) is a part number; the second column is the corresponding PDF where this part is specified.

    To answer your question, if I understand it correctly: we know that foo.pdf belongs in Kohler because we found it in the Kohler dir or a Kohler subdir.

    Author Comment


    New and improved iteration - now I just need help accessing this array..


    use File::Find;
    find sub { print $File::Find::name, -d && "/", "\n" }, @FOO;
    LVL 16

    Expert Comment

    use File::Find ;
    find (\&do_it, 'path/to/file') ;
    sub do_it {
        if (!-d&&/\.pdf$/) {
            my ($par_dir) = $File::Find::name =~ m/\/(.*?)\/.*?$/ ;  ##extract parent directory
            !-d "C:/MFG/$par_dir" && mkdir "C:/MFG/$par_dir" ;  ##create parent dir if it doesn't exist
            rename $File::Find::name, "C:/MFG/$par_dir/$_" ;  ##move pdf to the created directory
        }
    }


    Author Comment

    $par_dir will exist because we found one or more PDF's in the $par_dir..
    I want to be sure I have this right - so, for clarity I have questions:

    1. This script starts with if (!-d&&/\.pdf$/) - this means if the string in $_ is a dir AND the string ends with pdf  drop into the if - what does $/ mean?

    2. The first line of the if reads my ($par_dir) = $File::Find::name =~ m/\/(.*?)\/.*?$/ ; here you are setting $par_dir to the string in $_ but removing all subdirs down to C:\NETSHARE. I need the par_dir to be the base dir that I start in, "C:/NETSHARE/PDF/MFG", so $par_dir needs to be "C:/NETSHARE/PDF/MFG"

    3. I do not understand the mkdir because in all cases the dir exists.  Maybe you want me to think differently about the task I am performing and you have a better suggestion.  As I said - I need to flatten the directories and create an index.txt file.
    If you have a better suggestion please guide me.
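
On question 1, a small sketch may help. `-d` is a file test that is true for directories, so `!-d` means "not a directory", and in `/\.pdf$/` the `$` anchors the match at the end of the string while the final `/` only closes the pattern (it is not the `$/` variable). The scratch directory and file names below are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory, only for this illustration.
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/Sinks" or die "mkdir: $!";
for my $name ('foo.pdf', 'notes.txt') {
    open my $fh, '>', "$tmp/$name" or die "open: $!";
    close $fh;
}

my @picked;
for ("$tmp/Sinks", "$tmp/foo.pdf", "$tmp/notes.txt") {
    # !-d        : $_ is NOT a directory
    # /\.pdf$/   : $_ ends in ".pdf"
    push @picked, $_ if !-d && /\.pdf$/;
}
print "$_\n" for @picked;   # only the foo.pdf path is printed
```

So the test selects plain files whose names end in `.pdf` and skips directories.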


    Author Comment

    so really $par_dir should be "C:/NETSHARE/PDF/MFG\" . "last dir found on this level" - find all pdf's - move to $par_dir, rm all other dirs - move to next dir

    then "C:/NETSHARE/PDF/MFG\" . "next dir found on this level"

    Author Comment

    Need help - daylight is passing.   Original question remains unresolved.  
    manav_mathur, are you still interested? I think the answer is in $par_dir but I have limited experience with regexps.  I have tried placing parens around different areas of the expression and even tried my hand at a few greedy expressions, but I cannot figure out how to get the different pieces of the string..
    If I capture (.*\/).*  would this mean grab everything up to the last / ??

    so on the following string:

    $1 would be:

    What I need to do is put  "foo.pdf" in FOO and then put a text file in FOO that reads "foo.pdf"
    and delete the sub directories..
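
On the capture question: yes, because `.*` is greedy, `(.*\/).*` captures everything up to and including the last `/`. Here is a minimal sketch using a made-up path (the string the question refers to is not shown above). It also shows one possible way to pull out the manufacturer folder directly below the base directory.

```perl
use strict;
use warnings;

my $top  = "C:/NETSHARE/PDF/MFG";
my $path = "$top/Kohler/Sinks/foo.pdf";   # hypothetical example path

# Greedy: (.*/) runs to the LAST slash, so the capture is the directory part.
my ($dir_part) = $path =~ m{(.*/).*};

# The first path component after $top is the manufacturer folder.
my ($vendor) = $path =~ m{^\Q$top\E/([^/]+)/};

print "dir part: $dir_part\n";   # C:/NETSHARE/PDF/MFG/Kohler/Sinks/
print "vendor:   $vendor\n";     # Kohler
```

Using `m{...}` as the delimiter avoids having to escape every `/` in the pattern.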

    LVL 84

    Accepted Solution

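    use File::Find;
    use File::Basename;

    my $top = "C:/NETSHARE/PDF/MFG";

    chdir ($top) || die "Cannot chdir to $top  ($!)";

    for my $folder ( grep -d, glob"*" ){
        open INDEX,">>$folder/INDEX.TXT" or warn "open $folder/INDEX.TXT $!";
        # finddepth visits a directory's contents before the directory itself,
        # so subfolders are already emptied when rmdir reaches them
        finddepth( {no_chdir=>1,wanted=>sub{
            if( -f && /\.pdf$/ ){
                my $name = basename($_);   # file name without its path
                rename $_,"$folder/$name" or warn "rename $_ $!";
                print INDEX "$folder\t$name\n";
            }
            # remove emptied subfolders, but never the manufacturer folder
            -d && $_ ne $folder && (rmdir $_ or warn "rmdir $_ $!");
        }},$folder);
        close INDEX;
    }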
    LVL 18

    Expert Comment

    Manav was on his way with File::Find, but I suspect he's having a busy day too.
    Let me take a brief stab at it:

    1. You know that all directories immediately below $top are your vendor directories.
    2. I suggest just getting that list with readdir()
    3. then we loop over each, and use File::Find and File::Spec to handle all files below $top/$vendor recursively
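
The three steps above could be sketched roughly as follows. This is only an illustration that lists what it finds instead of moving anything. The helper names `vendor_dirs` and `pdfs_below` are made up here, and matching `.pdf` case-insensitively is an assumption, not something the thread specifies.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Step 2: immediate subdirectories of $top, read with readdir().
sub vendor_dirs {
    my ($top) = @_;
    opendir my $dh, $top or die "Cannot opendir $top ($!)";
    my @vendors = grep { $_ ne '.' && $_ ne '..' && -d "$top/$_" } readdir $dh;
    closedir $dh;
    return sort @vendors;
}

# Step 3: every PDF below one vendor directory, found recursively.
sub pdfs_below {
    my ($dir) = @_;
    my @pdfs;
    find({ no_chdir => 1,
           wanted   => sub { push @pdfs, $_ if -f && /\.pdf$/i } }, $dir);
    return sort @pdfs;
}

my $top = shift @ARGV;          # e.g. C:/NETSHARE/PDF/MFG
if (defined $top) {
    for my $vendor (vendor_dirs($top)) {
        my $dir = File::Spec->catdir($top, $vendor);
        print "$vendor\t$_\n" for pdfs_below($dir);
    }
}
```

Run with the base directory as the only argument, it prints one tab-separated line per PDF, which is close to the index format asked for.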


    But then ozo fixed the problem. hehe!

    Author Comment

    Yes, Manav was well on the way to a solution.  I really like  File::Find and I was beginning to learn more about this package.  I feel that if I had a better handle on Regexp I would have fumbled around enough to get what I wanted from what Manav put together.  However, Ozo came up with a script that addresses my needs and is a complete solution to my question.

    I will be back with a few follow-up questions for cleaning up other areas around this project.  Thank you for the input.

