PHP: Best way to scan folders and process files

Searching the internet, there seem to be many ways of traversing directories and files.

Can someone suggest the best way to do this?

The "root" directory should contain only subdirectories; the name of each is required, and each is processed separately.
Each subdirectory should contain only files, and each file needs processing according to its file extension.

This is a Windows system, so how do I deal with '.' and '..'?

This is how far I've got. It isn't tested; it is more to give you an idea of what I'm trying to do.

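<?PHP
error_reporting(E_ALL);

$DS = DIRECTORY_SEPARATOR;

$root = "path" . $DS . "to" . $DS . "root";

// scandir() also returns '.' and '..', so filter them out first
$Dirs = array_diff(scandir($root), array('.', '..'));
foreach ($Dirs as $dir) {
    // $dir is only the subdirectory name (wanted); opendir() needs the full path
    $dirPath = $root . $DS . $dir;
    if (!is_dir($dirPath)) {
        continue;
    }
    $dh = opendir($dirPath);
    while (false !== ($fileName = readdir($dh))) {
        // readdir() returns '.' and '..' as well
        if ($fileName == '.' || $fileName == '..') {
            continue;
        }
        $ext = substr($fileName, strrpos($fileName, '.') + 1);
        if ($ext == "html") {
            $htmlFile = $dirPath . $DS . $fileName;
            // Open and do stuff with the html file
        }
        elseif (in_array($ext, array("jpg", "jpeg", "png"))) {
            // Do other stuff
        }
        else {
            // Error code: file type not needed
        }
    }
    closedir($dh);
}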


trevor1940 Asked:

Julian Hansen Commented:
Take a look at DirectoryIterator

It has the best options and performance.
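As a rough illustration of that suggestion, here is a minimal sketch of the two-level walk from the question (root, then subdirectories, then files) using DirectoryIterator. The function name processTree() and the shape of the returned array are assumptions made for this example, not anything from the thread. isDot() takes care of the '.' and '..' entries.

```php
<?php
// Sketch only: processTree() and its return layout are hypothetical.
function processTree(string $root): array
{
    $result = [];
    foreach (new DirectoryIterator($root) as $dir) {
        // isDot() is true for '.' and '..'; also ignore stray files in the root
        if ($dir->isDot() || !$dir->isDir()) {
            continue;
        }
        // Copy the name now, because the iterator reuses one object per step
        $name = $dir->getFilename();
        $result[$name] = ['html' => [], 'image' => [], 'other' => []];

        foreach (new DirectoryIterator($dir->getPathname()) as $file) {
            if (!$file->isFile()) {
                continue;
            }
            // getExtension() returns '' when the file has no extension
            $ext = strtolower($file->getExtension());
            if ($ext === 'html') {
                $kind = 'html';
            } elseif (in_array($ext, ['jpg', 'jpeg', 'png'], true)) {
                $kind = 'image';
            } else {
                $kind = 'other';
            }
            $result[$name][$kind][] = $file->getPathname();
        }
    }
    return $result;
}
```

The subdirectory name is the array key, and each file lands in a bucket by extension, so the "do stuff" steps can loop over the buckets afterwards.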
Ray Paseur Commented:
You can probably use scandir() or glob().  You can use the iterators, too.

All the filesystem functions are documented here:
http://php.net/manual/en/ref.filesystem.php

The iterators, here (good example):
http://php.net/manual/en/class.recursivedirectoryiterator.php#97228

Here's a sample script that does it two ways - either as an array tree, or as a file list.  For your work, you probably want the file list.  Then you don't have to consider the directory structure - each array element contains the complete file path.
<?php // demo/temp_trevor1940.php
/**
 * https://www.experts-exchange.com/questions/29008609/PHP-Best-way-to-scan-folders-and-process-files.html
 *
 * http://php.net/manual/en/splfileinfo.getpathname.php
 */
error_reporting(E_ALL);
echo '<pre>';


// COLLECT EVERYTHING IN A TREE
function dirToTree($dir)
{
    $contents = [];
    foreach (array_diff( scandir($dir), ['.', '..']) as $node)
    {
        if (is_dir($dir . DIRECTORY_SEPARATOR . $node))
        {
            $contents[$node] = dirToTree($dir . DIRECTORY_SEPARATOR . $node);
        }
        else
        {
            $contents[] = $node;
        }
    }

    return $contents;
}

$dir = getcwd();
var_dump($dir);
$tree = dirToTree($dir);
print_r($tree);


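// COLLECT EVERYTHING IN A LIST
// $dir is a RecursiveIteratorIterator, which already descends into
// subdirectories, so no recursive call is needed here
function dirToList($dir)
{
    $contents = [];
    foreach ($dir as $path)
    {
        // Skip directory entries, including '.' and '..'
        if ($path->isDir())
        {
            continue;
        }
        $contents[] = $path->getPathname();
    }

    return $contents;
}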

$dir = getcwd();
var_dump($dir);
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
$list = dirToList($iterator);
print_r($list);

gr8gonzo (Consultant) Commented:
I ran a quick test to see how fast each method would scan through a folder containing 216,265 files, skipping over the . and .. entries:

Using opendir/readdir/closedir:
Counted 216265 files in 0.18007111549377 seconds.

Using glob("*"):
Counted 216265 files in 0.54197692871094 seconds.

Using DirectoryIterator:
Counted 216265 files in 0.082073926925659 seconds.

I ran the tests multiple times, in different orders, and the results were consistent. DirectoryIterator was definitely the best-performing approach, as Julian said above.
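The test script itself was not posted, so the following is only a sketch of how such a comparison might be set up. All four function names are assumptions made for this example. Each counter skips '.' and '..' in a single directory, and timeIt() wraps a call with microtime(true).

```php
<?php
// Sketch only: these helper names are hypothetical, not the original test script.
function countWithReaddir(string $dir): int
{
    $n = 0;
    $dh = opendir($dir);
    while (false !== ($entry = readdir($dh))) {
        if ($entry !== '.' && $entry !== '..') {
            $n++;
        }
    }
    closedir($dh);
    return $n;
}

function countWithGlob(string $dir): int
{
    // glob('*') does not return '.' or '..'; note that it also skips hidden dot-files
    $found = glob($dir . '/*');
    return $found === false ? 0 : count($found);
}

function countWithIterator(string $dir): int
{
    $n = 0;
    foreach (new DirectoryIterator($dir) as $entry) {
        if (!$entry->isDot()) {
            $n++;
        }
    }
    return $n;
}

// Returns [count, elapsed seconds] for one counting function
function timeIt(callable $fn, string $dir): array
{
    $start = microtime(true);
    $count = $fn($dir);
    return [$count, microtime(true) - $start];
}
```

For example, timeIt('countWithIterator', getcwd()) returns the entry count and the elapsed seconds. Absolute timings depend heavily on the disk and on filesystem caching, so running each method several times in different orders, as described above, is a sensible precaution.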

Julian Hansen Commented:
@gr8gonzo - I did the same thing a couple of months back, but it's nice to see the results confirmed.
gr8gonzo (Consultant) Commented:
Yep, I had never run the test myself before. :)
Ray Paseur Commented:
I expect you will find it to be even faster on PHP 7; however, to get a consistent test, you'll need to use the same I/O subsystem. I didn't bother to time it because the response was essentially instantaneous on my server, where there are fewer than 100,000 files.
trevor1940 (Author) Commented:
Hi all, thanks for the comparison test, interesting.

@gr8gonzo, was that one directory containing 216,265 files, or a tree? Would it matter?

@Ray
"each array element contains the complete file path."
Once I have a list of full file paths, I assume there are methods for breaking each one into components,

e.g. parent folder, file name and extension / file type?

Using "$ext = substr($fullPath, strrpos($fullPath, '.') + 1);" only works if the file has an extension:

Path.To/ReadMe  fails
Path.To/ReadMe.txt works
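One built-in option for this is pathinfo(), which looks for the extension only in the last path component. A minimal sketch follows; splitPath() is a hypothetical helper name used only for this example.

```php
<?php
// Sketch only: splitPath() is a hypothetical helper built on PHP's pathinfo().
function splitPath(string $fullPath): array
{
    $info = pathinfo($fullPath);
    return [
        // Name of the immediate parent folder
        'parent'    => basename($info['dirname']),
        // File name without the extension
        'filename'  => $info['filename'],
        // pathinfo() omits the 'extension' key when there is none
        'extension' => strtolower($info['extension'] ?? ''),
    ];
}
```

With this, splitPath('Path.To/ReadMe') gives an empty extension and the file name 'ReadMe', while splitPath('Path.To/ReadMe.txt') gives the extension 'txt'. The dot in the folder name no longer confuses the result.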
Ray Paseur Commented:
"a list of full path files I assume there are methods for breaking it into components"
Yes, but this seems to be a different question.
"$ext = substr($fullPath, strrpos($fullPath, '.') + 1); only works if the file has an extension"
Yes, that makes sense. You might want to choose the second example I posted above, using the function dirToList(). It gives full file paths all the way to the file, and does not give paths to directories. You may want to sort the list. If your files do not have extensions, that is a separate issue and worthy of another question. We will need to see your test data!
trevor1940 (Author) Commented:
"Yes, but this seems to be a different question."

OK Ray, quite possibly. I was just thinking beyond the current project, so I don't have any data.
trevor1940 (Author) Commented:
Thanks for your help.