PHP: Best way to scan folders and process files

trevor1940
Searching the internet, there seem to be many ways of traversing directories and files.

Can someone suggest the best way to do this?

Within the "root" directory there should only be subdirectories; the name of each one is required, and each is processed separately.
Within each subdirectory there should only be files; each one needs processing according to its file extension.

This is a Windows system, so how do I deal with '.' and '..'?

This is how far I've got. It isn't tested; it is more to give you an idea of what I'm trying to do.

<?PHP
error_reporting(E_ALL);

$DS = DIRECTORY_SEPARATOR;

$root ="path" . $DS . "to" . $DS . "root";

$Dirs= scandir($root);
foreach($Dirs as $dir){

//$dir name is wanted 
$dh  = opendir($dir);
while (false !== ($fileName = readdir($dh))) {
    $ext = substr($fileName, strrpos($fileName, '.') + 1);
    if($ext == "html"){
       $htmlFile = $dir . $DS . $fileName;
       // Open and Do stuff with html file
    }
    elseif(in_array($ext, array("jpg","jpeg","png"))){
        //Do other stuff
    }
    else{
     // Error code file type not needed
    }
 }
}

Commented:
Take a look at DirectoryIterator

It has the best options and performance.
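
As a minimal sketch of how DirectoryIterator could fit the layout described in the question (a root holding only subdirectories, each holding only files): the function name processRoot() and the shape of the returned array are illustrative assumptions, not part of the original suggestion. isDot() takes care of the '.' and '..' entries, and getExtension() returns an empty string for a file with no extension.

```php
<?php
// Hypothetical helper: groups the files of each subdirectory by handler type.
function processRoot(string $root): array
{
    $result = [];
    foreach (new DirectoryIterator($root) as $dir) {
        // Skip '.' and '..' and anything in the root that is not a directory
        if ($dir->isDot() || !$dir->isDir()) {
            continue;
        }
        $name = $dir->getFilename();   // the subdirectory name the question asks for
        $result[$name] = ['html' => [], 'image' => [], 'other' => []];

        foreach (new DirectoryIterator($dir->getPathname()) as $file) {
            if (!$file->isFile()) {
                continue;              // this also skips the dot entries
            }
            $ext = strtolower($file->getExtension());
            if ($ext === 'html') {
                $type = 'html';
            } elseif (in_array($ext, ['jpg', 'jpeg', 'png'], true)) {
                $type = 'image';
            } else {
                $type = 'other';       // file type not needed
            }
            $result[$name][$type][] = $file->getPathname();
        }
    }
    return $result;
}
```

The real per-file work would replace the lines that collect paths; returning an array just keeps the sketch easy to inspect with print_r().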
Commented:
You can probably use scandir() or glob().  You can use the iterators, too.
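
For comparison, here is a rough sketch of the glob() route for the same two-level layout. The function name listByGlob() is made up for illustration. A plain * pattern does not return the '.' and '..' entries (nor any other name starting with a dot), and GLOB_ONLYDIR limits the first pass to directories.

```php
<?php
// Hypothetical helper: returns [subdirectory name => list of file paths].
function listByGlob(string $root): array
{
    $result = [];
    // GLOB_ONLYDIR returns only the directories that match the pattern
    $dirs = glob($root . '/*', GLOB_ONLYDIR) ?: [];
    foreach ($dirs as $dir) {
        // array_filter drops anything that is not a regular file;
        // array_values reindexes the result from 0
        $files = array_values(array_filter(glob($dir . '/*') ?: [], 'is_file'));
        $result[basename($dir)] = $files;
    }
    return $result;
}
```

One caveat: glob() treats characters such as [ and ] in the path as pattern syntax, so this sketch assumes the folder names do not contain them.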

All the filesystem functions are documented here:
http://php.net/manual/en/ref.filesystem.php

The iterators, here (good example):
http://php.net/manual/en/class.recursivedirectoryiterator.php#97228

Here's a sample script that does it two ways - either as an array tree, or as a file list.  For your work, you probably want the file list.  Then you don't have to consider the directory structure - each array element contains the complete file path.
<?php // demo/temp_trevor1940.php
/**
 * https://www.experts-exchange.com/questions/29008609/PHP-Best-way-to-scan-folders-and-process-files.html
 *
 * http://php.net/manual/en/splfileinfo.getpathname.php
 */
error_reporting(E_ALL);
echo '<pre>';


// COLLECT EVERYTHING IN A TREE
function dirToTree($dir)
{
    $contents = [];
    foreach (array_diff( scandir($dir), ['.', '..']) as $node)
    {
        if (is_dir($dir . DIRECTORY_SEPARATOR . $node))
        {
            $contents[$node] = dirToTree($dir . DIRECTORY_SEPARATOR . $node);
        }
        else
        {
            $contents[] = $node;
        }
    }

    return $contents;
}

$dir = getcwd();
var_dump($dir);
$tree = dirToTree($dir);
print_r($tree);


// COLLECT EVERYTHING IN A LIST
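function dirToList($iterator)
{
    $contents = [];
    foreach ($iterator as $path)
    {
        // RecursiveIteratorIterator already descends into subdirectories,
        // so no recursive call is needed here; directory entries such as
        // '.' and '..' are simply skipped
        if ($path->isDir())
        {
            continue;
        }

        $contents[] = $path->getPathname();
    }

    return $contents;
}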

$dir = getcwd();
var_dump($dir);
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
$list = dirToList($iterator);
print_r($list);


Commented:
I ran a quick test to see how fast each method would scan through a folder containing 216,265 files, skipping over the . and .. entries:

Using opendir/readdir/closedir:
Counted 216265 files in 0.18007111549377 seconds.

Using glob("*"):
Counted 216265 files in 0.54197692871094 seconds.

Using DirectoryIterator:
Counted 216265 files in 0.082073926925659 seconds.

I ran the tests multiple times, in different orders, and the results were consistent. DirectoryIterator was definitely the best-performing approach, as Julian said above.
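
The test script itself was not posted. As a rough sketch of how such a comparison could be set up (the function names and the timing helper are assumptions, not the code that produced the numbers above), each function below counts the entries of a single directory while skipping '.' and '..':

```php
<?php
function countWithReaddir(string $dir): int
{
    $count = 0;
    $dh = opendir($dir);
    while (false !== ($entry = readdir($dh))) {
        if ($entry !== '.' && $entry !== '..') {
            $count++;
        }
    }
    closedir($dh);
    return $count;
}

function countWithGlob(string $dir): int
{
    // A plain * pattern does not match '.' and '..' (nor other dot-files)
    return count(glob($dir . '/*') ?: []);
}

function countWithDirectoryIterator(string $dir): int
{
    $count = 0;
    foreach (new DirectoryIterator($dir) as $entry) {
        if (!$entry->isDot()) {
            $count++;
        }
    }
    return $count;
}

// Times one counting function and prints a line in the format used above.
function timeIt(callable $counter, string $dir): void
{
    $start = microtime(true);
    $count = $counter($dir);
    $elapsed = microtime(true) - $start;
    echo "Counted $count files in $elapsed seconds.\n";
}
```

A call such as timeIt('countWithDirectoryIterator', $someFolder) would then time one method. Operating-system file caching can favour whichever method runs second, which is why running the tests several times in different orders matters.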
Commented:
@gr8gonzo - I did the same thing a couple months back but nice to see the results confirmed.

Commented:
Yep, I had never run the test myself before. :)
Commented:
I expect you will find it to be even faster on PHP 7; however, to get a consistent test, you'll need to use the same I/O subsystem. I didn't bother to time it because the response was essentially instantaneous on my server, where there are fewer than 100,000 files.

Author

Commented:
Hi all, thanks for the performance test; interesting.

@gr8gonzo, was that one directory containing 216,265 files, or a tree? Would it matter?

@Ray
"each array element contains the complete file path."
Once I have a list of full-path files, I assume there are methods for breaking it into components,

e.g. parent folder, filename and extension / file type?

Using "$ext = substr($fullPath, strrpos($fullPath, '.') + 1);" only works if the file has an extension:

Path.To/ReadMe  fails
Path.To/ReadMe.txt works
Commented:
"a list of full path files I assume there are methods for breaking it into components"
Yes, but this seems to be a different question.
"$ext = substr($fullPath, strrpos($fullPath, '.') + 1); only works if the file has an extension"
Yes, that makes sense.  You might want to choose the second example I posted above, using the function dirToList().  It gives full file paths all the way to the file, and does not give paths to directories.  You may want to sort the list.  And if your files do not have extensions, that is a separate issue and worthy of another question.  We will need to see your test data!
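
On the path-component side, PHP's pathinfo() copes with a file that has no extension: the 'extension' key is simply absent from its result. A small sketch, with splitPath() as a made-up helper name, using the two sample paths from the comment above:

```php
<?php
// Hypothetical helper: splits a full path into parent folder, file name and extension.
function splitPath(string $fullPath): array
{
    $info = pathinfo($fullPath);
    return [
        'parent'    => basename($info['dirname']),   // immediate parent folder
        'filename'  => $info['filename'],            // name without the extension
        'extension' => $info['extension'] ?? '',     // empty string when there is none
    ];
}

print_r(splitPath('Path.To/ReadMe'));      // extension is ''
print_r(splitPath('Path.To/ReadMe.txt'));  // extension is 'txt'
```

Because pathinfo() looks for the dot only in the last path segment, the dot in the folder name Path.To no longer confuses the extension check.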

Author

Commented:
"Yes, but this seems to be a different question."

OK Ray, quite possibly. I was just thinking beyond the current project, so I don't have any data.

Author

Commented:
Thanks for your help.
