  • Status: Solved
  • Priority: Medium
  • Security: Public

PHP/REGEX: glob() where extension NOT .txt

$xyz=glob("*.*");


That places all files into an array. I do not want files with the ".txt" extension to be included.
Asked by: hankknight
 
Ray Paseur commented:
This gets the right result. I'm not sure how to make glob()'s limited pattern engine do more comprehensive pattern matching.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

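// Return every name matched by '*.*' except those matched by $pat.
function glob_not($pat='*.txt')
{
    $all = glob('*.*');  // every name containing a dot
    $arr = glob($pat);   // names to exclude
    return array_diff($all, $arr);
}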

$dif = glob_not();
var_dump($dif);

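Because glob() works relative to the current working directory, a variant that takes the directory explicitly can be easier to reuse. The sketch below is not Ray's code: glob_not_in() and the sample file names are hypothetical, and it assumes the directory path contains no glob metacharacters such as [ or *. It also re-indexes the result, since array_diff() keeps the original keys.

```php
<?php
// Hypothetical variant of glob_not() that takes a directory instead of
// relying on the current working directory.
function glob_not_in(string $dir, string $exclude = '*.txt'): array
{
    $all  = glob($dir . '/*.*') ?: [];         // every entry whose name contains a dot
    $skip = glob($dir . '/' . $exclude) ?: []; // entries matching the excluded pattern
    // array_diff() preserves keys, so re-index to get a plain list.
    return array_values(array_diff($all, $skip));
}

// Usage with a throwaway directory (assumes $dir has no glob metacharacters).
$tmp = sys_get_temp_dir() . '/glob_not_demo_' . getmypid();
if (!is_dir($tmp)) {
    mkdir($tmp);
}
foreach (['a.txt', 'b.php', 'c.tmp', 'd.act'] as $name) {
    touch("$tmp/$name");
}
$kept = array_map('basename', glob_not_in($tmp));
print_r($kept);
```

glob() sorts its results alphabetically unless GLOB_NOSORT is passed, so the output order is predictable.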

 
Ray Paseur commented:
The GLOB_BRACE pattern in the last two lines of the script below also appears to work correctly in my limited tests. I'm not sure what it would do with a file suffix like "act" or "tmp", but I'm pretty sure the glob_not() function will screen those correctly. I'm also not sure which method would be faster. Anyway, HTH, ~Ray

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));


 
gr8gonzo (Consultant) commented:
If you really need a regex:

^((?!.txt).)*$
 
Dan Craciun (IT Consultant) commented:
@gr8gonzo: are you sure glob() supports that? I haven't tested it, but the first user comment on the PHP manual page for glob() says:
glob also does not support lookbehinds, lookaheads, atomic groupings, capturing, or any of the 'higher level' regex functions.
 
gr8gonzo (Consultant) commented:
Sorry, I don't know how I missed that glob() was the OP's approach. I was thinking the regex would be applied in a readdir() loop:

$dh = opendir("path");
while( ($file = readdir($dh)) !== false )
{
  if(preg_match("/^((?!.txt).)*\$/",basename($file)))
  {
     // ... valid file code ...
  }
}
closedir($dh);



That was my mistake.
 
Dan Craciun (IT Consultant) commented:
Not really a mistake. I was just curious whether you had tested it and that comment was wrong :)
 
gr8gonzo (Consultant) commented:
Nope, just not paying enough attention. Ray's approach is correct.
 
Terry Woods (IT Guru) commented:
You can use a full regex if you use the RecursiveDirectoryIterator and RegexIterator classes.

Example here: https://github.com/cballou/PHP-SPL-Iterator-Interface-Examples/blob/master/recursive-regex-iterator.php
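A minimal sketch of that SPL approach, assuming you want regular files only, searched recursively, with a case-insensitive match. files_not_txt() and the sample names are hypothetical, and the sketch uses the lookbehind form of the pattern, (?<!\.txt)$, rather than a lookahead.

```php
<?php
// Hypothetical helper: regular files under $dir (recursively) whose names
// do not end in ".txt", compared case-insensitively.
function files_not_txt(string $dir): array
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // RegexIterator tests each entry's string form (here, its full path);
    // the negative lookbehind rejects anything ending in ".txt".
    $filtered = new RegexIterator($files, '/(?<!\.txt)$/i');

    $out = [];
    foreach ($filtered as $info) {
        if ($info->isFile()) {
            $out[] = $info->getFilename();
        }
    }
    sort($out);
    return $out;
}

// Usage with a throwaway directory tree.
$root = sys_get_temp_dir() . '/spl_not_txt_' . getmypid();
if (!is_dir("$root/sub")) {
    mkdir("$root/sub", 0777, true);
}
foreach (['a.txt', 'b.php', 'sub/c.TXT', 'sub/not.really.txt.doc'] as $name) {
    touch("$root/$name");
}
$found = files_not_txt($root);
print_r($found);
```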

If you choose to do that, you'll probably want to fix the bug in @gr8gonzo's regex pattern too (the . needs to be escaped; otherwise it is treated as a wildcard that matches any character):

^((?!\.txt).)*$



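Even then, it would misbehave on a filename like "not.really.txt.doc": the lookahead rejects ".txt" anywhere in the name, not just at the end, so that .doc file would be wrongly excluded.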

A better pattern would be:
^.*(?<!\.txt)$


Or possibly just:
(?<!\.txt)$

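A quick way to see the difference between these patterns is to run each one through preg_match() against a few names. The sample names here are made up for illustration; a result of true means the name is accepted as "not a .txt file".

```php
<?php
// Compare the candidate patterns from this thread on a few sample names.
$patterns = [
    'lookahead'  => '/^((?!\.txt).)*$/', // rejects ".txt" anywhere in the name
    'anchored'   => '/^.*(?<!\.txt)$/',  // rejects only a trailing ".txt"
    'lookbehind' => '/(?<!\.txt)$/',     // same test, without the leading ^.*
];
$names = ['notes.txt', 'report.doc', 'not.really.txt.doc'];

$results = [];
foreach ($patterns as $label => $re) {
    foreach ($names as $name) {
        // preg_match() returns 1 when the pattern matches, 0 when it doesn't.
        $results[$label][$name] = preg_match($re, $name) === 1;
    }
}
var_export($results);
```

Only the lookahead version rejects not.really.txt.doc; the two lookbehind versions accept it and reject notes.txt.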


Personally, I'd probably just stick with glob("*.*") and then use something like preg_grep() with the "(?<!\.txt)$" pattern (wrapped in delimiters, e.g. "/(?<!\.txt)$/"), or even just substr() to compare the last 4 characters with ".txt", to screen out the .txt files. It's easier to understand and maintain.
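Both screening options can be sketched as small helpers. The function names are hypothetical, and a fixed array stands in for the output of glob('*.*') so the example doesn't depend on the file system.

```php
<?php
// Option 1: preg_grep() keeps only the entries the pattern matches.
function without_txt_regex(array $files): array
{
    // preg_grep() preserves keys, so re-index for a plain list.
    return array_values(preg_grep('/(?<!\.txt)$/', $files));
}

// Option 2: compare the last four characters directly.
function without_txt_substr(array $files): array
{
    return array_values(array_filter(
        $files,
        fn(string $f): bool => substr($f, -4) !== '.txt'
    ));
}

// In practice: $files = glob('*.*');
$files = ['a.txt', 'b.php', 'not.really.txt.doc', 'c.TXT'];
print_r(without_txt_regex($files));
print_r(without_txt_substr($files));
```

Both are case-sensitive as written, so c.TXT is kept; add the i modifier to the regex, or compare strtolower(substr($f, -4)), to treat .TXT as text too. preg_grep('/\.txt$/', $files, PREG_GREP_INVERT) is an equivalent way to write option 1. The arrow function requires PHP 7.4 or later.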
 
gr8gonzo (Consultant) commented:
Yep, I also agree with Terry's assessment of my regex. :) I'm just going to be quiet now and watch from the corner.
 
hankknight (Author) commented:
This seems to do the trick:
$xyz=glob('*.[!tT][!xX][!tT]');
 
Ray Paseur commented:
These four tests all produce identical output in my test directories. I would choose the function, because it's more likely to be understood by other programmers who might work on the project. But if you have a good doc-block explaining the glob patterns, those would be fine, too. I use a case-sensitive file system and consider a non-lowercase file extension to be an error, but if you don't ...

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

function glob_not($pat='?')
{
    return array_diff(glob('*.*'), glob($pat));
}

var_dump(glob_not('*.txt'));
var_dump(glob_not('*.[tT][xX][tT]'));

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

$pat = '*.[!tT][!xX][!tT]';
var_dump(glob($pat));


 
Ray Paseur commented:
A little more extensive testing...

The first two examples work correctly. The last two incorrectly exclude files with extensions such as .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator.
 
Terry Woods (IT Guru) commented:
Ray wrote: "The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a 'truthy' indicator."
It may not be immediately clear why this happens. If I understand correctly, using the glob pattern:
*.[!tT][!xX][!tT]


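means this:
Match any file whose name ends in a dot followed by exactly three characters, where the first of those characters can't be a t, the second can't be an x, and the third can't be a t (in either case). As a side effect, it also skips every extension that isn't exactly three characters long, such as .js or .html.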

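This means somefile.tmp fails because the first character of its extension is a t, which [!tT] forbids. Likewise, data.act fails because the third character of its extension is a t. The pattern rejects any extension with a t in the first position, an x in the second, or a t in the third, not just .txt.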
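To see this concretely, PHP's fnmatch() applies shell-style wildcard rules like glob() does, but to a single string instead of the file system. The sample names are made up, and this assumes a platform where fnmatch() is available and honours [!...] negation (POSIX systems do).

```php
<?php
// Which names does hankknight's glob pattern accept?
// Assumption: fnmatch() is available and supports [!...] negation.
$pattern = '*.[!tT][!xX][!tT]';
$names = ['notes.txt', 'image.png', 'somefile.tmp', 'data.act', 'script.js', 'page.html'];

$accepted = [];
foreach ($names as $name) {
    if (fnmatch($pattern, $name)) {
        $accepted[] = $name;
    }
}
print_r($accepted);
```

Only image.png survives: .tmp and .act are rejected by the character classes, and .js and .html are rejected because their extensions are not exactly three characters long.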
 
Ray Paseur commented:
@Terry: 'Zackly! If you want to match files that are not *.txt (or that don't match some other glob() pattern), the glob_not() function is probably the most straightforward way to do it, if you insist on using glob(). If the author had not required glob(), this is probably how I would have programmed it:

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

// SEE http://php.net/manual/en/directoryiterator.construct.php

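// $pat must be a complete PCRE pattern, including delimiters.
function pseudo_glob_not($pat, $dir=__FILE__)
{
    $out = array(); // start empty so an array is returned even if nothing matches
    $dir = new DirectoryIterator(dirname($dir));
    foreach ($dir as $fobj)
    {
        if (!$fobj->isDot())
        {
            if (!$fobj->isDir())
            {
                $fn = $fobj->getFilename();
                if (!preg_match($pat, $fn))
                {
                    $out[] = $fn;
                }
            }
        }
    }
    return $out;
}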

// A REGULAR EXPRESSION TO MATCH TEXT FILES
$rgx
= '#'      // REGEX DELIMITER
. '.*?'    // ANYTHING OR NOTHING
. '\.'     // ESCAPED DOT
. 'txt'    // FILE SUFFIX
. '$'      // AT END OF STRING
. '#'      // REGEX DELIMITER
. 'i'      // CASE-INSENSITIVE
;

// TEST THE ALGORITHM
$arr = pseudo_glob_not($rgx);
print_r($arr);


