Solved

PHP/REGEX: glob() where extension NOT .txt

Posted on 2014-04-07
Last Modified: 2014-04-23
$xyz=glob("*.*");


That puts all the files into an array. I do not want files with the extension ".txt" to be included.
Question by:hankknight
14 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39983295
This will get the right result.  Not sure how to tell the limited glob() engine to do comprehensive pattern matching.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);


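Two details of glob_not() are worth knowing: glob() resolves its pattern relative to the current working directory, and array_diff() keeps the keys of the first array, so the result can start at index 1 or have gaps. A minimal variant that takes a directory and reindexes the result might look like this. The function name glob_not_in() and its $dir parameter are hypothetical additions, and the sketch assumes the directory path itself contains no glob metacharacters such as * or [.

```php
<?php
// Hypothetical variant of glob_not() that works on a given directory.
// Assumes $dir contains no glob metacharacters (*, ?, [).
function glob_not_in($dir, $pat = '*.txt')
{
    // Every name containing a dot, as with glob('*.*').
    $all = glob($dir . '/*.*');
    // Names matching the pattern to exclude.
    $skip = glob($dir . '/' . $pat);
    // glob() can return false on error; treat that as "no matches".
    if ($all === false) {
        $all = array();
    }
    if ($skip === false) {
        $skip = array();
    }
    // array_diff() keeps the original keys, so reindex with array_values().
    return array_values(array_diff($all, $skip));
}

// Usage: var_dump(glob_not_in(__DIR__));
```

The returned entries are full paths (directory plus file name), because the pattern passed to glob() includes the directory.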
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39983345
The GLOB_BRACE pattern near the end of the script below also appears to work correctly in my limited tests. I'm not sure what it would do with a file suffix like "act" or "tmp", but I'm pretty sure the glob_not() function will screen those correctly. I'm also not sure which method would be faster. Anyway, HTH, ~Ray

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));


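The question about "act" and "tmp" can be checked without creating test files by using fnmatch(). With GLOB_BRACE, the pattern *.{?,??,[!t][!x][!t]*} is expanded into the three patterns *.?, *.?? and *.[!t][!x][!t]*, and a name is returned if it matches any of them. fnmatch() does not expand braces itself, so this sketch lists the alternatives by hand; the helper name matches_brace_pattern() and the sample names are made up for illustration.

```php
<?php
// Hypothetical helper: mimic glob('*.{?,??,[!t][!x][!t]*}', GLOB_BRACE)
// for a single name by trying each brace alternative with fnmatch().
function matches_brace_pattern($name)
{
    $alternatives = array('*.?', '*.??', '*.[!t][!x][!t]*');
    foreach ($alternatives as $alt) {
        if (fnmatch($alt, $name)) {
            return true;
        }
    }
    return false;
}

$samples = array('report.doc', 'app.js', 'photo.jpeg', 'notes.txt', 'data.act', 'cache.tmp');
foreach ($samples as $name) {
    printf("%-12s %s\n", $name, matches_brace_pattern($name) ? 'kept' : 'excluded');
}
```

In this sketch, data.act and cache.tmp come out excluded along with notes.txt: each has a t in a position where [!t] forbids one.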
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39983702
If you really need a regex:

^((?!.txt).)*$
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 60 total points
ID: 39983934
@gr8gonzo: are you sure glob supports that? I haven't tested it, but the first comment on the PHP manual page for glob says:
glob also does not support lookbehinds, lookaheads, atomic groupings, capturing, or any of the 'higher level' regex functions.
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39983971
Sorry, I don't know how I missed that glob() was the OP's approach. I was thinking that the regex would be implemented in a readdir loop:

$dh = opendir("path");
while( ($file = readdir($dh)) !== false )
{
  if(preg_match("/^((?!.txt).)*\$/",basename($file)))
  {
     // ... valid file code ...
  }
}
closedir($dh);



That was my mistake.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39983977
Not really a mistake. I was just curious whether you had tested it and found that comment to be wrong :)
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 39983996
Nope, just not paying enough attention. Ray's approach is correct.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39984548
You can use a full regex if you use the RecursiveDirectoryIterator and RegexIterator classes.

Example here: https://github.com/cballou/PHP-SPL-Iterator-Interface-Examples/blob/master/recursive-regex-iterator.php
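A minimal sketch of that approach, assuming a recursive listing is wanted: RecursiveDirectoryIterator walks the tree, and RegexIterator with the RegexIterator::INVERT_MATCH flag keeps only the entries whose path does not match /\.txt$/i. The function name list_non_txt_files() is hypothetical, and sorting the result is an arbitrary choice to make the order predictable.

```php
<?php
// Hypothetical helper: recursively list files under $dir whose names
// do not end in ".txt" (case-insensitive).
function list_non_txt_files($dir)
{
    // SKIP_DOTS drops "." and ".."; LEAVES_ONLY (the default) yields files only.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // Each item is an SplFileInfo whose string form is its path, so the
    // pattern is tested against the path; INVERT_MATCH keeps non-matches.
    $notTxt = new RegexIterator(
        $files,
        '/\.txt$/i',
        RegexIterator::MATCH,
        RegexIterator::INVERT_MATCH
    );
    $out = array();
    foreach ($notTxt as $file) {
        $out[] = $file->getPathname();
    }
    sort($out);
    return $out;
}

// Usage: print_r(list_non_txt_files(__DIR__));
```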

If you choose to do that, you'll probably want to fix the bug in @gr8gonzo's regex pattern too (the . needs to be escaped, or it will be treated as a wildcard):

^((?!\.txt).)*$



Even then, it would wrongly reject a filename like "not.really.txt.doc", because the lookahead refuses ".txt" anywhere in the name, not just at the end.

A better pattern would be:
^.*(?<!\.txt)$


Or possibly just:
(?<!\.txt)$



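The candidate patterns can be compared directly with preg_match(). The sample names below are made up for illustration: "mytxtfile.doc" shows the effect of the unescaped dot (any character followed by "txt" trips the lookahead), and "not.really.txt.doc" shows why a lookbehind anchored at the end is the better fit. The helper accepted_names() is hypothetical.

```php
<?php
// Candidate patterns from this thread, with PHP regex delimiters added.
$patterns = array(
    'unescaped lookahead' => '/^((?!.txt).)*$/',
    'escaped lookahead'   => '/^((?!\.txt).)*$/',
    'end lookbehind'      => '/^.*(?<!\.txt)$/',
);
$names = array('report.doc', 'notes.txt', 'mytxtfile.doc', 'not.really.txt.doc');

// Hypothetical helper: return the names the pattern accepts.
function accepted_names($pattern, array $names)
{
    $out = array();
    foreach ($names as $name) {
        if (preg_match($pattern, $name) === 1) {
            $out[] = $name;
        }
    }
    return $out;
}

foreach ($patterns as $label => $pattern) {
    echo $label, ': ', implode(', ', accepted_names($pattern, $names)), "\n";
}
```

Only the lookbehind version accepts all three non-text names while still rejecting notes.txt.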
Personally, I'd probably just stick with a glob("*.*") and then use something like a preg_grep with the "(?<!\.txt)$" pattern (or even just a substr to get the last 4 chars to compare with .txt) to screen out the .txt files. It's easier to understand and maintain.
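A sketch of that suggestion, with hypothetical function names. preg_grep() keeps the original array keys, so array_values() is used to reindex; an equivalent alternative is preg_grep('/\.txt$/', $names, PREG_GREP_INVERT), which returns the entries that do not match. Both versions are case-sensitive, like the pattern above; adding the i modifier (or lower-casing before the substr() comparison) would also drop names ending in .TXT.

```php
<?php
// Hypothetical helpers sketching "glob() first, then filter out .txt".

// Regex version: keep names that do NOT end in ".txt" (lookbehind at the end).
function without_txt_regex(array $names)
{
    // preg_grep() keeps the original keys, so reindex with array_values().
    return array_values(preg_grep('/(?<!\.txt)$/', $names));
}

// substr() version: compare the last four characters with ".txt".
function without_txt_substr(array $names)
{
    $out = array();
    foreach ($names as $name) {
        if (substr($name, -4) !== '.txt') {
            $out[] = $name;
        }
    }
    return $out;
}

// Usage in the original setting (names from the current directory):
// $xyz = without_txt_regex(glob('*.*'));
```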
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39984691
Yep, I also agree with Terry's assessment of my regex. :) I'm just going to be quiet now and watch from the corner.
 
LVL 16

Author Comment

by:hankknight
ID: 39985838
This seems to do the trick:
$xyz=glob('*.[!tT][!xX][!tT]');
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39985951
These four tests all produce identical output in my test directories.  I would choose the function because it's more likely to be understood by other programmers who might be working on the project.  But if you have a good doc-block to explain the regular expressions, those would be fine, too.  I use a case-sensitive file system and consider a non-lower-case file extension to be an error, but if you don't ...

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

function glob_not($pat='?')
{
    return array_diff(glob('*.*'), glob($pat));
}

var_dump(glob_not('*.txt'));
var_dump(glob_not('*.[tT][xX][tT]'));

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

$pat = '*.[!tT][!xX][!tT]';
var_dump(glob($pat));


 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39986004
A little more extensive testing...

The first two examples work correctly.  The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 60 total points
ID: 39987258
The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator.
It may not be immediately clear why this happens. If I understand correctly, using the glob pattern:
*.[!tT][!xX][!tT]


means this:
Match any file with a 3-character extension at the end, where the first of those 3 characters can't be a t, the second can't be an x, and the third can't be a t.

This means somefile.tmp is excluded because the first character of its extension isn't allowed to be a t (and a name ending in .act is excluded because the third character of its extension is a t).
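That reading can be checked with fnmatch(), which uses the same wildcard syntax as glob() (braces aside) but tests a single name without touching the filesystem. The sample names are made up for illustration.

```php
<?php
// The character-class pattern posted earlier in the thread.
$pattern = '*.[!tT][!xX][!tT]';

// Made-up sample names.
$samples = array('report.doc', 'notes.txt', 'README.TXT', 'cache.tmp', 'data.act', 'app.js', 'page.html');

$keptNames = array();
foreach ($samples as $name) {
    // The name must end in a dot plus exactly three characters:
    // 1st not t/T, 2nd not x/X, 3rd not t/T.
    $kept = fnmatch($pattern, $name);
    if ($kept) {
        $keptNames[] = $name;
    }
    printf("%-11s %s\n", $name, $kept ? 'kept' : 'excluded');
}
```

Only report.doc survives: cache.tmp and data.act each hit a forbidden t, and app.js and page.html are dropped as well because their extensions are not exactly three characters long.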
 
LVL 109

Accepted Solution

by:Ray Paseur
Ray Paseur earned 200 total points
ID: 39987310
@Terry: 'Zackly!  If you want to match names that are not *.txt (or that fail any other glob() pattern), then the glob_not() function is probably the most straightforward way to do it, if you insist on using glob().  If the author had not required glob(), this is probably how I would have programmed it.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

// SEE http://php.net/manual/en/directoryiterator.construct.php

function pseudo_glob_not($pat='?', $dir=__FILE__)
{
    $out = array(); // START EMPTY SO AN ARRAY IS RETURNED EVEN IF NOTHING QUALIFIES
    $dir = new DirectoryIterator(dirname($dir));
    foreach ($dir as $fobj)
    {
        if (!$fobj->isDot())
        {
            if (!$fobj->isDir())
            {
                $fn = $fobj->getFilename();
                if (!preg_match($pat, $fn))
                {
                    $out[] = $fn;
                }
            }
        }
    }
    return $out;
}

// A REGULAR EXPRESSION TO MATCH TEXT FILES
$rgx
= '#'      // REGEX DELIMITER
. '.*?'    // ANYTHING OR NOTHING
. '\.'     // ESCAPED DOT
. 'txt'    // FILE SUFFIX
. '$'      // AT END OF STRING
. '#'      // REGEX DELIMITER
. 'i'      // CASE-INSENSITIVE
;

// TEST THE ALGORITHM
$arr = pseudo_glob_not($rgx);
print_r($arr);


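When the goal is specifically "every extension except txt", the test inside a loop like the one above does not strictly need a regular expression. A hedged alternative compares pathinfo($fn, PATHINFO_EXTENSION) with the unwanted extension using strcasecmp(), which makes the intent explicit and is case-insensitive like the #...#i pattern. The helper name without_extension() is hypothetical; it works on a plain array of names, so it could be fed from DirectoryIterator, glob() or scandir().

```php
<?php
// Hypothetical helper: drop names whose extension equals $ext (any case).
function without_extension(array $names, $ext)
{
    $out = array();
    foreach ($names as $name) {
        // PATHINFO_EXTENSION is the text after the last dot ("" if there is none).
        if (strcasecmp(pathinfo($name, PATHINFO_EXTENSION), $ext) !== 0) {
            $out[] = $name;
        }
    }
    return $out;
}

// Usage: print_r(without_extension(scandir(__DIR__), 'txt'));
```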
