Solved

PHP/REGEX: glob() where extension NOT .txt

Posted on 2014-04-07
Last Modified: 2014-04-23
$xyz=glob("*.*");


That places all files into an array.  I do not want files with the extension ".txt" to be included.
Question by:hankknight
14 Comments
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39983295
This will get the right result.  Not sure how to tell the limited glob() engine to do comprehensive pattern matching.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);
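
For anyone reusing this, here is a minimal sketch of the same array_diff() idea with a directory argument. The name glob_not_in() and the $dir parameter are my own additions for illustration, not part of the function above. Since array_diff() preserves the keys of its first argument, the sketch renumbers the result with array_values().

```php
<?php
// Hypothetical variant of glob_not(): takes a directory, returns bare file names.
function glob_not_in($dir, $pat = '*.txt')
{
    $all      = glob($dir . '/*.*');       // every name that contains a dot
    $excluded = glob($dir . '/' . $pat);   // the names to leave out

    // glob() can return false on error, so fall back to empty arrays.
    $all      = ($all === false) ? array() : $all;
    $excluded = ($excluded === false) ? array() : $excluded;

    // array_diff() keeps the original keys; array_values() renumbers from 0.
    return array_values(array_map('basename', array_diff($all, $excluded)));
}

// List the current directory without its *.txt files.
print_r(glob_not_in('.'));
```

On a case-sensitive file system the default pattern leaves names like NOTES.TXT in the result; passing '*.[tT][xX][tT]' as the second argument should cover those as well.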

 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39983345
The pattern on line 14 appears to work correctly, too, in my limited tests.  Not sure what it would do with a file suffix like "act" or "tmp" but I'm pretty sure the glob_not() function will screen those correctly.  Also not sure which method would be faster.  Anyway, HTH, ~Ray

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

 
LVL 35

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39983702
If you really need a regex:

^((?!.txt).)*$

 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 60 total points
ID: 39983934
@gr8gonzo: are you sure glob supports that? I haven't tested, but the first comment on the PHP manual page for glob() says:
"glob also does not support lookbehinds, lookaheads, atomic groupings, capturing, or any of the 'higher level' regex functions."
 
LVL 35

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39983971
Sorry, I don't know how I missed that glob() was the OP's approach. I was thinking that the regex would be implemented in a readdir loop:

$dh = opendir("path");
while( ($file = readdir($dh)) !== false )
{
  if(preg_match("/^((?!.txt).)*\$/",basename($file)))
  {
     // ... valid file code ...
  }
}
closedir($dh);



That was my mistake.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39983977
Not really a mistake. I was just curious if you tested and that comment is wrong :)
 
LVL 35

Expert Comment

by:gr8gonzo
ID: 39983996
Nope, just not paying enough attention. Ray's approach is correct.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39984548
You can use a full regex if you use the RecursiveDirectoryIterator and RegexIterator classes.

Example here: https://github.com/cballou/PHP-SPL-Iterator-Interface-Examples/blob/master/recursive-regex-iterator.php
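
A rough sketch of that iterator approach, in case the link goes away. The helper name files_not_txt() is made up for illustration, and the pattern is a negative lookbehind anchored at the end of the path, with a case-insensitive flag that I am assuming is wanted. RegexIterator tests each entry's path as a string, and the iterator key is the full pathname under the default flags.

```php
<?php
// Sketch only: files_not_txt() is a hypothetical helper name.
function files_not_txt($dir)
{
    // Walk $dir and all subdirectories, skipping "." and "..".
    $tree = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    // Keep only the paths that do NOT end in ".txt" (any letter case).
    $filtered = new RegexIterator($tree, '/(?<!\.txt)$/i');

    $out = array();
    foreach ($filtered as $path => $info) {
        $out[] = $path;   // the key is the full pathname
    }
    sort($out);
    return $out;
}

// Usage (path is a placeholder):
// print_r(files_not_txt('/path/to/directory'));
```

Unlike glob("*.*"), this walks subdirectories too and also returns files that have no dot in their name.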

If you choose to do that, you'll probably want to fix the bug in @gr8gonzo's regex pattern too (the . needs to be escaped, or it will be treated as a wildcard):

^((?!\.txt).)*$



Even then, it would fail to behave correctly on a filename like "not.really.txt.doc", because it rejects any name that contains ".txt" anywhere, not only at the end.

A better pattern would be:
^.*(?<!\.txt)$


Or possibly just:
(?<!\.txt)$
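
To make the difference between these patterns concrete, here is a small sketch that runs three of them over the same hypothetical file names with preg_grep(). The sample names and the array labels are invented for the comparison; the patterns are the ones quoted in this thread, wrapped in / delimiters.

```php
<?php
// Patterns from the thread, wrapped in delimiters for preg_grep().
$patterns = array(
    'unescaped'  => '/^((?!.txt).)*$/',    // "." matches any character
    'escaped'    => '/^((?!\.txt).)*$/',   // rejects ".txt" anywhere in the name
    'lookbehind' => '/^.*(?<!\.txt)$/',    // rejects ".txt" only at the end
);

// Hypothetical file names chosen to expose each weakness.
$names = array('report.txt', 'mytxt.doc', 'not.really.txt.doc');

$kept = array();
foreach ($patterns as $label => $regex) {
    // preg_grep() returns the names that match, i.e. the files we would keep.
    $kept[$label] = array_values(preg_grep($regex, $names));
}

print_r($kept);
```

The unescaped version keeps none of the three names, because "ytxt" in mytxt.doc satisfies ".txt" when the dot is a wildcard. The escaped version keeps only mytxt.doc, and the lookbehind version keeps both mytxt.doc and not.really.txt.doc.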



Personally, I'd probably just stick with a glob("*.*") and then use something like a preg_grep with the "(?<!\.txt)$" pattern (or even just a substr to get the last 4 chars to compare with .txt) to screen out the .txt files. It's easier to understand and maintain.
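
Both of those screening options might look something like the sketch below. The function names exclude_txt_regex() and exclude_txt_substr() are hypothetical, and the case-insensitive handling (the i flag and strtolower()) is an assumption on my part; drop it if .TXT files should be kept. The sample array stands in for the result of glob("*.*").

```php
<?php
// Option 1: preg_grep() with the negative-lookbehind pattern.
function exclude_txt_regex(array $files)
{
    return array_values(preg_grep('/(?<!\.txt)$/i', $files));
}

// Option 2: compare the last four characters with ".txt".
function exclude_txt_substr(array $files)
{
    return array_values(array_filter($files, function ($name) {
        return strtolower(substr($name, -4)) !== '.txt';
    }));
}

// Stand-in for glob("*.*") so the sketch runs without any particular files.
$sample = array('a.txt', 'b.TXT', 'notes.tmp', 'data.act', 'not.really.txt.doc');

print_r(exclude_txt_regex($sample));
print_r(exclude_txt_substr($sample));
```

With this sample both functions should return notes.tmp, data.act and not.really.txt.doc. In real use the call would be along the lines of exclude_txt_regex(glob('*.*')).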
 
LVL 35

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
ID: 39984691
Yep, I also agree with Terry's assessment of my regex. :) I'm just going to be quiet now and watch from the corner.
 
LVL 16

Author Comment

by:hankknight
ID: 39985838
This seems to do the trick:
$xyz=glob('*.[!tT][!xX][!tT]');
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39985951
These four tests all produce identical output in my test directories.  I would choose the function because it's more likely to be understood by other programmers who might be working on the project.  But if you have a good doc-block to explain the regular expressions, those would be fine, too.  I use a case-sensitive file system and consider a non-lower-case file extension to be an error, but if you don't ...

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

function glob_not($pat='?')
{
    return array_diff(glob('*.*'), glob($pat));
}

var_dump(glob_not('*.txt'));
var_dump(glob_not('*.[tT][xX][tT]'));

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

$pat = '*.[!tT][!xX][!tT]';
var_dump(glob($pat));

 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39986004
A little more extensive testing...

The first two examples work correctly.  The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 60 total points
ID: 39987258
"The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator."
It may not be immediately clear as to why this happens. If I understand correctly, using the glob pattern:
*.[!tT][!xX][!tT]


means this:
Match any file with a 3 character extension at the end, where the first character of the 3 characters can't be a t, and the second can't be an x, and the third can't be a t.

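This means the filename somefile.tmp is not matched, because the first character of its extension is a t, which the pattern does not allow. Likewise somefile.act is not matched, because the third character of its extension is a t.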
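
Here is a quick way to check that reading without creating any files: fnmatch() tests a name against a shell-style pattern. This is a sketch that assumes fnmatch() and glob() treat bracket expressions the same way, which I believe holds on typical POSIX systems. The file names are hypothetical.

```php
<?php
$pattern = '*.[!tT][!xX][!tT]';

// Hypothetical names: only readme.txt is a file we actually want to exclude.
$names = array(
    'readme.txt',    // t, x, t    -> rejected at every position
    'somefile.tmp',  // starts with t -> rejected
    'setup.exe',     // second character is x -> rejected
    'data.act',      // third character is t -> rejected
    'page.html',     // four characters -> rejected
    'notes.md',      // two characters -> rejected
    'image.png',     // passes all three positions
);

$matched = array();
foreach ($names as $name) {
    if (fnmatch($pattern, $name)) {
        $matched[] = $name;
    }
}

print_r($matched);
```

Of these seven names only image.png matches, so the pattern drops far more than the .txt files.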
 
LVL 110

Accepted Solution

by:Ray Paseur
Ray Paseur earned 200 total points
ID: 39987310
@Terry: 'Zackly!  If you want to match patterns that are not *.txt (or any other glob() pattern), then the glob_not() function is probably the most straightforward way to do this -- if you insist on using glob().  If the author had not required glob() this is probably how I might have programmed it.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

// SEE http://php.net/manual/en/directoryiterator.construct.php

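function pseudo_glob_not($pat='?', $dir=__FILE__)
{
    // START WITH AN EMPTY ARRAY SO THE FUNCTION ALWAYS RETURNS AN ARRAY
    $out = array();
    $dir = new DirectoryIterator(dirname($dir));
    foreach ($dir as $fobj)
    {
        // SKIP "." AND ".."
        if (!$fobj->isDot())
        {
            // SKIP SUBDIRECTORIES
            if (!$fobj->isDir())
            {
                // KEEP ONLY THE NAMES THAT DO NOT MATCH THE REGEX
                $fn = $fobj->getFilename();
                if (!preg_match($pat, $fn))
                {
                    $out[] = $fn;
                }
            }
        }
    }
    return $out;
}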

// A REGULAR EXPRESSION TO MATCH TEXT FILES
$rgx
= '#'      // REGEX DELIMITER
. '.*?'    // ANYTHING OR NOTHING
. '\.'     // ESCAPED DOT
. 'txt'    // FILE SUFFIX
. '$'      // AT END OF STRING
. '#'      // REGEX DELIMITER
. 'i'      // CASE-INSENSITIVE
;

// TEST THE ALGORITHM
$arr = pseudo_glob_not($rgx);
print_r($arr);
