Solved

PHP/REGEX: glob() where extension NOT .txt

Posted on 2014-04-07
Last Modified: 2014-04-23
$xyz=glob("*.*");


That places all files into an array.  I do not want files with the extension ".txt" to be included.
Question by:hankknight
14 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
This will get the right result.  Not sure how to tell the limited glob() engine to do comprehensive pattern matching.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);

 
LVL 108

Expert Comment

by:Ray Paseur
The pattern on line 14 appears to work correctly, too, in my limited tests.  Not sure what it would do with a file suffix like "act" or "tmp" but I'm pretty sure the glob_not() function will screen those correctly.  Also not sure which method would be faster.  Anyway, HTH, ~Ray

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);

function glob_not($pat='*.txt')
{
    $all = glob('*.*');
    $arr = glob($pat);
    return array_diff($all, $arr);
}

$dif = glob_not();
var_dump($dif);

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
If you really need a regex:

^((?!.txt).)*$
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 60 total points
@gr8gonzo: are you sure glob() supports that? I haven't tested, but the first comment on the PHP manual page for glob() says:
"glob also does not support lookbehinds, lookaheads, atomic groupings, capturing, or any of the 'higher level' regex functions."
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
Sorry, I don't know how I missed that glob() was the OP's approach. I was thinking that the regex would be implemented in a readdir loop:

$dh = opendir("path");
while( ($file = readdir($dh)) !== false )
{
  if(preg_match("/^((?!.txt).)*\$/",basename($file)))
  {
     // ... valid file code ...
  }
}
closedir($dh);


That was my mistake.
 
LVL 34

Expert Comment

by:Dan Craciun
Not really a mistake. I was just curious if you tested and that comment is wrong :)
 
LVL 34

Expert Comment

by:gr8gonzo
Nope, just not paying enough attention. Ray's approach is correct.
 
LVL 35

Expert Comment

by:Terry Woods
You can use a full regex if you use the RecursiveDirectoryIterator and RegexIterator classes.

Example here: https://github.com/cballou/PHP-SPL-Iterator-Interface-Examples/blob/master/recursive-regex-iterator.php
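
As a rough illustration of that iterator approach (not taken from the linked example), the sketch below walks a directory tree and keeps every file whose name does not end in .txt. The function name find_non_txt() and the case-insensitive match are assumptions made for this example. It uses RegexIterator::INVERT_MATCH so that a plain "ends in .txt" pattern can be negated without any lookaround.

```php
<?php
// Hypothetical helper (not from the thread): list all files under $dir
// whose names do NOT end in ".txt", searching subdirectories too.
function find_non_txt($dir)
{
    // Walk the tree, skipping the "." and ".." entries.
    $tree = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    // RegexIterator tests each entry's string value (its pathname here).
    // INVERT_MATCH keeps the entries that do NOT match the pattern.
    $nonTxt = new RegexIterator(
        $tree,
        '/\.txt$/i',                 // assumption: treat .TXT like .txt
        RegexIterator::MATCH,
        RegexIterator::INVERT_MATCH
    );

    $out = array();
    foreach ($nonTxt as $file) {
        if ($file->isFile()) {
            $out[] = $file->getPathname();
        }
    }
    return $out;
}

// Usage (path is a placeholder):
// print_r(find_non_txt('/path/to/dir'));
```

Unlike glob('*.*'), this also descends into subdirectories and returns full pathnames, so it is only a fit if that is what you want.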

If you choose to do that, you'll probably want to fix the bug in @gr8gonzo's regex pattern too (the . needs to be escaped, or it will be treated as a wildcard):

^((?!\.txt).)*$


Even then, it would fail to behave correctly on a filename like "not.really.txt.doc", because the lookahead rejects ".txt" anywhere in the name, not just at the end.

A better pattern would be:
^.*(?<!\.txt)$

Or possibly just:
(?<!\.txt)$


Personally, I'd probably just stick with a glob("*.*") and then use something like a preg_grep with the "(?<!\.txt)$" pattern (or even just a substr to get the last 4 chars to compare with .txt) to screen out the .txt files. It's easier to understand and maintain.
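
A minimal sketch of both suggestions, for comparison. The helper names exclude_txt_regex() and exclude_txt_substr() are made up for this example, and the case-insensitive matching (the i flag and strtolower) is an added assumption; drop it if .TXT files should be kept.

```php
<?php
// Hypothetical helper: keep names that do NOT end in ".txt", using the
// negative-lookbehind pattern with preg_grep().
function exclude_txt_regex(array $files)
{
    // preg_grep() returns the entries that match; array_values() reindexes.
    return array_values(preg_grep('/(?<!\.txt)$/i', $files));
}

// Hypothetical helper: same result by comparing the last 4 characters.
function exclude_txt_substr(array $files)
{
    return array_values(array_filter($files, function ($name) {
        return strtolower(substr($name, -4)) !== '.txt';
    }));
}

// glob() can return false on error, so fall back to an empty array.
$all = glob('*.*') ?: array();

print_r(exclude_txt_regex($all));
print_r(exclude_txt_substr($all));
```

Either way the filtering happens after glob() has returned, so the glob pattern itself stays trivial.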
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 180 total points
Yep, I also agree with Terry's assessment of my regex. :) I'm just going to be quiet now and watch from the corner.
 
LVL 16

Author Comment

by:hankknight
This seems to do the trick:
$xyz=glob('*.[!tT][!xX][!tT]');
 
LVL 108

Expert Comment

by:Ray Paseur
These four tests all produce identical output in my test directories.  I would choose the function because it's more likely to be understood by other programmers who might be working on the project.  But if you have a good doc-block to explain the glob patterns, those would be fine, too.  I use a case-sensitive file system and consider a non-lower-case file extension to be an error, but if you don't ...

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

function glob_not($pat='?')
{
    return array_diff(glob('*.*'), glob($pat));
}

var_dump(glob_not('*.txt'));
var_dump(glob_not('*.[tT][xX][tT]'));

$pat = '*.{?,??,[!t][!x][!t]*}';
var_dump(glob($pat, GLOB_BRACE));

$pat = '*.[!tT][!xX][!tT]';
var_dump(glob($pat));

 
LVL 108

Expert Comment

by:Ray Paseur
A little more extensive testing...

The first two examples work correctly.  The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 60 total points
Ray wrote: "The last two examples incorrectly exclude files named like .tmp and .act, probably because the regular expression matcher only needs to find one match in order to return a "truthy" indicator."
It may not be immediately clear as to why this happens. If I understand correctly, using the glob pattern:
*.[!tT][!xX][!tT]

means this:
Match any file with a 3 character extension at the end, where the first character of the 3 characters can't be a t, and the second can't be an x, and the third can't be a t.

This means somefile.tmp is rejected: its extension starts with a t, and the first character of the extension isn't allowed to be a t. Likewise, a file ending in .act is rejected because its third character is a t.
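
To see this without creating any files, the pattern can be tried against a list of names with fnmatch(). This is only a sketch: glob_pattern_report() is a made-up helper, the file names are invented, and it assumes fnmatch() applies the same bracket rules as glob() on your platform.

```php
<?php
// Hypothetical helper: report which names a glob-style pattern accepts.
// fnmatch() tests a string against a shell wildcard pattern, so no
// files need to exist on disk.
function glob_pattern_report($pattern, array $names)
{
    $report = array();
    foreach ($names as $name) {
        $report[$name] = fnmatch($pattern, $name);
    }
    return $report;
}

$pattern = '*.[!tT][!xX][!tT]';
$names   = array('notes.txt', 'somefile.tmp', 'contract.act', 'index.php', 'readme.md');

foreach (glob_pattern_report($pattern, $names) as $name => $matched) {
    printf("%-14s %s\n", $name, $matched ? 'matched' : 'excluded');
}
```

Each bracket is checked independently, so a name is dropped as soon as any one of the three positions holds a forbidden letter. The pattern also requires exactly three characters after the last dot, so I would expect readme.md to be excluded as well.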
 
LVL 108

Accepted Solution

by:Ray Paseur
Ray Paseur earned 200 total points
@Terry: 'Zackly!  If you want to match patterns that are not *.txt (or any other glob() pattern), then the glob_not() function is probably the most straightforward way to do this -- if you insist on using glob().  If the author had not required glob() this is probably how I might have programmed it.

<?php // EE_Adviser/temp_hankknight.php
error_reporting(E_ALL);
echo '<pre>';

// SEE http://php.net/manual/en/directoryiterator.construct.php

function pseudo_glob_not($pat='?', $dir=__FILE__)
{
    $out = array(); // INITIALIZE SO AN EMPTY RESULT IS STILL AN ARRAY
    $dir = new DirectoryIterator(dirname($dir));
    foreach ($dir as $fobj)
    {
        if (!$fobj->isDot())
        {
            if (!$fobj->isDir())
            {
                $fn = $fobj->getFilename();
                if (!preg_match($pat, $fn))
                {
                    $out[] = $fn;
                }
            }
        }
    }
    return $out;
}

// A REGULAR EXPRESSION TO MATCH TEXT FILES
$rgx
= '#'      // REGEX DELIMITER
. '.*?'    // ANYTHING OR NOTHING
. '\.'     // ESCAPED DOT
. 'txt'    // FILE SUFFIX
. '$'      // AT END OF STRING
. '#'      // REGEX DELIMITER
. 'i'      // CASE-INSENSITIVE
;

// TEST THE ALGORITHM
$arr = pseudo_glob_not($rgx);
print_r($arr);
