Regular Expression for filtering redundant directory notations

Posted on 2003-11-24
Medium Priority
Last Modified: 2010-03-04

I am coding a small script that navigates through a standard file system. Because the path of the new directory is submitted with the request, I need a security mechanism that prevents anyone from faking the URL to reach a directory above the one set as the script's base directory.

I need a regular expression that filters any ../-style notation out of a directory path, so that redundant directory notations are impossible!

I tried the following Perl regular expression (it's PHP code, but Perl coders should have no problem understanding it ;-)):

preg_replace( "|/(.*)/../|U", "/", $cur_dir );

The problem is that the expression fails for notations that contain more than one /.., e.g. testbasedir/../../..

Question by:WebFerret

Expert Comment

ID: 9809376
Try this (Perl):

Now the variable $cur_dir contains "/dir". In this part of the code, every "/.." is replaced with "".
This is how I protect my own scripts from hackers as well.

Expert Comment

ID: 9809382
I'm sorry.
The valid code is:
$cur_dir=~s/\/\.\.//g;     # "g" instead of "i"; dots escaped so only a literal ".." matches
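
To see why stripping every "/.." in one pass is not enough on its own, here is a minimal Python sketch (Python rather than Perl or PHP, purely for illustration; `strip_dotdot_once` is a hypothetical helper name). It assumes the dots are matched literally. A single pass can silently change which directory the path points at, and a crafted directory name such as "..." can leave a fresh "/.." behind.

```python
def strip_dotdot_once(path: str) -> str:
    """Remove every literal "/.." in one left-to-right pass (illustration only)."""
    return path.replace("/..", "")


# The simple case looks fine.
print(strip_dotdot_once("/dir/.."))      # /dir

# But the meaning changes: /dir/../etc really refers to /etc, not /dir/etc.
print(strip_dotdot_once("/dir/../etc"))  # /dir/etc

# And removing the "/.." inside "/./..." glues the leftovers into a new "/..".
print(strip_dotdot_once("/./..."))       # /..
```

So a plain removal needs at least a repeated pass or a final check that no ".." segment survives.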

Author Comment

ID: 9809622
Damn!!! I only need the expression, not the Perl code! ;-)

Sorry, but your solution does not work with PHP's preg_xxx functions. The =~ s part seems to be Perl syntax, not part of a Perl regular expression?!

222 points if you give me a single plain Perl regular expression that works with PHP's preg_replace() function (PHP's preg functions use Perl-compatible regular expressions)!

Expert Comment

ID: 9809875

Author Comment

ID: 9810251
Your regular expression only reduces all repeated /.. to a single /.. It would shorten




But I don't want to cut off the end of the string; I want to remove the redundant directory notations from the path. I want the "true" path, which in this example has to be:

/    (base directory)

because /part1/part2 and /../.. cancel each other out!

I hope it is now clearer what I want! Is there still someone who can help me? :-)

Expert Comment

ID: 9810924
What you want is not a regex but a regex substitution.
 What do you expect for: /part1/part2/../../xx/../../..

Expert Comment

ID: 9812432
#Get rid of "/./"  to handle /./..
$cur_dir =~ s{/\./}{/}g;

#Get rid of "//"
$cur_dir =~ s{//}{/}g;

#Simplify /dir/../ patterns to /
#([^/]* rather than [^/]+ so that one-character directory names are handled too)
my ($temp) = "";
while ($temp ne $cur_dir) {
    $temp = $cur_dir;
    $cur_dir =~ s{/[^./][^/]*/\.\.(/|$)}{/};   # Handle most cases
    $cur_dir =~ s{/\.[^./][^/]*/\.\.(/|$)}{/}; # Handle most cases with directories starting with "."
    $cur_dir =~ s{/\.\.[^/]+/\.\.(/|$)}{/};    # Handle the last few cases with dirs starting with ".."
}

# Clean up a trailing / that may have been added
$cur_dir =~ s{/$}{};

Since you need to backtrack after each replacement, you can't use the "g" modifier; you must use a loop instead.
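
The difference between one global substitution and a loop can be sketched in Python (not the Perl above; `PAIR`, `collapse_single_pass` and `collapse_loop` are hypothetical names, and the pattern is a simplified stand-in for the three substitutions in this post, not a translation of them).

```python
import re

# One "/name/.." pair, where name is a real directory (neither "." nor "..").
PAIR = re.compile(r"/(?!\.\.?(?:/|$))[^/]+/\.\.(?=/|$)")


def collapse_single_pass(path: str) -> str:
    """Global substitution: each part of the string is looked at only once."""
    return PAIR.sub("", path)


def collapse_loop(path: str) -> str:
    """Replace one pair at a time and rescan until nothing changes."""
    while True:
        reduced = PAIR.sub("", path, count=1)
        if reduced == path:
            return path
        path = reduced


print(repr(collapse_single_pass("/a/b/../..")))  # '/a/..'  (a new pair is missed)
print(repr(collapse_loop("/a/b/../..")))         # ''       (fully resolved)
print(collapse_loop("/part1/part2/../../xx/../../.."))  # /../..
```

A result that still begins with "/.." means the input tried to climb above the base directory, so the caller can reject it or reset it to the base.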

Author Comment

ID: 9812968

I'm expecting a function that generates from


the result path is


or ε (empty string!)

(The deepest allowed path, i.e. the topmost directory that may be reached, is the base directory "/". This is not necessarily the root directory of the file system; in my case it is the directory from which the script is called.)
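
The behaviour described here, where ".." can never climb above the base directory, can also be written without a regular expression by walking the path segment by segment. The following Python sketch is one way to pin down the expected results. It is not the regex the question asks for, `normalize_clamped` is a hypothetical name, and the second test path is the one asked about earlier in the thread.

```python
def normalize_clamped(path: str) -> str:
    """Resolve "." and ".." segments; ".." at the base directory is ignored."""
    stack = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue              # skip empty segments ("//") and "."
        if segment == "..":
            if stack:
                stack.pop()       # step back one directory
            # else: already at the base directory, stay there
        else:
            stack.append(segment)
    return "/" + "/".join(stack)


print(normalize_clamped("/part1/part2/../.."))              # /
print(normalize_clamped("/part1/part2/../../xx/../../.."))  # /
print(normalize_clamped("/a/b/../c"))                       # /a/c
```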

Expert Comment

ID: 9819626
>> deepest allowed path is the base directory "/" - not necessarily the root directory

I don't know how PHP "reads" path notation, but on Unix systems / is the root directory and ./ is the current working directory (or "base directory").

So far, each of the solutions removes the ../ relative path notation, but may leave you with a properly formatted path to a directory that does not exist. Also, nothing has been mentioned about the user entering an absolute path that is higher up the directory tree than the previous path. Staying with the relative-path problem (as outlined in your last post), you could do this (using Perl syntax, since I don't know PHP):

$cur_dir = './' if ($cur_dir =~ /\.\./);

or this:

$cur_dir =~ s#^.*?\.\./.*$#./#;

If you need to test for absolute paths, then we'll need to approach this from a slightly different angle.
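
One possible "different angle", sketched in Python under the assumption of POSIX-style paths: join the user-supplied path onto the base directory, normalise it lexically with `posixpath.normpath`, and reject anything that ends up outside the base. The base directory `/srv/files` and the function name `resolve_under_base` are made up for illustration, and symbolic links are not taken into account.

```python
import posixpath


def resolve_under_base(base_dir: str, user_path: str):
    """Return the normalised path if it stays inside base_dir, else None."""
    base = posixpath.normpath(base_dir)
    # Treat the user path as relative to the base even if it starts with "/".
    joined = posixpath.join(base, user_path.lstrip("/"))
    resolved = posixpath.normpath(joined)
    # Compare against "base/" so that /srv/files2 does not pass for /srv/files.
    if resolved == base or resolved.startswith(base.rstrip("/") + "/"):
        return resolved
    return None


print(resolve_under_base("/srv/files", "docs/../img"))         # /srv/files/img
print(resolve_under_base("/srv/files", "/part1/part2/../.."))  # /srv/files
print(resolve_under_base("/srv/files", "../../etc/passwd"))    # None
print(resolve_under_base("/srv/files", "../files2"))           # None
```

Rejecting the request outright, rather than repairing the path, avoids having to decide what a hostile path "should" have meant.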

Expert Comment

ID: 9819774
In Perl-speak, I'd use something like this:

my $path = '/part1/part2/../../xx/../../../';
1 while($path =~ s#(?:[^/]+/)?\.\./##);

And in PHP, it'd look more like this:

$path = '/part1/part2/../../xx/../../';
while($path !== ($tmp = preg_replace( "#(?:[^/]+/)?\.\./#", "", $path, 1)))
        $path = $tmp;

And people say Perl isn't beautiful. :) If you have a trailing / on the path, it leaves you with that, otherwise, it ends with the empty string.

Expert Comment

ID: 9819920
icrf, I came up with a similar regex:
  while (s#[^/.]+[/]+\.\./##){}

but it suffers from the same problem as your suggestion; try it with
   $path = '/part1/part2/../../xx/../../..';

I think I need to go to bed with J. Friedl's regular expressions book and a few beers ...
There is probably something better with Perl's look-ahead tomorrow ...

Accepted Solution

ahoffmann earned 888 total points
ID: 9820012
Perl goes here:
   while (s#(?:[^/]+[/]+)+\.\./?##){}

Hopefully it's similar in PHP; see icrf's suggestion.

Author Comment

ID: 9837410
ahoffmann's preg-expression works perfectly! :-)
Thank you to icrf too!

Working PHP solution for eliminating redundant directories in paths:

$path = '/part1/part2/../../xx/../../..';
while($path !== ($tmp = preg_replace( "#(?:[^/]+[/]+)+\.\./?#", "", $path, 1))) $path = $tmp;
echo $path;
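
For readers who want to experiment with the accepted pattern outside PHP, here is a Python port of the same loop (a sketch; `ACCEPTED` and `reduce_path` are hypothetical names, and Python's `re` module is assumed to treat this particular pattern the same way as PCRE). Two behaviours are worth knowing before relying on it: because the group is greedy and `[^/]+` also matches "..", everything from the first directory up to the last ".." is removed at once, and a ".." with no directory in front of it is left untouched.

```python
import re

# Same pattern as in the PHP solution above.
ACCEPTED = re.compile(r"(?:[^/]+[/]+)+\.\./?")


def reduce_path(path: str) -> str:
    """Apply the substitution once per pass until the path stops changing."""
    while True:
        reduced = ACCEPTED.sub("", path, count=1)
        if reduced == path:
            return path
        path = reduced


print(reduce_path("/part1/part2/../../xx/../../.."))  # /
print(reduce_path("/a/../b"))                         # /b
print(reduce_path("/a/b/.."))                         # /   (not /a)
print(reduce_path("/../etc"))                         # /../etc  (unchanged)
```

The third result errs on the side of staying inside the base directory, but the last one suggests adding a final check that no ".." segment remains in the result.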
