Regular Expression for filtering redundant directory notations

Posted on 2003-11-24
Last Modified: 2010-03-04

I am coding a little script that navigates through a standard file system. Since the path of the new directory is submitted in the URL, I need a security mechanism to make sure that nobody can fake the URL to reach a directory above the base directory set for the script.

I need a regular expression that filters any ../-style notation from a directory path, so that redundant directory notations are impossible!

I tried the following Perl regular expression (it's PHP code, but it should be no problem for Perl coders to understand ;-)

preg_replace( "|/(.*)/../|U", "/", $cur_dir );

The problem is that the expression fails on paths that contain more than one /.., e.g. testbasedir/../../..

Question by:WebFerret

Expert Comment

ID: 9809376
Try this (Perl):

Now the variable $cur_dir contains "/dir". This part of the code replaces every "/.." with "".
That is how I protect myself from hackers, too.

Expert Comment

ID: 9809382
I'm sorry.
Valid code is:
$cur_dir =~ s/\/\.\.//g;     # "g" instead of "i"; dots escaped so they match literal "." only
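
A short PHP sketch of two points about this kind of substitution, assuming only the standard preg_replace() function: an unescaped dot matches any character, and deleting "/.." is not the same as resolving it. The sample paths are made up for illustration.

```php
<?php
// Unescaped dots: "/.." means "a slash followed by any two characters".
$unescaped = preg_replace('#/..#', '', '/abc/def');

// Escaped dots: only a literal "/.." is removed.
$escaped = preg_replace('#/\.\.#', '', '/abc/def');

// Deleting "/.." does not resolve it: the parent directories stay in place.
$stripped = preg_replace('#/\.\.#', '', '/part1/part2/../..');

echo $unescaped, "\n";  // cf
echo $escaped, "\n";    // /abc/def
echo $stripped, "\n";   // /part1/part2
```

The last result is the behaviour the question is trying to avoid: the path should collapse to the base directory rather than keep /part1/part2.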

Author Comment

ID: 9809622
Damn!!! I only need the expression, not the Perl code! ;-)

Sorry, but your solution does not work with the PHP preg_xxx functions. "=~ s" seems to be Perl syntax, but not part of a Perl regular expression?!

222 points if you give me a single Perl regular expression that works with PHP's preg_replace() function (PHP uses Perl-compatible regular expressions)!

Expert Comment

ID: 9809875

Author Comment

ID: 9810251
Your regular expression only reduces all repeated /.. to a single /.. It would shorten




But I don't want to cut off the end of the string; I want to remove the redundant directory notations from the path. I want the "true" path, which in this example has to be:

/    (base directory)

as /part1/part2 and /../.. cancel each other out!

I hope it is now clearer what I want! Is there still someone who can help me!? :-)

Expert Comment

ID: 9810924
what you want is not a regex but a regex-substitution:
 what do you expect for: /part1/part2/../../xx/../../..

Expert Comment

ID: 9812432
#Get rid of "/./"  to handle /./..
$cur_dir =~ s{/\./}{/}g;

#Get rid of "//"
$cur_dir =~ s{//}{/}g;

#Simplify /dir/../ patterns to /
my ($temp) = "";
while ($temp ne $cur_dir) {
    $temp = $cur_dir;
    $cur_dir =~ s{/[^./][^/]*/\.\.(/|$)}{/};    # Handle most cases (including one-character names)
    $cur_dir =~ s{/\.[^./][^/]*/\.\.(/|$)}{/};  # Handle directories starting with "."
    $cur_dir =~ s{/\.\.[^/]+/\.\.(/|$)}{/};     # Handle the last few cases: dirs starting with ".."
}

# Clean up a trailing / that may have been added
$cur_dir =~ s{/$}{};

Since you need to rescan the path from the start after each replacement, you can't use the "g" modifier; you must use a loop instead.
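
The same idea can be sketched in PHP, where the loop runs preg_replace() with a limit of 1 until the path stops changing. This is a minimal sketch, not the code from the post above: the function name collapse_dotdot and the single combined pattern are assumptions made for illustration, and leading ".." segments that have nothing left to cancel are deliberately left untouched.

```php
<?php
// Hypothetical helper: collapse "name/.." pairs until the path stops changing.
function collapse_dotdot(string $path): string
{
    // One real directory name followed by "..":
    //   (?!\.\.?/)  the name must not itself be "." or ".."
    //   (?=/|$)     the ".." must be a complete segment
    $pattern = '#/(?!\.\.?/)[^/]+/\.\.(?=/|$)#';
    do {
        $before = $path;
        // Limit 1: replace a single pair, then rescan from the start.
        $path = preg_replace($pattern, '', $path, 1);
    } while ($path !== $before);
    return $path;
}

// A single global pass misses the pair that only appears after the first removal.
$single_pass = preg_replace('#/(?!\.\.?/)[^/]+/\.\.(?=/|$)#', '', '/a/b/../../c');

echo $single_pass, "\n";                      // /a/../c
echo collapse_dotdot('/a/b/../../c'), "\n";   // /c
```

Note that a path such as /a/.. collapses to the empty string here, so a caller would have to map that back to the base directory.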

Author Comment

ID: 9812968

I am expecting a function that generates from


the result path is


or ε (empty string!)

(The deepest allowed path is the base directory "/", which is not necessarily the root directory of the filesystem but the topmost directory the script may reach. In my case it is the directory from which the script is called.)

Expert Comment

ID: 9819626
>> deepest allowed path is the base directory "/" - not necessarily the root directory

I don't know how PHP "reads" path notation, but on Unix systems / is the root dir and ./ is the current working directory (or "base directory").

So far, each of the solutions removes the ../ relative path notation but may leave you with a properly formatted yet non-existent directory path. Also, nothing has been mentioned about the user entering an absolute path that is higher up the directory tree than the previous path. Staying with the relative path problem (as outlined in your last post), you could do this (using Perl syntax since I don't know PHP):

$cur_dir = './' if ($cur_dir =~ /\.\./);

or this:

$cur_dir =~ s#^.*?\.\./.*$#./#;

If you need to test for absolute paths, then we'll need to approach this from a slightly different angle.
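
One possible angle for the non-existent and absolute path cases, sketched in PHP: let the filesystem resolve the path with realpath() and then check that the result still lies at or below the base directory. The helper name is_inside_base and its parameters are hypothetical; realpath(), is_dir(), rtrim() and strncmp() are standard PHP functions. This only works for directories that actually exist, because realpath() returns false otherwise.

```php
<?php
// Hypothetical helper: true only if $requested resolves to an existing
// directory at or below $base_dir.
function is_inside_base(string $base_dir, string $requested): bool
{
    $base = realpath($base_dir);
    // The request is always appended to the base, so an "absolute"
    // request such as "/etc" is looked up below the base as well.
    $target = realpath($base_dir . DIRECTORY_SEPARATOR . $requested);
    if ($base === false || $target === false || !is_dir($target)) {
        return false;
    }
    $base = rtrim($base, DIRECTORY_SEPARATOR);
    // Either the base itself, or something that starts with "<base>/".
    return $target === $base
        || strncmp($target, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) === 0;
}
```

Comparing against the base plus a trailing separator avoids accepting a sibling directory whose name merely starts with the base directory's name.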

Expert Comment

ID: 9819774
In Perl-speak, I'd use something like this:

my $path = '/part1/part2/../../xx/../../../';
1 while($path =~ s#(?:[^/]+/)?\.\./##);

And in PHP, it'd look more like this:

$path = '/part1/part2/../../xx/../../';
while($path != ($tmp = preg_replace( "#(?:[^/]+/)?\.\./#", "", $path, 1)))
        $path = $tmp;

And people say Perl isn't beautiful. :) If you have a trailing / on the path, it leaves you with that, otherwise, it ends with the empty string.

Expert Comment

ID: 9819920
icrf, I came up with a similar regex:
  while (s#[^/.]+[/]+\.\./##){}

but it suffers from the same problem as your suggestion; try
   $path = '/part1/part2/../../xx/../../..';

Think I need to go to bed with J.Friedl's regular ex. and a few beers ...
Probably there is something better with perl's look-ahead tomorrow ..

Accepted Solution

ahoffmann earned 222 total points
ID: 9820012
perl goes here:
   while (s#(?:[^/]+[/]+)+\.\./?##){}

hopefully it's similar in PHP, see icrf's suggestion

Author Comment

ID: 9837410
ahoffmann's preg-expression works perfectly! :-)
Thank you to icrf too!

The working PHP solution for eliminating redundant directories in paths is:

$path = '/part1/part2/../../xx/../../..';
while($path != ($tmp = preg_replace( "#(?:[^/]+[/]+)+\.\./?#", "", $path, 1))) $path = $tmp;
echo $path;
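
For comparison, here is a sketch of a different technique that uses no regular expression at all: split the path on "/" and resolve "." and ".." with a stack, never climbing above the base directory. The function name normalize_path is an assumption for illustration, and this is not the method from the thread. Tracing the accepted pattern by hand suggests one caveat: because its group is repeated greedily, it also swallows the directories in front of the pair, so /a/b/../c appears to come out as /c rather than /a/c. The helper thread_regex_loop below wraps the loop from the post above so the two can be compared.

```php
<?php
// Hypothetical helper: resolve "." and ".." segment by segment,
// never climbing above the base directory "/".
function normalize_path(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;               // skip empty segments ("//") and "."
        }
        if ($segment === '..') {
            array_pop($stack);      // go up one level; no-op at the base
            continue;
        }
        $stack[] = $segment;
    }
    return '/' . implode('/', $stack);
}

// The loop from the thread, wrapped in a function for comparison only.
function thread_regex_loop(string $path): string
{
    while ($path != ($tmp = preg_replace('#(?:[^/]+[/]+)+\.\./?#', '', $path, 1))) {
        $path = $tmp;
    }
    return $path;
}

echo normalize_path('/part1/part2/../../xx/../../..'), "\n";  // /
echo normalize_path('/a/b/../c'), "\n";                       // /a/c
echo thread_regex_loop('/a/b/../c'), "\n";                    // /c
```

Both approaches only clean up the string; whether the resulting directory exists still has to be checked separately.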
