Replacement for parse_url() with regular expressions

I need a replacement for parse_url().

I'm working with mod_rewrite under Apache and need to process the $_SERVER['REQUEST_URI'] superglobal.

I need to split out the path from the file name. I don't need the GET parameters, as they are supplied in the $_GET superglobal.

I want to use a regular expression so I can filter out unwanted characters in the path / filename.

The path should be able to contain the characters [0-9a-z/_-], with a maximum path length of 64 characters.
The returned path should always have a trailing slash.
The filename should be able to contain the characters [0-9a-z/_-] and only one dot [.].

I'm still a newbie with regular expressions.

I've started to write a little test script. Can someone please fill in the blanks in the uri_decode() function?

Thanks.


<?php
$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )

function uri_decode( $uri )
{

 $ret['path'] = preg_replace ???????
 $ret['file'] = preg_replace ??????
 return $ret;
}
?>
Matthew_Way Asked:
 
ixti Commented:
Well, if there is an easy way to do it without RegExps, I prefer not to use them.
I still use them for validation and the like, though.

Sorry, I forgot about that case. So let's modify my example:
<?php
function uri_decode($uri)
{
    // First we divide given uri into 2 parts
    // ? - is the begining of GET params, so we can do it like this:
    $uri = explode("?", $uri);
   
    // Now we can be sure that we have no get params.
    $uri = $uri[0];
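    // The hyphen goes last in each character class so it is a literal "-";
    // written as "\/-_" it is a range from "/" to "_" that also allows "?" and "=".
    // The pattern is anchored with ^ so a URI with other characters does not match at all.
    preg_match("/^(?P<path>[\w\/-]{1,64})(\/(?P<file>[\w\/-]*?[.][\w\/-]*?))?$/", $uri, $matches);
    $ret['path'] = (isset($matches['path'])) ? $matches['path'] : null;
    $ret['file'] = (isset($matches['file'])) ? $matches['file'] : null;
    $ret['path'] = (substr((string) $ret['path'], -1) === "/") ? $ret['path'] : $ret['path'] . "/";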
    return $ret;
}

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';
$uri4 = '/home?k=8';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
print_r( uri_decode($uri4) );  // outputs array( [path] => /home/ [file]=> )
?>
 
ixti Commented:
If you specifically need regular expressions, it could look like this:

<?php
function uri_decode($uri)
{
    preg_match("/(?P<path>[\w\/-_]{1,64})(\/(?P<file>[\w\/-_]*?[.]{1}[\w\/-_]*?))?(\?.*?)?$/", $uri, $matches);
    $ret['path'] = (isset($matches['path'])) ? $matches['path'] : null;
    $ret['file'] = (isset($matches['file'])) ? $matches['file'] : null;
    $ret['path'] = (preg_match("/.*\/$/", $ret['path'])) ? $ret['path'] : $ret['path'] . "/";
    return $ret;
}

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
?>

But I don't suggest using RegExps for something that can easily be done without them...
Sorry for my bad English. I'm like a dog: I understand everything, but can't say it :))
 
Matthew_Way (Author) Commented:
ixti,

Thanks for your response.

Why not RegExps? Are they too slow?

I like the idea of having tight control of the URLs to help prevent hack attempts.

Any suggestions or other examples would be welcome.

Matt
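
One way to get that kind of tight control is to reject a URI outright instead of trying to repair it. The sketch below is an illustration only, assuming the rules from the question (lowercase letters, digits, "/", "-" and "_", a path of at most 64 characters, a file name with exactly one dot). The name strict_uri_decode() is hypothetical and does not come from this thread.

```php
<?php
// Hypothetical helper: returns null when the URI breaks the rules.
function strict_uri_decode($uri)
{
    // Keep only the part before any "?" (query string) or "#" (fragment).
    $parts = preg_split('/[?#]/', $uri, 2);
    $uri   = $parts[0];

    // Split at the last slash: directory part (slash included) and last segment.
    $pos = strrpos($uri, "/");
    if ($pos === false) {
        return null; // no slash at all
    }
    $path = substr($uri, 0, $pos + 1);
    $file = substr($uri, $pos + 1);

    // A last segment without a dot is treated as a directory.
    if ($file !== "" && strpos($file, ".") === false) {
        $path .= $file . "/";
        $file = "";
    }

    // Path: leading slash, allowed characters only, at most 64 characters in total.
    // The D modifier stops "$" from matching before a trailing newline.
    if (!preg_match('~^/[0-9a-z/_-]{0,63}$~D', $path)) {
        return null;
    }
    // File: allowed characters (no slash) and exactly one dot.
    if ($file !== "" && !preg_match('~^[0-9a-z_-]*\.[0-9a-z_-]*$~D', $file)) {
        return null;
    }
    return array('path' => $path, 'file' => $file);
}

var_dump(strict_uri_decode('/products/widgets/find.php?id=1034'));
var_dump(strict_uri_decode('/Home<script>'));
```

Returning null for anything outside the whitelist lets the caller answer with a 404 rather than guess what the request meant.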
 
Matthew_Way (Author) Commented:
Oops,

I tried the script, but it didn't work as expected.

If I have a URI of
/home?k=8

the array comes back as:
Array
(
    [path] => /home?k=8/
    [file] =>
)

See how the parameters get joined onto the path.

Can you make it so it comes back with
Array
(
    [path] => /home/
    [file] =>
)

Thanks

Matt
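
This result is consistent with how PCRE reads the character class [\w\/-_] in the regex-only uri_decode(): inside a class, "/-_" is a range from "/" (0x2F) to "_" (0x5F), and that range includes "?" and "=". A minimal sketch of the difference, with variable names chosen only for illustration:

```php
<?php
// "/-_" inside a class is a range from "/" (0x2F) to "_" (0x5F).
$range   = '~^[\w/-_]+$~';
// Hyphen last: a literal "-", so only word characters, "/" and "-" are allowed.
$literal = '~^[\w/-]+$~';

var_dump(preg_match($range, '/home?k=8'));   // int(1): "?" and "=" fall inside the range
var_dump(preg_match($literal, '/home?k=8')); // int(0): "?" is rejected
var_dump(preg_match($literal, '/home'));     // int(1)
```

Putting the hyphen first or last in the class, or escaping it, keeps it a literal character.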
 
ixti Commented:
And another example, this time without RegExps:

<?php
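function noregexp_uri_decode($uri)
{
    $ret = array('path' => null, 'file' => null);

    // Drop the query string (after "?") and the fragment (after "#").
    $uri = explode("?", $uri);
    $uri = explode("#", $uri[0]);
    $uri = $uri[0];

    // Treat the last segment as a file name only if it contains a dot,
    // so a dot in a directory name does not turn the last segment into a file.
    $pos  = strrpos($uri, "/");
    $last = ($pos === false) ? $uri : substr($uri, $pos + 1);
    if (strpos($last, ".") !== false) {
        $ret['file'] = $last;
        $ret['path'] = ($pos === false) ? "" : substr($uri, 0, $pos + 1);
    } else {
        $ret['path'] = $uri;
    }

    // Normalise to one leading and one trailing slash ("/" for the root).
    $ret['path'] = rtrim("/" . trim($ret['path'], "/"), "/") . "/";
    return $ret;
}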

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';
$uri4 = '/home?k=8';

print_r( noregexp_uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( noregexp_uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( noregexp_uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
print_r( noregexp_uri_decode($uri4) );  // outputs array( [path] => /home/ [file]=> )
?>
Question has a verified solution.
