Replacement for parse_url() with regular expressions

Posted on 2006-05-03
Last Modified: 2012-06-21
I need a replacement for parse_url().

I'm working with mod_rewrite under Apache and need to process the $_SERVER['REQUEST_URI'] superglobal.

I need to split the path from the file name. I don't need the GET parameters, as they are supplied in the $_GET[] superglobal.

I want to use a regular expression so I can filter out unwanted characters in the path / filename.

The path may contain only the characters [0-9,a-z,/,-,_], with a maximum path length of 64 characters.
The returned path should always have a trailing slash.
The filename may contain only the characters [0-9,a-z,/,-,_] and a single [.]

I'm still a newbie with regular expressions.

I've started to write a little test script. Can someone please fill in the blanks in the uri_decode() function?

Thanks.


<?php
$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )

function uri_decode( $uri )
{

 $ret['path'] = preg_replace ???????
 $ret['file'] = preg_replace ??????
 return $ret;
}
?>
Question by:Matthew_Way
Expert Comment

by:ixti
ID: 16602924
If you specifically need regular expressions, it could look like this:

<?php
function uri_decode($uri)
{
    preg_match("/(?P<path>[\w\/-_]{1,64})(\/(?P<file>[\w\/-_]*?[.]{1}[\w\/-_]*?))?(\?.*?)?$/", $uri, $matches);
    $ret['path'] = (isset($matches['path'])) ? $matches['path'] : null;
    $ret['file'] = (isset($matches['file'])) ? $matches['file'] : null;
    $ret['path'] = (preg_match("/.*\/$/", $ret['path'])) ? $ret['path'] : $ret['path'] . "/";
    return $ret;
}

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
?>

But I don't suggest using RegExps for something that can easily be done without them...
Sorry for my bad English. I'm like a dog: I understand everything, but can't say it :))

Author Comment

by:Matthew_Way
ID: 16609927
ixti,

Thanks for your response.

Why not RegExps? Are they too slow?

I like the idea of having tight control of the URLs to help prevent hack attempts.

Any suggestions or other examples would be welcome.

Matt

Author Comment

by:Matthew_Way
ID: 16610168
Oops,

Tried the script but it didn't work as expected.

If I have a URI of
/home?k=8

the array comes back as:
Array
(
    [path] => /home?k=8/
    [file] =>
)

See how the parameters get joined onto the path.

Can you make it so it comes back with
Array
(
    [path] => /home/
    [file] =>
)

Thanks

Matt

Accepted Solution

by: ixti (earned 2000 total points)
ID: 16612467
Well, if there is an easy way to do it without RegExps, I prefer not to use them.
But I still use them for validation etc.

Sorry, I forgot about that case. So let's modify my example:
<?php
function uri_decode($uri)
{
    // First, split the given URI into two parts.
    // "?" marks the beginning of the GET params, so we can do it like this:
    $uri = explode("?", $uri);

    // Now we can be sure that we have no GET params.
    $uri = $uri[0];
    // The hyphen goes last in each class so it is literal; "\/-_" would be a range from "/" to "_".
    preg_match("/(?P<path>[\w\/-]{1,64})(\/(?P<file>[\w\/-]*?[.]{1}[\w\/-]*?))?$/", $uri, $matches);
    $ret['path'] = (isset($matches['path'])) ? $matches['path'] : null;
    $ret['file'] = (isset($matches['file'])) ? $matches['file'] : null;
    $ret['path'] = (preg_match("/.*\/$/", $ret['path'])) ? $ret['path'] : $ret['path'] . "/";
    return $ret;
}

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';
$uri4 = '/home?k=8';

print_r( uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
print_r( uri_decode($uri4) );  // outputs array( [path] => /home/ [file]=> )
?>
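
A note on why the first version of the pattern swallowed "?k=8" into the path: inside square brackets, PCRE reads "/-_" as a range running from "/" to "_", and that range happens to include "?", "=", ":" and "@". Below is a minimal sketch of a stricter variant, assuming you would rather get null back for a URI containing anything outside the whitelist. The name uri_decode_strict() and the null-on-failure behaviour are my own assumptions, not part of the original request.

```php
<?php
// "/-_" inside brackets is a range from "/" (0x2F) to "_" (0x5F).
var_dump(preg_match('#^[/-_]$#', '?'));  // int(1): "?" falls inside the range
var_dump(preg_match('#^[/_-]$#', '?'));  // int(0): hyphen last, so it is literal

// Hypothetical stricter variant (name and null-on-failure are assumptions).
function uri_decode_strict($uri)
{
    // Drop the query string first, as in the accepted answer.
    $parts = explode('?', $uri, 2);
    $uri   = $parts[0];

    // ^ and \z force the whole string to match; the hyphen is last in each class.
    $pattern = '#^(?P<path>[\w/-]{1,64})(?:/(?P<file>[\w-]*\.[\w-]*))?\z#';
    if (!preg_match($pattern, $uri, $m)) {
        return null; // something outside the whitelist was present
    }

    return array(
        // Always return exactly one trailing slash.
        'path' => rtrim($m['path'], '/') . '/',
        'file' => (isset($m['file']) && $m['file'] !== '') ? $m['file'] : null,
    );
}

print_r(uri_decode_strict('/home?k=8'));
print_r(uri_decode_strict('/products/widgets/find.php?id=1034'));
var_dump(uri_decode_strict('/home:evil')); // NULL
```

Note that \w also accepts upper-case letters, so this is slightly looser than the [0-9,a-z] rule in the question.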

Expert Comment

by:ixti
ID: 16612518
And another example, without using RegExps:

<?php
function noregexp_uri_decode($uri)
{
    $ret = array('path' => null, 'file' => null);
    $uri = explode("?", $uri);
    $uri = explode("#", $uri[0]);
    $uri = $uri[0];
    if ($pos = strrpos($uri, ".")) {
        $pos            = strrpos($uri, "/");
        $ret['file']    = substr($uri, $pos + 1);
        $ret['path']    = substr($uri, 0, $pos + 1);
    } else {
        $ret['path']    = $uri;
    }
    $ret['path'] = "/" . trim($ret['path'], "/") . "/";
    return $ret;
}

$uri1 = '/home';
$uri2 = '/home/contactus/';
$uri3 = '/products/widgets/find.php?id=1034';
$uri4 = '/home?k=8';

print_r( noregexp_uri_decode($uri1) );  // outputs array( [path] => /home/ [file]=> )
print_r( noregexp_uri_decode($uri2) );  // outputs array( [path] => /home/contactus/ [file]=> )
print_r( noregexp_uri_decode($uri3) );  // outputs array( [path] => /products/widgets/ [file]=> find.php )
print_r( noregexp_uri_decode($uri4) );  // outputs array( [path] => /home/ [file]=> )
?>
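
The string-function version splits the URI but does not filter characters, which was the reason for wanting a regular expression in the first place. A minimal sketch of combining the two follows: split with string functions, then check each part against a small whitelist pattern. The function name split_and_validate_uri() and returning null on a rejected URI are assumptions for illustration. The limits (lower-case letters, digits, "/", "-", "_", 64 characters, one dot in the filename) are taken from the question, except that "/" is left out of the filename class, since the filename is whatever follows the last slash.

```php
<?php
// Hypothetical helper: split with string functions, then whitelist each part.
function split_and_validate_uri($uri)
{
    // Keep only the part before any query string or fragment.
    $parts = explode('?', $uri, 2);
    $parts = explode('#', $parts[0], 2);
    $uri   = $parts[0];

    // Treat the last segment as a file name only if it contains a dot.
    $slash = strrpos($uri, '/');
    $last  = ($slash === false) ? $uri : (string) substr($uri, $slash + 1);
    if (strpos($last, '.') !== false) {
        $file = $last;
        $path = ($slash === false) ? '' : substr($uri, 0, $slash + 1);
    } else {
        $file = null;
        $path = $uri;
    }

    // Normalise to one leading and one trailing slash.
    $path = '/' . trim($path, '/');
    if ($path !== '/') {
        $path .= '/';
    }

    // Whitelist checks; \z (rather than $) also rejects a trailing newline.
    if (strlen($path) > 64 || !preg_match('#^[0-9a-z/_-]+\z#', $path)) {
        return null;
    }
    // Exactly one dot is allowed in the file name.
    if ($file !== null && !preg_match('#^[0-9a-z_-]*\.[0-9a-z_-]*\z#', $file)) {
        return null;
    }

    return array('path' => $path, 'file' => $file);
}

print_r(split_and_validate_uri('/products/widgets/find.php?id=1034'));
var_dump(split_and_validate_uri('/../etc/passwd')); // NULL: "." is not allowed in the path
```

Doing the split first keeps each pattern short enough to read at a glance, which seems to matter more here than speed.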