Regex for a url with variations - preg_replace

I need to use regex (for a preg_replace statement) to match the following example:
product.php?productid=2930&cat=252&page=1

I need to capture productid. The cat and page arguments are not always present in the URL.

A replacement url would be something like (for SEO purposes):
product/{productid}.html


Thanks!
Asked by: 14100

BogoJoker Commented:
Hi 14100,

How about this sample code? It keeps the URL in $str. If this is actually the query string of the current page, you could simply use $_GET['productid'] to get the product id.

<?php
$str = 'product.php?productid=2930&cat=252&page=1';
preg_match('/product.php\?(productid=\d+)?/', $str, $matches);
print_r($matches);
$split = explode('=', $matches[1]);
$newUrl = "product/$split[1].html";
print "<br>";
print "Therefore the product id is --> $split[1] <br>";
print "The new URL is: $newUrl";
?>

Any question/comments, just ask!
Joe P

BogoJoker Commented:
Here is a slight improvement on the above. Thanks to Roonaan, I learned how to capture the number directly in preg_match, so there is no need to explode() and use $split; the number 2930 will simply be in $matches[2]:

<?php
$str = 'product.php?productid=2930&cat=252&page<wbr/>=1';
preg_match('/product.php\?(productid=(\d<wbr/>+))?/', $str, $matches);
print_r($matches);
$newUrl = "product/$matches[2].html";
print "<br>";
print "Therefore the product id is --> $matches[2] <br>";
print "The new URL is: $newUrl";
?>

Joe P
 
star_trek Commented:
A slight modification to BogoJoker's code.
The following works even if the product id is in the middle or at the end of the query string.
<?php
$str = 'product.php?cat=252&productid=2930&page=1';
preg_match("/product.php(.*)[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[2].html";
print "<br>";
print "Therefore the product id is --> $match[2] <br>";
print "The new URL is: $newUrl";
?>
 
14100 (Author) Commented:
Thanks Joe, that's not bad. One note, you have to escape the . in product.php, otherwise it can be interpreted as a wildcard. Also your code in the 2nd example has <wbr/>, not sure what that is, removed it and the example worked.
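
The point about the unescaped dot can be illustrated with a minimal sketch. The test string `productXphp?...` is made up for the demonstration; it is not something from the actual site.

```php
<?php
// Unescaped dot: "." matches any character, so "productXphp" also matches.
$loose  = '/product.php\?productid=(\d+)/';
// Escaped dot: only a literal "." is accepted.
$strict = '/product\.php\?productid=(\d+)/';

$good = 'product.php?productid=2930&cat=252&page=1';
$bad  = 'productXphp?productid=2930';   // hypothetical look-alike string

var_dump(preg_match($loose, $bad));        // int(1): false positive
var_dump(preg_match($strict, $bad));       // int(0)
var_dump(preg_match($strict, $good, $m));  // int(1)
echo $m[1], "\n";                          // 2930
```

In practice the false positive is unlikely to occur, but escaping the dot costs nothing and keeps the pattern exact.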

I'm increasing the points to 500 from 250. Do you think you can alter that a bit further to grab an unknown number of other arguments in the url?
i.e.-
product.php?productid=2930&cat=252&page=1&psort=d

I want to be able to grab the additional arguments for parsing, but I don't know what all additional arguments exist (yet). So I need a generic addition to the regex that can catch an unlimited number of unknown additional arguments.

Is that possible?  (btw Joe, whoever answers, you'll get at least 250 pts, thanks!)

KennyTM Commented:
Hi.

In fact it can be done just using built-in functions:

$str = 'http://www.host.com/product.php?productid=2930&cat=252&page=1';
$turl = parse_url($str);
parse_str($turl['query'], $tturl);
$url = "product/$tturl[productid].html";
echo $url;

14100 (Author) Commented:
star_trek, your change appears to be too greedy: it captures too much in $match[0]. $match[2] does return the correct answer, but I don't want to match too much and run into errors.

KennyTM Commented:
Just to clarify, the code also works when $str is only the relative URL:

$str = 'product.php?productid=2930&cat=252&page=1';

14100 (Author) Commented:
To further describe what's going on: this is HTML source that is being parsed, so the URL needs to be matched throughout the HTML. While Kenny's solution seems nice, it's only partial, since the URL itself still needs to be matched in the HTML source.

KennyTM Commented:
So you mean you want to replace all occurrences of "product.php?something=someval&productid=xxx&foo=bar&cat=252&page=1&etc=etc" with "product/xxx.html"?

KennyTM Commented:
within the document?

14100 (Author) Commented:
Correct. And regarding the additional as-yet-unknown arguments in the url, they might have to be added into the New url at a later date, so I want the url parsing mechanism or regex statement to grab those additional arguments, without the arguments needing to be defined yet.

Explaining a different way, I want to be able to generically parse the entire detected url, nothing more, nothing less, and then be able to reference the different arguments, just by referring to them in an array. (similar to the parse_str method that you showed)

There are multiple target urls found in the source code, not all are the exact same, and all will need to be detected/parsed
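
One way to get "the entire detected URL, nothing more, nothing less" plus an array of its arguments is sketched below. It assumes PHP 7 or later, and it assumes the URLs sit inside quoted attributes, so the match is stopped at quotes, whitespace, or `>`. The helper name `find_product_urls` is made up for this example.

```php
<?php
// Hypothetical helper: returns one entry per product.php URL found in $html.
function find_product_urls(string $html): array
{
    $found = [];
    // [^"'\s>]* cannot run past the end of the attribute value.
    preg_match_all('/product\.php\?([^"\'\s>]*)/', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // HTML source may encode "&" as "&amp;", so undo that before parsing.
        parse_str(str_replace('&amp;', '&', $m[1]), $args);
        $found[] = ['url' => $m[0], 'args' => $args];
    }
    return $found;
}

$html = '<a href="product.php?productid=2930&cat=252&page=1&psort=d">x</a> '
      . '<a href="product.php?cat=7&productid=41">y</a>';

foreach (find_product_urls($html) as $hit) {
    echo $hit['url'], ' => productid ', $hit['args']['productid'], "\n";
}
```

Arguments that are not known in advance (such as psort) simply show up as extra keys in the args array.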

KennyTM Commented:
Maybe try this:

function resolve_url ($s) {
  $turl = parse_url($s);
  parse_str($turl['query'], $tturl);
  return "product/$tturl[productid].html";
}

$str = 'blah blah blah <a href="product.php?productid=2930&cat=252&page=1"> blah</a> blah <b onclick="window.open(\'product.php?cat=1244&productid=31415926535&page=8\')">blah</b> blah blah <form action="product.php?hello=world&this=is&a=testing&page=5&productid=1&cat=1"></form>';

$str2 = preg_replace("|product\.php\?[\w%+_&=!$&()*,;]+|eis", 'resolve_url("\0")', $str);

echo $str2;
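
A note for newer PHP versions: the `e` modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so the same idea is usually written with preg_replace_callback. The following is only a sketch of that variant. The callback name `resolve_url_cb` and the narrower stop set `[^"'\s>]` are choices made for this example, not part of the code above.

```php
<?php
// Callback variant of resolve_url(): receives the match array instead of a string.
function resolve_url_cb(array $m): string
{
    parse_str(str_replace('&amp;', '&', $m[1]), $args);
    $id = $args['productid'] ?? null;
    // Leave the URL untouched when there is no numeric productid.
    if (!is_string($id) || !preg_match('/^\d+$/', $id)) {
        return $m[0];
    }
    return 'product/' . $id . '.html';
}

$str = '<a href="product.php?productid=2930&cat=252&page=1">blah</a> '
     . '<form action="product.php?hello=world&page=5&productid=1&cat=1"></form>';

$out = preg_replace_callback('/product\.php\?([^"\'\s>]*)/i', 'resolve_url_cb', $str);
echo $out, "\n";
```

Because the pattern stops at the closing quote, each URL is replaced in place without touching the markup that follows it.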

KennyTM Commented:
And if the additional query arguments need to be passed along, you can replace the function resolve_url() with:

function resolve_url ($s) {
  $turl = parse_url($s);
  parse_str($turl['query'], $tturl);
  return "product/$tturl[productid].html?$turl[query]";
}
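
A possible variation, shown only as a sketch: instead of appending the whole original query string, drop productid (it is already in the path) and rebuild the rest with http_build_query. The function name `resolve_url_keep_args` is made up here, and PHP 7 or later is assumed.

```php
<?php
function resolve_url_keep_args(string $url): string
{
    // parse_url also accepts a relative URL such as "product.php?...".
    $query = (string) parse_url($url, PHP_URL_QUERY);
    parse_str($query, $args);
    $id = $args['productid'] ?? '';
    unset($args['productid']);        // already encoded in the new path
    $rest = http_build_query($args);  // re-encode whatever is left
    return "product/{$id}.html" . ($rest !== '' ? "?{$rest}" : '');
}

echo resolve_url_keep_args('product.php?productid=2930&cat=252&page=1&psort=d'), "\n";
echo resolve_url_keep_args('product.php?productid=2930'), "\n";
```

With the default argument separator this should print product/2930.html?cat=252&page=1&psort=d and then product/2930.html.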

star_trek Commented:
14100,
my suggestion above should do exactly what you are looking for (this is the same code as above):
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc';
preg_match("/product.php(.*)[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[2].html";
print "<br>";
print "Therefore the product id is --> $match[2] <br>";
print "The new URL is: $newUrl";
?>

star_trek Commented:
If you are worried about matching too much, you can also try:
preg_match("/[\?&]productid=(\d+)/",$str,$match);

14100 (Author) Commented:
star_trek, as I mentioned, your code is too greedy; it matches too much. Here's an example of the $match array after running your code:

Array
(
    [0] => product.php?productid=2930&cat=252&page=1"><IMG id="" src="/xshop/image.php?productid=2930
    [1] => ?productid=2930&cat=252&page=1"><IMG id="" src="/xshop/image.php
    [2] => 2930
)


It has got to find *just* the url, because the url is going to be replaced. As you can see, it extended into the following IMG code from the source.
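
The overreach in the dump above can be reproduced with a short sketch. The sample markup is reconstructed from the $match output and is an assumption about what the real source looks like. The greedy `(.*)` runs to the end of the line and backtracks to the last `productid=`, while a negated character class cannot cross the closing quote.

```php
<?php
// Assumed markup, modelled on the $match dump above.
$html = '<a href="product.php?productid=2930&cat=252&page=1">'
      . '<IMG id="" src="/xshop/image.php?productid=2930"></a>';

// Greedy: (.*) swallows everything up to the LAST "productid=".
preg_match('/product\.php(.*)[?&]productid=(\d+)/', $html, $greedy);

// Bounded: stop at quotes, whitespace, or ">".
preg_match('/product\.php\?[^"\'\s>]*/', $html, $bounded);

echo $greedy[0], "\n";   // spills into the IMG tag
echo $bounded[0], "\n";  // product.php?productid=2930&cat=252&page=1
```

The bounded match is exactly the text that needs to be replaced, so it can be used directly in a replace call.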

star_trek Commented:
14100,
use the following instead:
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc';
preg_match("/[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[1].html";
print "<br>";
print "Therefore the product id is --> $match[1] <br>";
print "The new URL is: $newUrl";
?>

14100 (Author) Commented:
This code has to replace the original URL in-line, which it can't do because you're not grabbing the full URL. The modified regex you posted now only grabs ?productid=1234, so there's no way to replace the URL it was pulled from.

Kenny, I'll try your suggestion in a bit.

star_trek Commented:
14100, I understand what you are saying. You can try this:
preg_match("/product.php(.*)[\?&]productid=(\d+)[^>]/",$str,$match);

14100 (Author) Commented:
Kenny, your solution isn't working. star_trek, again it's pulling back too much information.

star_trek Commented:
14100, try this one; it shouldn't pull back a lot of info:
preg_match("/product.php.*[\?&]productid=(\d+)[^>]/",$str,$match);
$match[1] is the id.

14100 (Author) Commented:
Still pulling back too much.

What I need is to pull back
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc
exactly as-is, have the arguments parsed out, and then replace
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc
with
product/{productid}.html

The code you keep giving me is not stopping at the end of the url, but is going on to the next url (usually in an IMG tag), grabbing ALL of that text, which means it'll be next to impossible to do a straight-replace

star_trek Commented:
Try this; it should work:
preg_match("/product.php.*[\?&]productid=(\d+).*>/U",$str,$match);

star_trek Commented:
A slight correction: since I am using the U modifier, I need to add ?, otherwise it only pulls one digit.
preg_match("/product.php.*[\?&]productid=(\d+?).*>/U",$str,$match);
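
The one-digit effect of the U modifier can be seen in isolation with a small sketch. The shortened $str is made up for the demonstration.

```php
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar">';

// With /U every quantifier is lazy by default, so \d+ is satisfied by one digit.
preg_match('/productid=(\d+)/U', $str, $a);
// Under /U a trailing ? flips that quantifier back to greedy.
preg_match('/productid=(\d+?)/U', $str, $b);
// Without /U, \d+ is greedy as usual.
preg_match('/productid=(\d+)/', $str, $c);

echo $a[1], ' ', $b[1], ' ', $c[1], "\n"; // 6 6789 6789
```

An alternative that avoids the inverted meaning is to leave U off and mark only the `.*` parts as lazy with `.*?`.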

star_trek Commented:
14100, sorry, I had to go back and forth to understand clearly what you want. Here it is:

preg_match("/(product.php.*[\?&]productid=(\d+?).*)>/U",$str,$match);

$match[1] would produce something like:
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc

$match[2] produces 6789 (the product id).