Regex for a url with variations - preg_replace

I need to use regex (for a preg_replace statement) to match the following example:
product.php?productid=2930&cat=252&page=1

I need productid. cat & page do not always exist in the url.

A replacement url would be something like (for SEO purposes):
product/{productid}.html


Thanks!
14100Asked:

BogoJokerCommented:
Hi 14100,

How about this sample code? It has the URL in $str. If this is actually the query string of the current page, you could always just use $_GET['productid'] to get the product id.

<?php
$str = 'product.php?productid=2930&cat=252&page=1';
preg_match('/product.php\?(productid=\d+)?/', $str, $matches);
print_r($matches);
$split = explode('=', $matches[1]);
$newUrl = "product/$split[1].html";
print "<br>";
print "Therefore the product id is --> $split[1] <br>";
print "The new URL is: $newUrl";
?>

Any question/comments, just ask!
Joe P
0
BogoJokerCommented:
Here is a slight improvement on the above. Thanks to Roonaan, I learned how to capture the number directly in preg_match, so this version does not need explode() or $split; the number 2930 ends up in $matches[2]:

<?php
$str = 'product.php?productid=2930&cat=252&page<wbr/>=1';
preg_match('/product.php\?(productid=(\d<wbr/>+))?/', $str, $matches);
print_r($matches);
$newUrl = "product/$matches[2].html";
print "<br>";
print "Therefore the product id is --> $matches[2] <br>";
print "The new URL is: $newUrl";
?>

Joe P
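
Since the question asks for a preg_replace statement, the same capture group can also be used directly in a replacement string. This is only a minimal sketch: it assumes productid is the first argument and that the URL ends at a quote, whitespace, or `>`. The sample HTML and variable names are made up for illustration.

```php
<?php
// Hypothetical HTML fragment; only the first link carries extra arguments.
$html = '<a href="product.php?productid=2930&cat=252&page=1">A</a> '
      . '<a href="product.php?productid=77">B</a>';

// \.          matches a literal dot
// (\d+)       captures the product id
// [^"'\s>]*   consumes the rest of the query string, up to the end of the URL
$pattern = '/product\.php\?productid=(\d+)[^"\'\s>]*/';

// $1 in the replacement refers to the captured product id.
$result = preg_replace($pattern, 'product/$1.html', $html);

echo $result, "\n";
```

Because the whole URL is consumed by the pattern, the replacement swaps it in place and leaves the surrounding markup alone.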

0

star_trekCommented:
A slight modification to BogoJoker's code.
The following works even if productid is in the middle or at the end of the query string.
<?php
$str = 'product.php?cat=252&productid=2930&page=1';
preg_match("/product.php(.*)[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[2].html";
print "<br>";
print "Therefore the product id is --> $match[2] <br>";
print "The new URL is: $newUrl";
?>
0
14100Author Commented:
Thanks Joe, that's not bad. One note, you have to escape the . in product.php, otherwise it can be interpreted as a wildcard. Also your code in the 2nd example has <wbr/>, not sure what that is, removed it and the example worked.

I'm increasing the points to 500 from 250. Do you think you can alter that a bit further to grab an unknown number of other arguments in the url?
For example:
product.php?productid=2930&cat=252&page=1&psort=d

I want to be able to grab the additional arguments for parsing, but I don't know what all additional arguments exist (yet). So I need a generic addition to the regex that can catch an unlimited number of unknown additional arguments.

Is that possible?  (btw Joe, whoever answers, you'll get at least 250 pts, thanks!)
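
On the point about escaping the dot in product.php, a small illustrative check may help. The test strings are invented; preg_quote() is the standard PHP helper for escaping literal text before placing it in a pattern.

```php
<?php
// An unescaped dot matches any single character, so "productXphp" also matches.
$loose = preg_match('/product.php/', 'productXphp');

// An escaped dot matches only a literal ".".
$strict = preg_match('/product\.php/', 'productXphp');
$literal = preg_match('/product\.php/', 'product.php');

// preg_quote() escapes regex metacharacters (and the given delimiter).
$quoted = preg_quote('product.php?', '/');

var_dump($loose, $strict, $literal, $quoted);
```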
0
KennyTMCommented:
Hi.

In fact it can be done just using built-in functions:

$str = 'http://www.host.com/product.php?productid=2930&cat=252&page=1';
$turl = parse_url($str);
parse_str($turl['query'], $tturl);
$url = "product/$tturl[productid].html";
echo $url;
0
14100Author Commented:
star_trek, your change appears to be a bit too greedy: it's catching too much in $match[0]. Although $match[2] returns the correct answer, I don't want to match too much and run into errors.
0
KennyTMCommented:
(In fact, the code works even if $str is only

$str = 'product.php?productid=2930&cat=252&page=1';

Just to clarify.)
0
14100Author Commented:
To further describe what's going on, this is html source that is being parsed, so the URL needs to be matched throughout the html. While Kenny's solution seems nice, it's only partial, since the url itself still needs to be matched in the html source
0
KennyTMCommented:
So you mean you want to replace all occurrences of "product.php?something=someval&productid=xxx&foo=bar&cat=252&page=1&etc=etc" with "product/xxx.html"?
0
KennyTMCommented:
within the document?
0
14100Author Commented:
Correct. And regarding the additional as-yet-unknown arguments in the url, they might have to be added into the New url at a later date, so I want the url parsing mechanism or regex statement to grab those additional arguments, without the arguments needing to be defined yet.

Explaining a different way, I want to be able to generically parse the entire detected url, nothing more, nothing less, and then be able to reference the different arguments, just by referring to them in an array. (similar to the parse_str method that you showed)

There are multiple target urls found in the source code, not all are the exact same, and all will need to be detected/parsed
0
KennyTMCommented:
Maybe try this:

function resolve_url ($s) {
  $turl = parse_url($s);
  parse_str($turl['query'], $tturl);
  return "product/$tturl[productid].html";
}

$str = 'blah blah blah <a href="product.php?productid=2930&cat=252&page=1"> blah</a> blah <b onclick="window.open(\'product.php?cat=1244&productid=31415926535&page=8\')">blah</b> blah blah <form action="product.php?hello=world&this=is&a=testing&page=5&productid=1&cat=1"></form>';

$str2 = preg_replace("|product\.php\?[\w%+_&=!$&()*,;]+|eis", 'resolve_url("\0")', $str);

echo $str2;
0
KennyTMCommented:
If additional query arguments need to be passed along, you can replace the function resolve_url() with

function resolve_url ($s) {
  $turl = parse_url($s);
  parse_str($turl['query'], $tturl);
  return "product/$tturl[productid].html?$turl[query]";
}
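
The `e` modifier used with preg_replace above was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same idea would need preg_replace_callback. The following is a hedged sketch rather than a drop-in fix: it reuses the resolve_url name from above, but the character class, the handling of `&amp;`, and the fallback for URLs without a numeric productid are all assumptions.

```php
<?php
// Build the SEO-friendly URL from one matched product.php URL.
function resolve_url(string $url): string
{
    $parts = explode('?', $url, 2);
    if (count($parts) < 2) {
        return $url; // no query string at all
    }
    // HTML source often writes "&" as "&amp;"; normalise before parsing.
    parse_str(str_replace('&amp;', '&', $parts[1]), $args);

    // Assumption: leave the URL untouched unless productid is purely numeric.
    $id = $args['productid'] ?? null;
    if (!is_string($id) || !ctype_digit($id)) {
        return $url;
    }
    // Every other argument is available in $args, e.g. $args['cat'].
    return "product/$id.html";
}

// Hypothetical HTML sample with productid in different positions.
$html = 'blah <a href="product.php?productid=2930&cat=252&page=1">blah</a> '
      . '<b onclick="window.open(\'product.php?cat=1244&amp;productid=31415926535&amp;page=8\')">blah</b>';

// Match the URL up to the first quote, whitespace, or angle bracket.
$result = preg_replace_callback(
    '/product\.php\?[^"\'\s<>]+/i',
    function (array $m): string {
        return resolve_url($m[0]);
    },
    $html
);

echo $result, "\n";
```

The callback receives the full matched URL in `$m[0]`, so the arguments never have to be named in the pattern itself.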
0
star_trekCommented:
14100,
my suggestion above should do exactly what you are looking for (this is the same code as above):
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc';
preg_match("/product.php(.*)[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[2].html";
print "<br>";
print "Therefore the product id is --> $match[2] <br>";
print "The new URL is: $newUrl";
?>
0
star_trekCommented:
If you are worried about matching too much, you can also try
preg_match("/[\?&]productid=(\d+)/",$str,$match);
0
14100Author Commented:
star_trek, as I mentioned, your code is too greedy, it matches too much. here's an example of the $match array after being run using your code:

Array
(
    [0] => product.php?productid=2930&cat=252&page=1"><IMG id="" src="/xshop/image.php?productid=2930
    [1] => ?productid=2930&cat=252&page=1"><IMG id="" src="/xshop/image.php
    [2] => 2930
)


It has got to find *just* the url, because the url is going to be replaced. As you can see, it extended into the following IMG code from the source.
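
The over-matching shown above can be reproduced, and one possible way to bound the match is a negated character class instead of `.*`. This is a sketch under assumptions: the HTML sample is reconstructed from the array dump, and it assumes a URL always ends at a quote, whitespace, or `>`.

```php
<?php
// Sample reconstructed from the $match dump above.
$html = '<a href="product.php?productid=2930&cat=252&page=1">'
      . '<IMG id="" src="/xshop/image.php?productid=2930"></a>';

// Greedy: (.*) runs as far as it can, so the match ends at the LAST productid.
preg_match('/product\.php(.*)[?&]productid=(\d+)/', $html, $greedy);

// Bounded: [^"'\s>]* cannot cross the closing quote of the href attribute.
// The optional group allows other arguments to come before productid.
preg_match(
    '/product\.php\?(?:[^"\'\s>]*&)?productid=(\d+)[^"\'\s>]*/',
    $html,
    $bounded
);

echo $greedy[0], "\n";   // spills over into the IMG tag
echo $bounded[0], "\n";  // just the URL
echo $bounded[1], "\n";  // the product id
```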
0
star_trekCommented:
14100,
use the following instead:
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc';
preg_match("/[\?&]productid=(\d+)/",$str,$match);
$newUrl = "product/$match[1].html";
print "<br>";
print "Therefore the product id is --> $match[1] <br>";
print "The new URL is: $newUrl";
?>
0
14100Author Commented:
This code has to replace the original url in-line, which it can't do because you're not grabbing the full url. The modified regex that you posted now only grabs the ?productid=1234, so there's no way to replace the url that it was pulled from.

Kenny, I'll try your suggestion in a bit.
0
star_trekCommented:
14100, I understand what you are saying. You can try this:
preg_match("/product.php(.*)[\?&]productid=(\d+)[^>]/",$str,$match);
0
14100Author Commented:
Kenny, your solution isn't working. Star_trek, again it's pulling back too much information
0
star_trekCommented:
14100, try this one; it shouldn't pull back a lot of info:
preg_match("/product.php.*[\?&]productid=(\d+)[^>]/",$str,$match);
$match[1] is the id.

0
14100Author Commented:
Still pulling back too much.

What I need is to pull back
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc
exactly as-is, have the arguments parsed out, and then replace
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc
with
product/{productid}.html

The code you keep giving me is not stopping at the end of the url, but is going on to the next url (usually in an IMG tag), grabbing ALL of that text, which means it'll be next to impossible to do a straight-replace
0
star_trekCommented:
Try this; it should work:
preg_match("/product.php.*[\?&]productid=(\d+).*>/U",$str,$match);
0
star_trekCommented:
A slight correction: since I'm using the U modifier, I need to add ?, otherwise it only pulls back one digit.
preg_match("/product.php.*[\?&]productid=(\d+?).*>/U",$str,$match);
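
The effect of the U modifier described here can be checked in isolation. In this sketch the test string is made up, and the patterns are shortened to the productid part only.

```php
<?php
$str = 'product.php?something=someval&productid=6789&foo=bar">';

// With /U every quantifier is lazy by default, so \d+ stops after one digit.
preg_match('/[?&]productid=(\d+).*>/U', $str, $lazy);

// With /U a trailing ? flips the quantifier back to greedy.
preg_match('/[?&]productid=(\d+?).*>/U', $str, $flipped);

// Without /U: greedy \d+ and an explicitly lazy .*? give the same id.
preg_match('/[?&]productid=(\d+).*?>/', $str, $plain);

echo $lazy[1], "\n";
echo $flipped[1], "\n";
echo $plain[1], "\n";
```

Writing the lazy quantifier explicitly, as in the last pattern, avoids having every quantifier in the expression change meaning at once.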
0
star_trekCommented:
14100, sorry, I had to go back and forth to understand clearly what you want. Here it is:

preg_match("/(product.php.*[\?&]productid=(\d+?).*)>/U",$str,$match);

$match[1] would produce something like
product.php?something=someval&productid=6789&foo=bar&cat=252&page=1&etc=etc

$match[2] produces 6789 (the product id)
0