PHP - Regex for preg_replace links... Tricky

Hi all,

Could someone please provide Regex Code that will let me "preg_replace" outbound links like:

href="http://www.RandomLink.co.uk/a/1/n2z.shtml"
href="http://RandomLink.com/hit/59/?ref=Page+%2291"
href="www.RandomLink.net/go.php"

(to)

href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"

** but excluding all other links, such as those ending in .mpg, .wmv, .css, etc...


Below is a visual example:

--------------------------
 FROM THIS (before)
--------------------------
<html>
 <head>
  <title>Example</title>
  <link rel="stylesheet" type="text/css" href="./css/style.css">
 </head>
  <body>
    <img src="/images/random_gif_01.gif"><br>
      <br>
        <a href="http://www.RandomLink.co.uk/a/1/n2z.shtml" class="s1">Random Link</a>
      <br>
    <p><b><a href="http://RandomLink.com/hit/59/?ref=Page+%2291">Click Here</a></b> now!</p>
      <br>
    <div id="center">
        <a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
        <a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
    </div>
    <div id="abc123">
         <a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
        <a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
    </div>
    <div id="footer">Random text...<br>
            <a href="www.RandomLink.net/go.php">Click Here!</a>
    </div>
  </body>
</html>

--------------------------
 TO THIS (after)
--------------------------
<html>
 <head>
  <title>Example</title>
  <link rel="stylesheet" type="text/css" href="./css/style.css">
 </head>
  <body>
    <img src="/images/random_gif_01.gif"><br>
      <br>
        <a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com" class="s1">Random Link</a>
      <br>
    <p><b><a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here</a></b> now!</p>
      <br>
    <div id="center">
        <a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
        <a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
    </div>
    <div id="abc123">
         <a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
        <a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
    </div>
    <div id="footer">Random text...<br>
            <a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here!</a>
    </div>
  </body>
</html>
--------------------------

Thank you,
D-
jax0 Asked:
karoldvl Commented:
Are all your outgoing links preceded by http:// or www.?

What other file extensions do you want excluded?
jax0 (Author) Commented:
hmm,

I guess it would really depend on how the author of the webpage writes it.

You see, this regex will be for a webpage scraper I'm making. You'll notice that in my example I tried to mix up different tags, paths, etc. I did this to cover any situation my scraper might come across on whatever webpage I use it on.

But if it's not possible to create a regex pattern compatible with both http:// and www., then I would probably choose http://

thx again
karoldvl Commented:
The most general form I could come up with at the moment is below.

It should replace all hrefs that begin with http:// or www. and that do not end in .mpg, .wmv or .css. Please check whether this is what you are looking for.


<?php

$input = '<html>
 <head>
  <title>Example</title>
  <link rel="stylesheet" type="text/css" href="./css/style.css">
 </head>
  <body>
    <img src="/images/random_gif_01.gif"><br>
      <br>
        <a href="http://www.RandomLink.co.uk/a/1/n2z.shtml" class="s1">Random Link</a>
      <br>
    <p><b><a href="http://RandomLink.com/hit/59/?ref=Page+%2291">Click Here</a></b> now!</p>
      <br>
    <div id="center">
        <a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
        <a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
    </div>
    <div id="abc123">
         <a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
        <a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
    </div>
    <div id="footer">Random text...<br>
            <a href="www.RandomLink.net/go.php">Click Here!</a>
    </div>
  </body>
</html>';

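// Only hrefs that start with one of these prefixes are treated as outbound links.
$prefixInclude = '(http:\/\/|www\.)';
// Hrefs whose value ends in one of these extensions are left untouched.
$suffixExclude = '\.mpg|\.wmv|\.css';
$destinationURL = 'http://www.example.com';
// The negative lookbehind (?<!...) sits just before the closing quote, so the
// match fails whenever the href value ends in an excluded extension.
$replaced = preg_replace('/href="'.$prefixInclude.'([^"]*)(?<!'.$suffixExclude.')"/', 'href="'.$destinationURL.'"', $input);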

echo $replaced;

?>
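
Since the pattern is meant for a scraper that will meet pages written in different styles, here is a possible variant, offered only as a sketch. It is not part of the answer above: it swaps the lookbehind for preg_replace_callback and checks the file extension in PHP. The function name replaceOutboundLinks and its parameters are hypothetical. It assumes you also want to catch https://, single-quoted and upper-case HREF attributes, and links such as movie.mpg?x=1 where a query string follows the extension.

```php
<?php
// Hypothetical helper (not from the original answer): replaces outbound hrefs
// with $destinationURL unless the link's path ends in an excluded extension.
function replaceOutboundLinks(string $html, string $destinationURL, array $excludedExtensions): string
{
    // Match href="..." or href='...' whose value starts with http://, https:// or www.
    // \1 requires the closing quote to be the same character as the opening one.
    // Assumption: the URL itself contains neither kind of quote.
    $pattern = '~href=(["\'])((?:https?://|www\.)[^"\']*)\1~i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($destinationURL, $excludedExtensions) {
            // Inspect only the path, so a query string does not hide the extension.
            $path = (string) parse_url($m[2], PHP_URL_PATH);
            $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

            if (in_array($extension, $excludedExtensions, true)) {
                return $m[0]; // excluded file type: keep the original attribute
            }

            // Rebuild the attribute with the same quote character it had before.
            return 'href=' . $m[1] . $destinationURL . $m[1];
        },
        $html
    );
}
```

It could be called as replaceOutboundLinks($input, 'http://www.example.com', ['mpg', 'wmv', 'css']). Extensions are listed in lower case without the dot, which makes it easy to add more later. Relative links such as ./css/style.css are never matched because they do not start with one of the prefixes.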

jax0 (Author) Commented:
wow - thank you!  Yes it works great...