jax0

PHP - Regex for preg_replace links... Tricky

Hi all,

Could someone please provide Regex Code that will let me "preg_replace" outbound links like:

href="http://www.RandomLink.co.uk/a/1/n2z.shtml"
href="http://RandomLink.com/hit/59/?ref=Page+%2291"
href="www.RandomLink.net/go.php"

(to)

href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"

** but excluding all other links, such as those to .mpg, .wmv, or .css files.


Below is a visual example:

--------------------------
 FROM THIS (before)
--------------------------
<html>
 <head>
  <title>Example</title>
  <link rel="stylesheet" type="text/css" href="./css/style.css">
 </head>
  <body>
    <img src="/images/random_gif_01.gif"><br>
      <br>
        <a href="http://www.RandomLink.co.uk/a/1/n2z.shtml" class="s1">Random Link</a>
      <br>
    <p><b><a href="http://RandomLink.com/hit/59/?ref=Page+%2291">Click Here</a></b> now!</p>
      <br>
    <div id="center">
        <a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
        <a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
    </div>
    <div id="abc123">
         <a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
        <a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
    </div>
    <div id="footer">Random text...<br>
            <a href="www.RandomLink.net/go.php">Click Here!</a>
    </div>
  </body>
</html>

--------------------------
 TO THIS (after)
--------------------------
<html>
 <head>
  <title>Example</title>
  <link rel="stylesheet" type="text/css" href="./css/style.css">
 </head>
  <body>
    <img src="/images/random_gif_01.gif"><br>
      <br>
        <a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com" class="s1">Random Link</a>
      <br>
    <p><b><a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here</a></b> now!</p>
      <br>
    <div id="center">
        <a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
        <a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
    </div>
    <div id="abc123">
         <a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
        <a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
    </div>
    <div id="footer">Random text...<br>
            <a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here!</a>
    </div>
  </body>
</html>
--------------------------

Thank you,
D-
karoldvl

Are all your outgoing links preceded by http:// or www.?

What other file extensions do you want excluded?
jax0

ASKER

hmm,

I guess it would really depend on how the author of the webpage writes it.

This regex is for a webpage scraper I'm making. You'll notice that in my example I tried to mix up different tags, paths, etc. I did this to cover any situation my scraper might come across on whatever webpage I use it on.

But if it's not possible to create a regex pattern that handles both http:// and www., then I would probably choose http://

thx again
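
A minimal sketch of one way to approach the requirements described above, not the accepted answer. It assumes that "outbound" means an href starting with http://, https://, or www., that hrefs are double-quoted, and that only <a> tags should be rewritten. The function name replaceOutboundLinks, its parameters, and the default extension list are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: the name, parameters, and default extension list
// are illustrative assumptions, not part of the original thread.
function replaceOutboundLinks(
    string $html,
    string $target = 'http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com',
    array $skipExtensions = ['mpg', 'wmv', 'css']
): string {
    // Match a double-quoted href inside an <a ...> tag whose value starts
    // with http://, https://, or www. (case-insensitive).
    // Group 1: everything up to the opening quote, group 2: the URL,
    // group 3: the closing quote.
    $pattern = '~(<a\b[^>]*?\bhref=")((?:https?://|www\.)[^"]*)(")~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($target, $skipExtensions): string {
            // Drop any query string or fragment before reading the extension.
            $path = preg_replace('~[?#].*$~', '', $m[2]);
            $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));

            if (in_array($ext, $skipExtensions, true)) {
                return $m[0]; // excluded file type: leave the link as it is
            }
            return $m[1] . $target . $m[3];
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Usage: the first link is rewritten, the relative .mpg link is left alone.
$html = '<a href="www.RandomLink.net/go.php">Click Here!</a>'
      . '<a href="random-movie_02_az.mpg" class="az">Movie</a>';
echo replaceOutboundLinks($html), "\n";
```

Relative links such as ./folder3/3rd_random-movie8.wmv never match because they do not start with http or www., and the <link> stylesheet tag is skipped because the pattern only looks at <a> tags. For arbitrary scraped pages (single-quoted or unquoted attributes, odd whitespace) an HTML parser such as DOMDocument would likely be more robust than a regex.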
ASKER CERTIFIED SOLUTION
karoldvl

This solution is only available to members.
jax0

ASKER

wow - thank you!  Yes it works great...