jax0
asked on
PHP - Regex for preg_replace links... Tricky
Hi all,
Could someone please provide a regex that will let me use preg_replace to rewrite outbound links like:
href="http://www.RandomLink.co.uk/a/1/n2z.shtml"
href="http://RandomLink.com/hit/59/?ref=Page+%2291"
href="www.RandomLink.net/go.php"
(to)
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com"
** but excluding all other links, such as those to .mpg, .wmv, .css files, etc.
Below is a visual example:
--------------------------
FROM THIS (before)
--------------------------
<html>
<head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="./css/style.css">
</head>
<body>
<img src="/images/random_gif_01.gif"><br>
<br>
<a href="http://www.RandomLink.co.uk/a/1/n2z.shtml" class="s1">Random Link</a>
<br>
<p><b><a href="http://RandomLink.com/hit/59/?ref=Page+%2291">Click Here</a></b> now!</p>
<br>
<div id="center">
<a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
<a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
</div>
<div id="abc123">
<a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
<a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
</div>
<div id="footer">Random text...<br>
<a href="www.RandomLink.net/go.php">Click Here!</a>
</div>
</body>
</html>
--------------------------
TO THIS (after)
--------------------------
<html>
<head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="./css/style.css">
</head>
<body>
<img src="/images/random_gif_01.gif"><br>
<br>
<a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com" class="s1">Random Link</a>
<br>
<p><b><a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here</a></b> now!</p>
<br>
<div id="center">
<a href="random-movie_z1_01.wmv" class="z1"><img src="../../img2/random_b.jpg"></a>
<a href="random-movie_02_az.mpg" class="az"><img src="image3.jpg"></a>
</div>
<div id="abc123">
<a href="./folder3/3rd_random-movie8.wmv"><img src="image4.jpg"></a>
<a href="../folder4/4th_random-movie9.mpg"><img src="image5.jpg"></a>
</div>
<div id="footer">Random text...<br>
<a href="http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com">Click Here!</a>
</div>
</body>
</html>
--------------------------
Thank you,
D-
ASKER
Hmm,
I guess it really depends on how the author of the webpage writes it.
You see, this regex is for a webpage scraper I'm making. You'll notice that in my example I tried to mix up different tags, paths, etc. I did this to cover any situation my scraper might come across on any webpage I use it on.
But if it's not possible to create a regex pattern compatible with both http:// and www.,
then I would probably choose http://
Thanks again.
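For what it's worth, one pattern can probably cover both http:// and www. by alternating the two prefixes. Below is a minimal sketch, assuming double-quoted href values as in the example above; the variable names are made up for illustration, and the replacement URL is simply the placeholder from the question.

```php
<?php
// Placeholder target copied from the question's example.
$replacement = 'http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com';

// Match href="..." only when the value starts with http://, https://, or www.
// Relative paths such as ./css/style.css or random-movie.wmv do not match,
// because they begin with neither a scheme nor "www.".
$pattern = '~href="(?:https?://|www\.)[^"]*"~i';

// Sample input assembled from three links in the question.
$html = '<a href="http://RandomLink.com/hit/59/?ref=Page+%2291">Click Here</a>'
      . '<a href="www.RandomLink.net/go.php">Click Here!</a>'
      . '<a href="./folder3/3rd_random-movie8.wmv">movie</a>';

$result = preg_replace($pattern, 'href="' . $replacement . '"', $html);
echo $result, "\n";
```

Note that this version relies on the media and stylesheet links being relative. An absolute link such as http://example.com/movie.wmv would still be rewritten, so an explicit extension check is needed if those can occur.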
ASKER CERTIFIED SOLUTION
ASKER
Wow, thank you! Yes, it works great.
What other file extensions do you want excluded?
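One way to keep that list open-ended is to pass the excluded extensions in as a parameter and build the pattern from them. The following is a rough sketch, assuming PHP 7.4 or later and double-quoted href values; the function name `rewrite_outbound_links` and the extension list in the usage lines are hypothetical, not taken from the solution in this thread.

```php
<?php
// Hypothetical helper: rewrites absolute (http://, https://, www.) links to
// $target, except those whose href ends in one of $skipExtensions.
function rewrite_outbound_links(string $html, string $target, array $skipExtensions): string
{
    // Build an alternation such as "mpg|wmv|css", escaping each entry.
    $ext = implode('|', array_map(fn($e) => preg_quote($e, '~'), $skipExtensions));

    // Negative lookahead: reject href values that end in a skipped extension.
    // Omitted entirely when the list is empty.
    $guard = $skipExtensions ? '(?![^"]*\.(?:' . $ext . ')")' : '';

    $pattern = '~href="' . $guard . '(?:https?://|www\.)[^"]*"~i';

    // A callback avoids any special handling of "$" or "\" in the target URL.
    return preg_replace_callback(
        $pattern,
        fn(array $m) => 'href="' . $target . '"',
        $html
    );
}

// Usage with an assumed extension list.
$page = '<a href="http://a.example/x.shtml">a</a>'
      . '<a href="http://a.example/clip.WMV">b</a>'
      . '<a href="www.b.example/go.php">c</a>';
echo rewrite_outbound_links($page, 'http://www.ZZZZZZZZZZZZZZZZZZZZZZ.com', ['mpg', 'wmv', 'css']), "\n";
```

The check only looks at how the href value ends, so a media link followed by a query string (for example clip.wmv?id=1) would not be excluded by this sketch.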