  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 349

Regular expressions to detect and remove links

I want to do this in PHP. I get text from WordPress that contains certain links, so first I want to find all the files matching the following pattern:

http://www.anydomain.tdl/wp-content/uploads/somefolder/somefolder/filename.ext

Then I want a list of all the files in that text that match certain extensions. Basically, the parts in bold repeat in every link I need to match, but the rest changes, and I need a place to define the list of extensions for ext.

After that, I need to remove the matching links from the text, meaning:

<a href="matched_link" something> something</a>

Can somebody help me with this?
Asked by: brauliomendez
2 Solutions
 
Terry Woods (IT Guru) commented:
Ok, something like this should match your links:

preg_match_all('#http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)#',$sourcestring,$matches);
print_r($matches);
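
To see what ends up in $matches, here is a minimal sketch that runs the same pattern over a made-up snippet. The sample text, the example.com domain, and the file names are assumptions for illustration only. Because the pattern has no capturing groups, the full URLs are in $matches[0].

```php
<?php
// Hypothetical sample text; the domain and file names are made up for illustration.
$sourcestring = 'See <a href="http://www.example.com/wp-content/uploads/2012/05/photo.jpg">photo</a> '
    . 'and <a href="http://www.example.com/wp-content/uploads/2012/05/notes.pdf">notes</a> '
    . 'and <img src="http://www.example.com/wp-content/uploads/2012/06/logo.gif">.';

// Same pattern as above, split across two lines for readability.
$pattern = '#http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/'
    . '[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)#';

preg_match_all($pattern, $sourcestring, $matches);

// No capturing groups, so $matches[0] holds the full URLs; drop duplicates.
$files = array_values(array_unique($matches[0]));
print_r($files);
```

The .pdf link is skipped because pdf is not in the extension list. The pattern as written is case-sensitive, so adding the i modifier after the closing # would probably be wanted if the text can contain names such as PHOTO.JPG.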



I'm a little unclear about what you want done for the big picture though. If you had text:

blah blah <a href="matched_link" something> something</a> blah

Did you want your result to be this?

blah blah blah
 
Terry Woods (IT Guru) commented:
If I understood your requirement correctly, this should do the trick:
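// preg_replace returns the modified string rather than changing $sourcestring, so keep the result.
$result = preg_replace('#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)\1[^>]*>.*?</a>#is','',$sourcestring);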
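
As a worked example of the "blah blah" case, the sketch below applies that replacement to a made-up string. The example.com URL and the class attribute are assumptions. Removing the link leaves the spaces that sat on either side of it, so the sketch also collapses runs of spaces, which is an extra step not in the answer above.

```php
<?php
// Hypothetical input; the domain, path, and file name are made up for illustration.
$sourcestring = 'blah blah <a href="http://www.example.com/wp-content/uploads/a/b/pic.jpg" class="x"> something</a> blah';

// Same pattern as above, split across two lines for readability.
$pattern = '#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}'
    . '/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)\1[^>]*>.*?</a>#is';

// preg_replace returns the modified string; the input is left unchanged.
$stripped = preg_replace($pattern, '', $sourcestring);

// The removed link leaves two adjacent spaces behind, so collapse runs of spaces.
$cleaned = preg_replace('/ {2,}/', ' ', $stripped);

echo $cleaned, "\n"; // prints "blah blah blah"
```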

 
Terry Woods (IT Guru) commented:
You should be able to see how I've allowed for multiple extensions by example. In my case I allowed jpg and gif. You can expand the list like this if you have an array of extensions:
$extensions = array("gif","jpg","etc");
$extensionPattern = implode("|",$extensions);
$result = preg_replace('#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:'.$extensionPattern.')\1[^>]*>.*?</a>#is','',$sourcestring);
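
Putting both steps together, a helper along these lines could list the files and strip the links in one call. This is only a sketch: the function name stripUploadLinks, its return shape, and the sample input are hypothetical, and it assumes PHP 7 or later for the type declarations. It also runs each extension through preg_quote, which the snippets above do not do.

```php
<?php
/**
 * Hypothetical helper (not from the thread): lists upload URLs with the given
 * extensions and returns the text with the matching <a> elements removed.
 */
function stripUploadLinks(string $text, array $extensions): array
{
    // Quote each extension so characters such as "." or "+" are matched literally.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '#');
    }, $extensions);
    $extPattern = implode('|', $quoted);

    // URL part shared by both steps; it contains no capturing groups.
    $url = 'http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/'
        . '[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:' . $extPattern . ')';

    // Step 1: collect the matching URLs wherever they appear in the text.
    preg_match_all('#' . $url . '#i', $text, $matches);

    // Step 2: remove every <a> element whose href is such a URL.
    // \1 refers to the opening quote captured just before the URL.
    $cleaned = preg_replace(
        '#<a\s[^>]*?href\s*?=\s*?(\'|")' . $url . '\1[^>]*>.*?</a>#is',
        '',
        $text
    );

    return ['files' => array_values(array_unique($matches[0])), 'text' => $cleaned];
}

// Usage with a made-up link and extension list.
$result = stripUploadLinks(
    'Get <a href="http://www.example.com/wp-content/uploads/2012/05/song.mp3">it</a> here.',
    ['mp3', 'zip']
);
print_r($result);
```

Keeping the URL part in one variable means the list and the removal cannot drift apart when the extension list changes.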
