Solved

Regular expressions to detect and remove links

Posted on 2012-03-29
Last Modified: 2012-03-29
I want to do this in PHP. I get text from WordPress that contains certain links, and first I want to find all the file URLs of the following form:

http://www.anydomain.tdl/wp-content/uploads/somefolder/somefolder/filename.ext

From that text I need a list of all the files matching certain extensions. The part in bold repeats in every link I need to match, while the rest changes, and I need a place to define the list of extensions for ext.

After that I need to remove each matching link from the text, meaning:

<a href="matched_link" something> something</a>

Can somebody help me with this?
Question by:brauliomendez

Accepted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37785214
Ok, something like this should match your links:

preg_match_all('#http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)#',$sourcestring,$matches);
print_r($matches);
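
As a rough illustration of what that call returns, here is a minimal sketch run against a made-up string. The domain, paths and file names are assumptions for the example only; the pattern is the one above, split across two lines for readability.

```php
<?php
// Hypothetical sample text; the domain and paths are made up for illustration.
$sourcestring = 'See <a href="http://www.example.com/wp-content/uploads/2012/03/photo.jpg">photo</a> '
    . 'and <a href="http://www.example.com/wp-content/uploads/2012/03/notes.pdf">notes</a>.';

// Same pattern as above, only broken into two string pieces.
$pattern = '#http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/'
    . '[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)#';

preg_match_all($pattern, $sourcestring, $matches);

// $matches[0] holds the full text of every match; array_unique drops repeated URLs.
$files = array_values(array_unique($matches[0]));
print_r($files);
```

Only the .jpg URL ends up in $files, because pdf is not in the extension list. Note that this pattern has no i modifier, so it is case-sensitive: a file ending in .JPG would not be listed unless you add the modifier.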

I'm a little unclear about what you want done for the big picture though. If you had text:

blah blah <a href="matched_link" something> something</a> blah

Did you want your result to be the following?

blah blah blah

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37785228
If I understood your requirement correctly, this should do the trick:
$sourcestring = preg_replace('#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)\1[^>]*>.*?</a>#is','',$sourcestring);
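
To make the effect concrete, here is a small sketch applying that replacement to a made-up input. The URL and the class attribute are assumptions for illustration; the pattern is unchanged apart from being split into two string pieces.

```php
<?php
// Hypothetical input; the URL and the class attribute are made up.
$sourcestring = 'blah blah <a href="http://www.example.com/wp-content/uploads/2012/03/photo.jpg" class="x"> something</a> blah';

$pattern = '#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/'
    . '[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:jpg|gif)\1[^>]*>.*?</a>#is';

// preg_replace returns the new string; $sourcestring itself is not modified.
$cleaned = preg_replace($pattern, '', $sourcestring);
echo $cleaned;
```

The whole anchor element goes, including its link text, but the spaces on either side stay, so $cleaned contains two consecutive spaces where the link used to be. If that matters, you could collapse the whitespace in a separate step.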


Expert Comment

by:Terry Woods
ID: 37785230
You should be able to see how I've allowed for multiple extensions by example. In my case I allowed jpg and gif. You can expand the list like this if you have an array of extensions:
$extensions = array("gif","jpg","etc");
// Join the extensions into a regex alternation, e.g. gif|jpg|etc
$extensionPattern = implode("|",$extensions);
// preg_replace returns the modified text, so assign the result
$sourcestring = preg_replace('#<a\s[^>]*?href\s*?=\s*?(\'|")http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:'.$extensionPattern.')\1[^>]*>.*?</a>#is','',$sourcestring);
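
If it helps, both steps (listing the files, then stripping the links) could be wrapped up roughly like this. It is only a sketch: the function names buildUploadUrlPattern and extractAndStripUploadLinks are hypothetical, the sample text is made up, and the extensions are passed through preg_quote as a precaution in case one ever contains a regex metacharacter.

```php
<?php
// Hypothetical helper names; nothing here comes from WordPress itself.
function buildUploadUrlPattern(array $extensions)
{
    // Escape each extension so characters such as "." or "#" cannot alter the pattern.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '#');
    }, $extensions);

    // URL part of the patterns used in this thread, without delimiters.
    return 'http://www\.(?:[a-z\d\-]*[a-z\d]\.)+[a-z]{2,}/wp-content/uploads/'
        . '[^/"\'\s?&]+/[^/"\'\s?&]+/[^/"\'\s?&]+\.(?:' . implode('|', $quoted) . ')';
}

function extractAndStripUploadLinks($text, array $extensions)
{
    $url = buildUploadUrlPattern($extensions);

    // Step 1: collect the matching file URLs (case-insensitive, duplicates dropped).
    preg_match_all('#' . $url . '#i', $text, $matches);
    $files = array_values(array_unique($matches[0]));

    // Step 2: remove every <a ...>...</a> whose href is one of those URLs.
    $stripped = preg_replace(
        '#<a\s[^>]*?href\s*?=\s*?(\'|")' . $url . '\1[^>]*>.*?</a>#is',
        '',
        $text
    );

    return array($files, $stripped);
}

// Made-up sample text for illustration.
$text = 'Get <a href="http://www.example.com/wp-content/uploads/2012/03/report.pdf">the report</a> here, '
    . 'or visit <a href="http://www.example.com/about/">about</a>.';
list($files, $stripped) = extractAndStripUploadLinks($text, array('pdf', 'zip'));
print_r($files);
echo $stripped;
```

Here the extension list lives in one place (the array passed in), the uploads link is removed, and the unrelated "about" link is left alone. The URL part contains only non-capturing groups, so \1 still refers to the opening quote of the href.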
