
Regex for preg_replace to delete any HTML tag that contains .youtube. in its src

Posted on 2011-09-29
Last Modified: 2012-05-12
I need a regex for preg_replace that deletes any HTML tag whose src contains .youtube. (including both dots), no matter whether it is an iframe, embed or object.
For an <object> tag, the src may even be in a <param> tag.
Also, the tag could be split across multiple lines, with tabs mixed in, e.g.:
<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>

Thank you very much for all the help!
Question by:candychan611
LVL 110

Expert Comment

by:Ray Paseur
ID: 36814324
What is the output you would expect after processing this string?

<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36814330
It would also be helpful to us if you would post examples of the before-and-after conditions.  Show us what to expect inside a string that looks like what you will be working with.  Given some good test data and the desired output(s) we can probably be very helpful, too!
 

Author Comment

by:candychan611
ID: 36814594
Hi Ray,

I would like to completely delete the whole tag.
 

Author Comment

by:candychan611
ID: 36814637
It is used in WordPress, so the tags should already be balanced. Since users can edit the post freely in code mode while composing, I expect they could enter the code in mixed upper and lower case, and use double, single, or even no quotes around attribute values.

However, I don't know what other attributes there will be. I just treat it as: the case-insensitive string ".youtube." exists somewhere between < and >.

Hope this helps.
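A minimal sketch of the requirement as described in this comment, assuming PHP 7+ and that every matched element has a closing tag (WordPress balances tags). The function name `strip_youtube_tags` is hypothetical. It accepts the src value in double quotes, single quotes, or unquoted, and relies on the `i` and `s` modifiers for mixed case and line breaks:

```php
<?php
// Hypothetical helper (not from the thread): removes any paired element
// whose src value contains ".youtube.", whether that value is
// double-quoted, single-quoted or unquoted.
// Assumes every matched element has a closing tag.
function strip_youtube_tags(string $html): string
{
    $pattern = '~
        <\s*([a-z][a-z0-9]*)\b             # opening tag; capture its name
        [^>]*?\bsrc\s*=\s*                 # skip ahead to the src attribute
        (?:
            "[^"]*\.youtube\.[^"]*"        # double-quoted value
          | \'[^\']*\.youtube\.[^\']*\'    # single-quoted value
          | [^\s>"\']*\.youtube\.[^\s>]*   # unquoted value
        )
        [^>]*>                             # rest of the opening tag
        .*?                                # element body, lazily
        <\s*/\s*\1\s*>                     # closing tag with the same name
    ~isx'; // i: case-insensitive, s: "." matches newlines, x: allow comments
    return preg_replace($pattern, '', $html);
}

echo strip_youtube_tags("<p>keep</p><IFRAME SRC='http://www.YouTuBe.coM/embed/x' width=200></iframe>"), "\n";
// prints <p>keep</p>
```

Like the other patterns in this thread, it only requires the literal ".youtube." with both dots, so a bare `http://youtube.com/...` URL would not be matched.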
 
LVL 35

Accepted Solution

by: Terry Woods (earned 500 total points)
ID: 36817974
This seems to work for me:
$sourcestring = preg_replace('#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is','',$sourcestring);
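To check the pattern against the sample from the question, here is a small test script (the surrounding `<p>` lines are added only for illustration). It also shows why the `s` modifier matters: without it, `.*?` cannot cross the line break before `</iframe>`, so nothing is removed.

```php
<?php
// The question's sample: mixed-case host, attributes spread over several
// lines with tabs and spaces, closing tag on its own line.
// The <p> lines around it are added only to show what is kept.
$sample = "<p>before</p>\n"
    . "<iframe src=\"http://www.YouTuBe.coM/embed/somevideo\"\n"
    . "\t\t\tframeborder=\"0\"\n"
    . "   height='150' width=\"200\">\n"
    . "</iframe>\n"
    . "<p>after</p>";

// Terry's pattern, unchanged (i = case-insensitive, s = "." also matches "\n").
$withS = '#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is';
// The same pattern without s: ".*?" can no longer cross the newline
// between the opening tag and </iframe>, so nothing matches.
$withoutS = '#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#i';

$cleaned   = preg_replace($withS, '', $sample);    // "<p>before</p>\n\n<p>after</p>"
$unchanged = preg_replace($withoutS, '', $sample); // identical to $sample

echo $cleaned, "\n";
```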

 
LVL 35

Expert Comment

by:Terry Woods
ID: 36817981
Note that it won't work, though, when the src is in a separate tag, as in the object-tag case you describe. Can you provide an example of how that might look? We might need to pick those cases up separately.
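One possible way to pick up the separate-tag case, sketched under the assumption that `<object>` elements are never nested inside one another: match each whole `<object>...</object>` block and drop it when `.youtube.` appears anywhere inside, including in a `<param value="...">`. The function name `strip_youtube_objects` is hypothetical.

```php
<?php
// Hypothetical helper: drops a whole <object>...</object> block when
// ".youtube." appears anywhere inside it, e.g. in a <param value="...">,
// the object's own data attribute, or a nested <embed src="...">.
// Assumes <object> elements are not nested inside one another.
function strip_youtube_objects(string $html): string
{
    return preg_replace_callback(
        '#<\s*object\b[^>]*>.*?<\s*/\s*object\s*>#is',
        function (array $m): string {
            // Keep the block unless it mentions YouTube (case-insensitive).
            return stripos($m[0], '.youtube.') !== false ? '' : $m[0];
        },
        $html
    );
}

// Old-style embed code, as posted later in this thread.
$old = '<object width="320" height="264">
  <param name="movie" value="http://www.youtube.com/v/(id)"></param>
  <param name="wmode" value="transparent"></param>
  <embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>';

echo strip_youtube_objects('<p>a</p>' . $old . '<p>b</p>'), "\n"; // <p>a</p><p>b</p>
```

Running this before the tag-level pattern would handle objects first; nested objects would need a real parser.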
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36818437
Yeah, it all comes back to the quality of the test data.

It would also be helpful to us if you would post examples of the before-and-after conditions.

Make these as close as possible to the real thing; a collection of cut-and-paste strings from the original HTML would be best.  We might be able to do something with REGEX, but in my experience using REGEX to parse HTML can quickly become a fool's errand.  Even well-formed HTML sometimes requires a state engine.
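For comparison, a sketch of the parser-based route, using PHP's built-in DOM extension instead of a regex. This is a swapped-in technique, not something posted in the thread; it assumes the `dom` extension is enabled, and the wrapper `<div id="yt-wrap">` and the function name `strip_youtube_dom` are hypothetical. libxml's HTML parser takes care of tag case, quoting and line breaks, so the check reduces to looking at attribute values:

```php
<?php
// Hypothetical helper: removes iframe/embed/object elements when any
// attribute on the element or its descendants (e.g. a <param value="...">)
// contains ".youtube." (case-insensitive).
function strip_youtube_dom(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence libxml parse warnings
    $doc = new DOMDocument();
    // The XML prolog hint makes libxml read the fragment as UTF-8;
    // the wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="yt-wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $victims = [];
    foreach ($xpath->query('//iframe | //embed | //object') as $el) {
        // Check every attribute on the element and on anything inside it.
        foreach ($xpath->query('descendant-or-self::*/@*', $el) as $attr) {
            if (stripos($attr->value, '.youtube.') !== false) {
                $victims[] = $el;
                break;
            }
        }
    }
    // Remove after the scan so the node lists are not modified mid-iteration.
    foreach ($victims as $el) {
        $el->parentNode->removeChild($el);
    }

    // Serialize the wrapper's children back into an HTML fragment.
    $out = '';
    $wrap = $xpath->query('//div[@id="yt-wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

The serializer may normalise the markup it returns (quoting, tag case), which is usually acceptable for post content but is worth knowing.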
 

Author Comment

by:candychan611
ID: 36896374
Hi Terry and Ray,

I've finally figured out that WordPress's formatting and tag-balancing procedure outputs double-quoted attributes only, so Terry is correct to match only double quotes (src\s*=\s*"[^"]*\.youtube\.[^"]*")!!

It also restricts the use of the old-style embed format, such as:
<object width="320" height="264">
  <param name="movie" value="http://www.youtube.com/v/(id)"></param>
  <param name="wmode" value="transparent"></param>
  <embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>

because it produces malformed code after the formatting and balancing procedure.
So I used code like this instead:

<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" height="350" width="425"></embed>

<iframe src="http://www.youtube.com/embed/0(id)" frameborder="0" height="360" width="480"></iframe>


Terry's solution works perfectly!!

Finally, I've extended the regex a bit to become
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

so that it also works on the object tag and filters other sites, such as:

<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>

<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>
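A quick way to confirm the extended pattern against the two examples above, plus a non-matching iframe that should survive (the Vimeo URL and the leading `<p>` are illustrative assumptions):

```php
<?php
// candychan611's extended pattern: also looks at data/movie/value
// attributes and matches tudou as well as youtube.
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

$post = '<p>text</p>'
    . '<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>'
    . '<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>'
    . '<iframe src="http://player.vimeo.com/video/(id)"></iframe>';

$result = preg_replace($pre_regex, '', $post);
echo $result, "\n"; // <p>text</p><iframe src="http://player.vimeo.com/video/(id)"></iframe>
```

As with Terry's original, only the tag that carries the matching attribute is removed, so an old-style `<object>` whose URL sits in a child `<param>` would lose that `<param>` and the `<embed>` but keep the `<object>` wrapper; the author notes below that WordPress rules that format out anyway.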

All works!!
Thank you very much for your time :)
You saved my day!
 

Author Comment

by:candychan611
ID: 36896382
A final note to others:

Since WordPress restricts old-style embed code, the case where src/data sits in a separate tag does not need to be handled.