Solved

Regex for preg_replace to delete any HTML tag whose src contains .youtube.

Posted on 2011-09-29
Last Modified: 2012-05-12
I need a regex for preg_replace that deletes any HTML tag whose src contains .youtube. (including the two dots).
It should work whether the tag is iframe, embed or object.
For an <object> tag, the src may even be in a nested <param> tag.
The tag may also be split across multiple lines, with tabs and spaces mixed in, e.g.
<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>

Thank you very much for all the help!
Question by:candychan611
Expert Comment

by:Ray Paseur
What is the output you would expect after processing this string?

<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>

Expert Comment

by:Ray Paseur
It would also be helpful to us if you would post examples of the before-and-after conditions.  Show us what to expect inside a string that looks like what you will be working with.  Given some good test data and the desired output(s) we can probably be very helpful, too!

Author Comment

by:candychan611
Hi Ray,

I would like to completely delete the whole tag.

Author Comment

by:candychan611
This is used in WordPress, so the tags should already be balanced. Since users can edit freely in code mode when composing a post, I expect they could enter the code in mixed upper and lower case, with double quotes, single quotes, or even no quotes around the attributes.

However, I don't know what other attributes may be present. I just treat it as the case-insensitive string ".youtube." appearing between < and >.

Hope this helps.
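
For the mixed-quoting and mixed-case concern, one possible approach is an alternation over the three quoting styles combined with the case-insensitive flag. The following is only a sketch under that assumption; the pattern and the sample strings are illustrative and were not posted by anyone in this thread.

```php
<?php
// Assumed variant of a src-matching pattern: accepts double-quoted,
// single-quoted, and unquoted attribute values. A nowdoc is used so the
// quote characters inside the pattern need no escaping.
$pattern = <<<'REGEX'
#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*(?:"[^"]*\.youtube\.[^"]*"|'[^']*\.youtube\.[^']*'|[^\s"'>]*\.youtube\.[^\s"'>]*)[^>]*>.*?</\1>#is
REGEX;

// Made-up samples, one per quoting style, plus one that should survive.
$samples = [
    'double'   => '<iframe src="http://www.youtube.com/embed/a"></iframe>',
    'single'   => "<IFRAME SRC='http://www.YouTube.com/embed/b'></iframe>",
    'unquoted' => '<iframe src=http://www.YOUTUBE.com/embed/c width=200></iframe>',
    'other'    => '<iframe src="http://example.com/player"></iframe>',
];

$results = [];
foreach ($samples as $name => $markup) {
    // The i flag handles mixed case; the s flag lets . span line breaks.
    $results[$name] = preg_replace($pattern, '', $markup);
}

var_dump($results);
```

The first three samples should come back as empty strings, while the last one should be returned unchanged.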

Accepted Solution

by:Terry Woods (earned 500 total points)
This seems to work for me:
$sourcestring = preg_replace('#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is','',$sourcestring);
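
As a quick check, the pattern above can be run against the multi-line iframe from the question. This is a minimal sketch: the surrounding <p> elements are invented here just to show that neighbouring markup is left alone.

```php
<?php
// Pattern copied from the accepted answer.
$pattern = '#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is';

// The asker's iframe (mixed case, split over lines, tabs mixed in),
// wrapped in two made-up paragraphs.
$sourcestring = "<p>Before</p>\n"
    . "<iframe src=\"http://www.YouTuBe.coM/embed/somevideo\"\n"
    . "\t\t\tframeborder=\"0\"\n"
    . "   height='150' width=\"200\">\n"
    . "</iframe>\n"
    . "<p>After</p>";

// i = case-insensitive, s = dot also matches newlines.
$cleaned = preg_replace($pattern, '', $sourcestring);

echo $cleaned;
```

The `\1` backreference ties the closing tag to whatever tag name was captured at the start, so the same pattern covers iframe, embed and object as long as the src sits in the opening tag.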


Expert Comment

by:Terry Woods
Note it won't work though when the src is in a separate tag, such as what you describe with an object tag. Can you provide an example of how that might look? We might need to pick those cases up separately.
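
One way those cases might be picked up separately is a second pass aimed only at <object> elements whose nested <param> carries the URL. The sketch below is an assumption about what such markup looks like (an old-style Flash embed); `$objectPattern` and the sample string are hypothetical and not part of the accepted answer.

```php
<?php
// Hypothetical second pass for <object> embeds, where the URL sits in a
// nested <param value="..."> rather than in a src attribute.
$objectPattern = '#<object\b[^>]*>'   // opening <object ...> tag
    . '(?:(?!</object>).)*?'          // stay inside this one object
    . '<param\b[^>]*\svalue\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>'
    . '.*?</object>#is';              // everything up to the closing tag

// Assumed sample markup; the <p> elements are only there as bystanders.
$html = '<p>intro</p>'
    . '<object width="320" height="264">'
    . '<param name="movie" value="http://www.youtube.com/v/(id)"></param>'
    . '<param name="wmode" value="transparent"></param>'
    . '</object>'
    . '<p>outro</p>';

$withoutObject = preg_replace($objectPattern, '', $html);

echo $withoutObject;
```

The negative lookahead keeps the scan from running past the end of one <object> into the next, so an unrelated object that happens to precede a YouTube one should not be swallowed.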

Expert Comment

by:Ray Paseur
Yeah, it all comes back to the quality of the test data.

It would also be helpful to us if you would post examples of the before-and-after conditions.

Make these as close as possible to the real thing - a collection of cut-and-paste strings from the original HTML would be best.  We might be able to do something with REGEX, but in my experience using REGEX to parse HTML can quickly become a fool's errand.  Even well-formed HTML sometimes requires a state engine.
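
For comparison, a parser-based alternative to REGEX could look roughly like the sketch below, which uses PHP's DOM extension. Nobody in this thread proposed it; `strip_youtube_embeds()` is a hypothetical helper name, the wrapper <div> is an assumption made to get the fragment back out, and the default encoding handling of `loadHTML()` is not addressed.

```php
<?php
// Sketch only: parse the markup and drop elements whose src, data, or value
// attribute mentions ".youtube.", instead of pattern-matching raw text.
function strip_youtube_embeds(string $html): string
{
    libxml_use_internal_errors(true);   // tolerate sloppy or unknown markup
    $doc = new DOMDocument();
    // Wrap the fragment so its content can be found again after parsing.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $doomed = [];
    foreach ($xpath->query('//*[@src or @data or @value]') as $node) {
        foreach (['src', 'data', 'value'] as $attr) {
            if (stripos($node->getAttribute($attr), '.youtube.') !== false) {
                // A <param> only carries the URL; remove its <object> instead.
                $parent = $node->parentNode;
                $viaParam = $node->nodeName === 'param'
                    && $parent !== null && $parent->nodeName === 'object';
                $doomed[] = $viaParam ? $parent : $node;
                break;
            }
        }
    }
    foreach ($doomed as $node) {
        if ($node->parentNode !== null) {   // skip nodes already detached
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_youtube_embeds(
    "<p>Keep me</p>\n<iframe src=\"http://www.YouTuBe.coM/embed/somevideo\"\n\tframeborder=\"0\"></iframe>"
);
```

Because the parser normalizes quoting and case before the check runs, the quoting style of the attributes stops mattering; the trade-off is that the serialized output may not be byte-for-byte identical to the untouched parts of the input.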

Author Comment

by:candychan611
Hi Terry and Ray,

I've finally figured out that WordPress's formatting and tag-balancing procedure produces double-quoted attributes only. Terry is correct to filter only double quotes (src\s*=\s*"[^"]*\.youtube\.[^"]*")!!

WordPress also restricts the use of the old-style embed format, such as
<object width="320" height="264">
  <param name="movie" value="http://www.youtube.com/v/(id)"></param>
  <param name="wmode" value="transparent"></param>
  <embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>


as it produces malformed code after the formatting and balancing procedure.
So I used code like this:

<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" height="350" width="425"></embed>

<iframe src="http://www.youtube.com/embed/0(id)" frameborder="0" height="360" width="480"></iframe>


Terry's solution works perfectly!!

Finally, I've extended the regex a bit to become
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';


to use on the object tag and to filter other sites, such as:

<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>

<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>
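
A short run of the extended pattern against the two samples above might look like this. It is a sketch for checking behaviour; the third sample (example.com) is made up to show a tag that should be left in place.

```php
<?php
// Extended pattern copied from the author's comment.
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

$samples = [
    // The author's two samples.
    'object' => '<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>',
    'embed'  => '<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>',
    // Made-up control: a different host, so it should not be removed.
    'other'  => '<iframe src="http://example.com/player"></iframe>',
];

$filtered = [];
foreach ($samples as $name => $markup) {
    $filtered[$name] = preg_replace($pre_regex, '', $markup);
}

var_dump($filtered);
```

Note that a match always starts at the tag that carries the attribute, because `[^>]*` cannot cross a `>`. A hit on a `<param value="...">` therefore removes only that `<param>` element, not its enclosing `<object>`.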


It all works!!
Thank you very much for your time :)
You saved my day!

Author Comment

by:candychan611
A final note to others:

As WordPress limits the use of the old-style embed code, the case where src/data is in a separate tag does not need to be considered.