?
Solved

regex for preg_replace to delete whatever html tags contain .youtube. in src

Posted on 2011-09-29
9
Medium Priority
?
347 Views
Last Modified: 2012-05-12
regex for preg_replace to delete whatever html tags contain .youtube. (includes the 2 dots) in src.
No matter it is iframe, embed or object.
For <object> tag, the src may even in a <param> tag
Also, the tag could be separated into multiple lines un-ideally with tabs mixed. e.g.
<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>

Thank you very much for all the help!
0
Comment
Question by:candychan611
  • 4
  • 3
  • 2
9 Comments
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36814324
What is the output you would expect after processing this string?

<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36814330
It would also be helpful to us if you would post examples of the before-and-after conditions.  Show us what to expect inside a string that looks like what you will be working with.  Given some good test data and the desired output(s) we can probably be very helpful, too!
0
 

Author Comment

by:candychan611
ID: 36814594
Hi Ray,

I would like to complete delete the whole tag.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 

Author Comment

by:candychan611
ID: 36814637
It is used in Wordpress so the tags should be already balanced. Since the user can edit it freely in code mode when composing a post, I expected they could enter the code upper and lower case mixed / using double/single or even no quotes for the attributes.

However, I don't what other attributes are there. I just see it as a case insensitive string ".youtube." exist between < and >

Hope this helps.
0
 
LVL 35

Accepted Solution

by:
Terry Woods earned 2000 total points
ID: 36817974
This seems to work for me:
$sourcestring = preg_replace('#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is','',$sourcestring);

Open in new window

0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36817981
Note it won't work though when the src is in a separate tag, such as what you describe with an object tag. Can you provide an example of how that might look? We might need to pick those cases up separately.
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36818437
Yeah, it all comes back to the quality of the test data.

It would also be helpful to us if you would post examples of the before-and-after conditions.

Make these as close as possible to the real thing - a collection of cut-and-paste strings from the original HTML would be best.  We might be able to do something with REGEX, but in my experience using REGEX to parse HTML can quickly become a fools errand.  Even well-formed HTML sometimes requires a state engine.
0
 

Author Comment

by:candychan611
ID: 36896374
Hi Terry and Ray,

I've finally figured out that the formatting and balance procedure of wordpress will result in double quoted attributes only. Terry is correct to filter only double quotes (src\s*=\s*"[^"]*\.youtube\.[^"]*")!!

It also limited the use of old style embed format such as,
<object width="320" height="264">
  <param name="movie" value="http://www.youtube.com/v/(id)"></param>
  <param name="wmode" value="transparent"></param>
  <embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>

Open in new window


as it will produce malformatted code after the formatting and balance procedure.
So, I used code like,

<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" height="350" width="425"></embed>

<iframe src="http://www.youtube.com/embed/0(id)" frameborder="0" height="360" width="480"></iframe>

Open in new window


Terry's solution works perfectly!!

Finally, I've extended the regex a bit to become
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

Open in new window


 to use on the object tag and to filter other sites. Such as,

<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>

<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>

Open in new window


All works!!
Thank you very much for your time :)
You saved my day!
0
 

Author Comment

by:candychan611
ID: 36896382
A finally note to others,

As Wordpress limited the use of old style embed code, the case that src/data is in a separate tag does not need to be considered.
0

Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article discusses how to create an extensible mechanism for linked drop downs.
Since pre-biblical times, humans have sought ways to keep secrets, and share the secrets selectively.  This article explores the ways PHP can be used to hide and encrypt information.
The viewer will learn how to count occurrences of each item in an array.
Viewers will learn how to use Macros for greater control over Rack parameters in Ableton Live. Group devices into a Rack by selecting them and pressing Command-G (Ctrl-G on PC): Control-click (Right Click on PC) a parameter to access pop-up menu, …
Suggested Courses

593 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question