candychan611
asked on
regex for preg_replace to delete whatever html tags contain .youtube. in src
I need a regex for preg_replace that deletes any HTML tag containing .youtube. (including the two dots) in its src.
It should work whether the tag is iframe, embed or object.
For an <object> tag, the src may even be in a <param> tag.
Also, the tag could be split across multiple lines, with tabs mixed in, e.g.
<iframe src="http://www.YouTuBe.coM/embed/somevideo"
frameborder="0"
height='150' width="200">
</iframe>
Thank you very much for all the help!
It would also be helpful to us if you would post examples of the before-and-after conditions. Show us what to expect inside a string that looks like what you will be working with. Given some good test data and the desired output(s) we can probably be very helpful, too!
ASKER
Hi Ray,
I would like to completely delete the whole tag.
ASKER
It is used in WordPress, so the tags should already be balanced. Since users can edit freely in code mode when composing a post, I expect they could enter the code in mixed upper and lower case, with double, single or even no quotes around the attribute values.
However, I don't know what other attributes will be there. I just treat it as a case-insensitive string ".youtube." appearing between < and >.
Hope this helps.
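The rule described above (a case-insensitive ".youtube." anywhere between < and >, whatever the quoting) could be sketched in PHP roughly as follows. The function name and the sample markup are made up for illustration, and the pattern assumes balanced, non-nested tags with no literal > inside attribute values.

```php
<?php
// Hypothetical helper, not from the thread: removes any balanced element whose
// opening tag contains ".youtube." (case-insensitive), whatever the quoting style.
function strip_youtube_elements(string $html): string
{
    // <tagname ... .youtube. ...> ... </tagname>
    //   i = case-insensitive, s = "." also matches newlines
    $pattern = '#<\s*([a-z][a-z0-9]*)\b[^>]*\.youtube\.[^>]*>.*?</\1\s*>#is';

    // preg_replace returns null on an internal error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

// The multi-line, mixed-case, mixed-quote example from the question.
$sample = "<p>Intro</p>\n"
    . "<iframe src=\"http://www.YouTuBe.coM/embed/somevideo\"\n"
    . "\tframeborder=\"0\"\n"
    . "\theight='150' width=\"200\">\n"
    . "</iframe>"
    . "\n<p>Outro</p>";

echo strip_youtube_elements($sample);
```

Because [^>]* cannot cross a >, text such as "www.youtube.com" that sits between tags, rather than inside one, is left alone.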
ASKER CERTIFIED SOLUTION
Note that it won't work when the src is in a separate tag, as you describe for the object tag. Can you provide an example of how that might look? We might need to pick those cases up separately.
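One possible way to pick up the separate-tag case is a second pass that drops a whole <object> block when ".youtube." appears anywhere inside it, including in a nested <param> or <embed>. This is only a sketch under assumptions: the function name is hypothetical, object blocks are assumed not to be nested, and any ".youtube." inside the block (even in plain text) triggers removal.

```php
<?php
// Hypothetical helper, not from the thread: removes a whole <object>...</object>
// block when ".youtube." appears anywhere inside it.
function strip_youtube_objects(string $html): string
{
    // (?:(?!</object>).)*? moves forward one character at a time but never
    // steps over a closing </object>, so a match stays inside a single block.
    $pattern = '#<object\b(?:(?!</object>).)*?\.youtube\.(?:(?!</object>).)*?</object>#is';

    // preg_replace returns null on an internal error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$oldStyle = 'before'
    . "<object width=\"320\" height=\"264\">\n"
    . "<param name=\"movie\" value=\"http://www.youtube.com/v/(id)\"></param>\n"
    . "<param name=\"wmode\" value=\"transparent\"></param>\n"
    . "</object>"
    . 'after';

echo strip_youtube_objects($oldStyle);
```

An object block that never mentions ".youtube." is left untouched, since the lookahead stops the search at its own closing tag.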
Yeah, it all comes back to the quality of the test data.
It would also be helpful to us if you would post examples of the before-and-after conditions.
Make these as close as possible to the real thing: a collection of cut-and-paste strings from the original HTML would be best. We might be able to do something with REGEX, but in my experience using REGEX to parse HTML can quickly become a fool's errand. Even well-formed HTML sometimes requires a state engine.
ASKER
Hi Terry and Ray,
I've finally figured out that WordPress's formatting and tag-balancing procedure results in double-quoted attributes only. Terry is correct to filter only double quotes (src\s*=\s*"[^"]*\.youtube\.[^"]*")!
It also limits the use of the old-style embed format, such as
<object width="320" height="264">
<param name="movie" value="http://www.youtube.com/v/(id)"></param>
<param name="wmode" value="transparent"></param>
<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>
as it produces malformed code after the formatting and balancing procedure.
So, I used code like,
<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" height="350" width="425"></embed>
<iframe src="http://www.youtube.com/embed/0(id)" frameborder="0" height="360" width="480"></iframe>
Terry's solution works perfectly!!
Finally, I've extended the regex a bit to become
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';
so that it can be used on the object tag and filter other sites, such as:
<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>
<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>
All works!!
Thank you very much for your time :)
You saved my day!
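For readers who want to try the extended pattern, here is a minimal sketch of it in use with preg_replace. The pattern is the one quoted in the post above; the wrapper function name and the surrounding <p> markup are illustrative assumptions.

```php
<?php
// The pattern is the one quoted in the post; the wrapper function and the
// sample markup are illustrative only.
function strip_video_embeds(string $html): string
{
    $pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

    // preg_replace returns null on an internal error; fall back to the input.
    return preg_replace($pre_regex, '', $html) ?? $html;
}

$post = '<p>Keep</p>'
    . '<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>'
    . '<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" height="400" width="480"></embed>'
    . '<p>End</p>';

echo strip_video_embeds($post); // prints: <p>Keep</p><p>End</p>
```

Since the pattern only accepts double-quoted attribute values, it relies on the markup having already been normalised to double quotes; a single-quoted src would pass through unchanged.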
ASKER
A final note to others:
Because WordPress limits the use of the old-style embed code, the case where src/data is in a separate tag does not need to be considered.