Solved

regex for preg_replace to delete whatever html tags contain .youtube. in src

Posted on 2011-09-29
Last Modified: 2012-05-12
I need a regex for preg_replace that deletes any HTML tag whose src contains .youtube. (including the two dots).
It should work whether the tag is iframe, embed or object.
For an <object> tag, the URL may even be in a nested <param> tag.
Also, the tag may be split across multiple lines, with tabs mixed in, e.g.
<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>

Thank you very much for all the help!
Question by:candychan611
9 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36814324
What is the output you would expect after processing this string?

<iframe src="http://www.YouTuBe.coM/embed/somevideo"
                        frameborder="0"
   height='150' width="200">
</iframe>
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36814330
It would also be helpful to us if you would post examples of the before-and-after conditions.  Show us what to expect inside a string that looks like what you will be working with.  Given some good test data and the desired output(s) we can probably be very helpful, too!
0
 

Author Comment

by:candychan611
ID: 36814594
Hi Ray,

I would like to completely delete the whole tag.
0

 

Author Comment

by:candychan611
ID: 36814637
It is used in WordPress, so the tags should already be balanced. Since users can edit a post freely in code mode while composing it, I expect they could enter the code in mixed upper and lower case, and with double, single or even no quotes around the attribute values.

However, I don't know what other attributes may be present. I just treat it as a case-insensitive string ".youtube." appearing between < and >.

Hope this helps.
0
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 36817974
This seems to work for me:
$sourcestring = preg_replace('#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is','',$sourcestring);
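
For anyone who wants to try it, here is a minimal sketch that runs the pattern above against the multi-line iframe from the question. The surrounding <p> tags and the variable names $pattern and $cleaned are made up for illustration; only the regex itself comes from the answer.

```php
<?php
// Pattern copied from the answer above:
//   i = case-insensitive (matches "YouTuBe"), s = let "." match newlines.
$pattern = '#<\s*([^>\s]+)[^>]*\ssrc\s*=\s*"[^"]*\.youtube\.[^"]*"[^>]*>.*?</\1>#is';

// Hypothetical post content wrapping the iframe from the question
// (attributes spread over several lines, tabs mixed in).
$sourcestring = "<p>Before</p>\n"
    . "<iframe src=\"http://www.YouTuBe.coM/embed/somevideo\"\n"
    . "\t\t\tframeborder=\"0\"\n"
    . "   height='150' width=\"200\">\n"
    . "</iframe>\n"
    . "<p>After</p>";

// Remove the whole element, from the opening tag to the matching closing tag.
$cleaned = preg_replace($pattern, '', $sourcestring);

echo $cleaned;
```

The iframe is removed as a whole and only the two paragraphs remain, together with the newlines that surrounded the iframe. Note that the src value must be double-quoted for this pattern to match.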


0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36817981
Note that it won't work when the src is in a separate tag, as you describe for the object tag. Can you provide an example of how that might look? We might need to pick those cases up separately.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36818437
Yeah, it all comes back to the quality of the test data.

It would also be helpful to us if you would post examples of the before-and-after conditions.

Make these as close as possible to the real thing - a collection of cut-and-paste strings from the original HTML would be best.  We might be able to do something with REGEX, but in my experience using REGEX to parse HTML can quickly become a fool's errand.  Even well-formed HTML sometimes requires a state engine.
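
As an illustration of the non-regex route, here is a rough sketch using PHP's DOM extension instead of preg_replace. This is not what was used in this thread; the function name strip_video_embeds and its $needles parameter are hypothetical, and it assumes the DOM extension is available. It also covers the case where the URL sits in a <param> inside an <object>.

```php
<?php
// Hypothetical helper: remove <iframe>, <embed> and <object> elements whose
// src/data attribute (or a nested <param> value) contains one of $needles.
function strip_video_embeds(string $html, array $needles = ['.youtube.']): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    // Wrap the fragment in a div so we can find it again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $doomed = [];
    foreach ($xpath->query('//iframe | //embed | //object') as $el) {
        // Candidate URLs: src/data on the element, value on nested <param>.
        $urls = [$el->getAttribute('src'), $el->getAttribute('data')];
        foreach ($el->getElementsByTagName('param') as $param) {
            $urls[] = $param->getAttribute('value');
        }
        foreach ($urls as $url) {
            foreach ($needles as $needle) {
                if (stripos($url, $needle) !== false) { // case-insensitive
                    $doomed[] = $el;
                    break 2;
                }
            }
        }
    }
    // Remove after collecting, so the tree is not modified while scanning it.
    foreach ($doomed as $el) {
        if ($el->parentNode !== null) {
            $el->parentNode->removeChild($el);
        }
    }

    // Serialize the children of the wrapper div back to a string.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$in = '<p>Keep me</p>'
    . '<object width="320" height="264">'
    . '<param name="movie" value="http://www.youtube.com/v/abc"></param>'
    . '<embed src="http://www.youtube.com/v/abc" type="application/x-shockwave-flash"></embed>'
    . '</object>'
    . '<iframe src="http://example.com/ok"></iframe>';
$out = strip_video_embeds($in);
echo $out;
```

Be aware that saveHTML() may normalize the markup it writes back, and that loadHTML() needs extra care with non-ASCII content, so treat this as a starting point only.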
0
 

Author Comment

by:candychan611
ID: 36896374
Hi Terry and Ray,

I've finally figured out that WordPress's formatting and tag-balancing step produces double-quoted attributes only. Terry is correct to match only double quotes (src\s*=\s*"[^"]*\.youtube\.[^"]*")!!

It also limits the use of the old-style embed format, such as:
<object width="320" height="264">
  <param name="movie" value="http://www.youtube.com/v/(id)"></param>
  <param name="wmode" value="transparent"></param>
  <embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed>
</object>



as it will produce malformed code after the formatting and balancing step.
So, I used code like:

<embed src="http://www.youtube.com/v/(id)" type="application/x-shockwave-flash" height="350" width="425"></embed>

<iframe src="http://www.youtube.com/embed/0(id)" frameborder="0" height="360" width="480"></iframe>



Terry's solution works perfectly!!

Finally, I've extended the regex a bit to become
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';



so that it also works on the object tag and filters other sites, such as:

<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>

<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="opaque" height="400" width="480"></embed>



All works!!
Thank you very much for your time :)
You saved my day!
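
A small sketch of the extended pattern in use, in case it helps others. The regex is the one above; the $samples array, the $result array and the example.com iframe are made up for illustration, and the attribute lists are shortened.

```php
<?php
// Extended pattern copied from the comment above: it accepts the URL in a
// data, movie, src or value attribute and blocks both youtube and tudou.
$pre_regex = '#<\s*([^>\s]+)[^>]*\s(data|movie|src|value)\s*=\s*"[^"]*\.(youtube|tudou)\.[^"]*"[^>]*>.*?</\1>#is';

// Hypothetical test inputs: two that should be removed, one that should stay.
$samples = [
    'object' => '<object type="application/x-shockwave-flash" style="width: 480px; height: 360px" data="http://www.youtube.com/v/(id)"></object>',
    'embed'  => '<embed src="http://www.TUDOU.com/v/(id)/v.swf" type="application/x-shockwave-flash" height="400" width="480"></embed>',
    'other'  => '<iframe src="http://example.com/player" height="360" width="480"></iframe>',
];

$result = [];
foreach ($samples as $label => $html) {
    // Matching elements are replaced with an empty string.
    $result[$label] = preg_replace($pre_regex, '', $html);
}

var_dump($result);
```

The object and embed samples come back as empty strings, while the example.com iframe is returned unchanged. Since value is in the attribute list, the pattern would presumably also match other elements that carry a matching value="..." attribute, so it is worth testing against real post content.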
0
 

Author Comment

by:candychan611
ID: 36896382
A final note to others,

As WordPress limits the use of old-style embed code, the case where src/data is in a separate tag does not need to be considered.
0
