Regex to remove <img>

Posted on 2006-04-25
Last Modified: 2012-08-17
Hi everybody. This question might be a little more involved than it seems. I need a regular expression to remove HTML image tags. Sure. Easy enough. Why not something like this, right?
$html = preg_replace('/<img[^>]*>/i', '', $html);

Except I was reading a website that pointed out that the alt attribute might contain a greater-than sign (>). In that case, the command above would change something like this:

<img src="next.jpg" alt=">">

Into this:

">

But clearly, I'd like it to delete all of that in one fell swoop. So I'm looking for a similar regular expression replacement that accommodates greater-than signs in the alt attribute (but remember, there's always a possibility that the alt attribute isn't even there to begin with). This has probably already been addressed somewhere, but I had a few extra points to give away. Thanks.
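To make the failure concrete, here is a minimal sketch that runs the pattern from the question against the example tag. The variable names and the surrounding "before"/"after" text are only illustrative.

```php
<?php
// The naive pattern from the question: "<img", any run of characters
// that are not ">", then ">".
$naive = '/<img[^>]*>/i';

// Illustrative input; the tag is the example from the question.
$html = 'before <img src="next.jpg" alt=">"> after';

// [^>]* stops at the first ">", which here sits inside the alt value,
// so only the front part of the tag is removed.
$result = preg_replace($naive, '', $html);

echo $result, PHP_EOL; // prints: before "> after
```

The match ends at the ">" inside the quotes, so the closing quote and the real closing bracket are left behind.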
Question by:soapergem
    LVL 6

    Author Comment

    And one more thing: please also remember that even if the alt attribute is there, it isn't necessarily enclosed in double quotes. It could be in single quotes, unquoted, or in some invalid combination of single and double quotes. Thanks again.
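    One common way to meet these requirements is to let the pattern consume whole quoted strings, so that a ">" inside quotes never ends the match. The sketch below is an assumption about how that could look, not one of the expressions posted in this thread; `strip_img_tags` is a hypothetical helper name.

```php
<?php
// Hypothetical helper, not from the thread: removes <img ...> tags while
// treating double- and single-quoted attribute values as single units.
function strip_img_tags(string $html): string
{
    // After "<img" and a word boundary, repeat any of:
    //   "..."      a double-quoted value (may contain ">")
    //   '...'      a single-quoted value (may contain ">")
    //   [^>"']     any other character except ">" and quotes
    // and then require the closing ">".
    $pattern = '/<img\b(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';

    // preg_replace returns null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo strip_img_tags('<p>a<img src="next.jpg" alt=">">b</p>'), PHP_EOL;
echo strip_img_tags("<img src='a.png' alt='a > b'>text"), PHP_EOL;
echo strip_img_tags('<IMG SRC=next.jpg>text'), PHP_EOL;
```

    This does not require a src or alt attribute, and it accepts unquoted values. It does not cope reliably with mismatched quotes: an unclosed quote can make the match fail, or run on to a later quote in the document, so badly broken markup probably needs an HTML parser rather than a regex.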
    LVL 9

    Accepted Solution

    Try this:

    P.S. Not yet tested. Try enhancing the regexp with the ignore-case flag.

    LVL 6

    Author Comment

    I tested that on the same example I used above in the question, and it still exhibited the same behavior (leaving the extra ">). And yes, I remembered to add the extra / on the end that you omitted. ;) (Plus the letter i for case-insensitive matching.) So unfortunately, my tests show that the very complex and well-thought-out expression you came up with doesn't quite do it. But that's a much more complex expression than I could come up with, so maybe you're on the right track. (I wouldn't really know, or rather, I don't really want to take the time to comprehend all of that expression.)

    But one more comment: theoretically speaking, the "src" attribute wouldn't have to be there. Obviously you don't get an image without it, so omitting it would do nobody any good, but it seems like your expression requires that it be there.
    LVL 6

    Author Comment

    Never mind... I figured it out myself. This works:


    Thanks for your honest attempt, though, blue_hunter. I'm in a good mood, so I'll give you the points anyway.
    LVL 6

    Author Comment

    And now I *really* found a good answer, probably a better one, thanks to this:
    LVL 6

    Author Comment

    So this is probably the most reliable, according to that article:
    LVL 9

    Expert Comment

    I came back to look at this question again. Thanks for the points.
    That is a pretty cool regular expression (in your latest post).

    I had left out quite a lot of the <img> attributes; thanks for reminding me.



    Expert Comment

    Thank you, that actually worked.
    I got what I wanted; however, it's really slow.

    This is what I did:
                wb1 = new WebBrowser();
                wb1.ScrollBarsEnabled = false;
                wb1.ScriptErrorsSuppressed = true;
                wb1.DocumentText = source;
                wb1.AllowNavigation = false;
                while(wb1.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }
                var collection = wb1.Document.GetElementsByTagName("a");

    The source string I assign to DocumentText is downloaded with WebClient. Before assigning it, I use a regex to remove all image tags and <link rel="stylesheet"> tags to speed things up.

    I also tried running this on multiple threads, but the WebBrowser control didn't like that. I now start the application multiple times instead.

    It's not a perfect solution, and the biggest problem is the speed.

    If someone has a better approach, I would very much like to take a look at it.
