Need regex: find img tags in a string

Posted on 2007-08-11
Last Modified: 2010-08-05
I have a generated HTML string where I want to strip out all HTML tags except some special ones, so I am looking for a regular expression that can help me.

So my current idea is:
1) use a string replace to turn the tags I want to keep (for example <img .../>) into [img .../]
2) use a function like stripHTML to get rid of all HTML tags
3) convert the [img.../] back into <img.../>

So I need a pattern that finds <img ...> and <img ... /> (with and without a "/" at the end),
and a matching pattern for [img ...]. As far as I know, the second one needs some escaping, which is why I am not sure it can be done easily.
Question by:Smoerble
    LVL 62

    Expert Comment

    by:Fernando Soto
    Can you post a short file, or an excerpt of one, showing exactly what you want replaced with what?

    Regular expression engines do not all work the same way across programming languages. Which language do you need this for: PHP, C#, or something else?
    LVL 84

    Expert Comment

    if you don't have things like
    <IMG SRC = "foo.gif" ALT = "A > B">
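    A tag like the one above is where simple patterns break: the ">" inside the quoted ALT value looks like the end of the tag. The Python sketch below compares a naive pattern with one that treats quoted attribute values as whole units. The variable names are made up for illustration, and the quote-aware pattern is only an approximation, not a full HTML parser.

```python
import re

html = '<IMG SRC = "foo.gif" ALT = "A > B">'

# Naive pattern: stops at the first '>' it sees, even inside a quoted value.
naive = re.compile(r'<img\b[^>]*>', re.IGNORECASE)

# Quote-aware pattern: double- or single-quoted attribute values are consumed
# as whole units, so a '>' between quotes does not end the tag.
quote_aware = re.compile(r'''<img\b(?:"[^"]*"|'[^']*'|[^'">])*>''', re.IGNORECASE)

naive_match = naive.search(html).group(0)
quote_aware_match = quote_aware.search(html).group(0)

print(naive_match)        # the tag is cut off after 'A >'
print(quote_aware_match)  # the whole tag
```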

    LVL 62

    Expert Comment

    by:Fernando Soto
    Not sure what you want, sorry.
    LVL 4

    Accepted Solution

    In Perl, use a slightly modified version of ozo's code to handle the img tag translation.  I have added the second line to remove all HTML tags (or anything that looks like one).  My Perl is a bit rusty though, so I've added a PHP version below.

    // PERL

    $html =~ s/<(img\b.*?)>/[$1]/xsgi;
    $html =~ s/<(.*?)>//xsgi;
    $html =~ s/\[(img\b.*?)\]/<$1>/xsgi;

    // PHP

    // i = case-insensitive, s = dot also matches newlines (same flags as the Perl version)
    $html = preg_replace('/<(img\b.*?)>/is', '[$1]', $html);
    $html = preg_replace('/<.*?>/s', '', $html);
    $html = preg_replace('/\[(img\b.*?)\]/is', '<$1>', $html);
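
    For readers working in another language, the same three steps can be sketched in Python with the standard re module. The function name strip_tags_except_img and the sample string are made up for illustration; this is a rough sketch of the placeholder approach, not a robust HTML sanitizer.

```python
import re

def strip_tags_except_img(html: str) -> str:
    """Remove every tag except <img ...>, using [img ...] as a placeholder."""
    flags = re.IGNORECASE | re.DOTALL
    # Step 1: turn <img ...> into the placeholder [img ...].
    html = re.sub(r'<(img\b.*?)>', r'[\1]', html, flags=flags)
    # Step 2: drop every remaining tag (or anything that looks like one).
    html = re.sub(r'<.*?>', '', html, flags=re.DOTALL)
    # Step 3: turn the placeholders back into real tags.
    html = re.sub(r'\[(img\b.*?)\]', r'<\1>', html, flags=flags)
    return html

sample = '<p>Hello <b>world</b> <IMG src="a.png" /> and <img src="b.png"></p>'
result = strip_tags_except_img(sample)
print(result)
```

    One caveat with this approach in any language: text that already contains a literal [img ...] before step 1 will also be turned into a tag in step 3.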
    LVL 14

    Assisted Solution

    <[iI][mM][gG][^>]*> will match <img...> and <img.../>
    If you use that to find the start and end positions of every img tag in the string, you can then replace the first character with [ and the last character with ].
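
    A minimal Python sketch of that position-based idea, assuming the pattern above is used unchanged. The names IMG_TAG and bracket_img_tags are hypothetical.

```python
import re

# The pattern from the comment above, unchanged.
IMG_TAG = re.compile(r'<[iI][mM][gG][^>]*>')

def bracket_img_tags(html: str) -> str:
    """Swap the outer '<' and '>' of each img tag for '[' and ']'."""
    chars = list(html)
    for match in IMG_TAG.finditer(html):
        chars[match.start()] = '['    # first character of the tag, '<'
        chars[match.end() - 1] = ']'  # last character of the tag, '>'
    return ''.join(chars)

# Start and end positions of each img tag (end is exclusive).
spans = [(m.start(), m.end()) for m in IMG_TAG.finditer('a <img src="x.gif"/> b')]
converted = bracket_img_tags('a <img src="x.gif"/> b <b>c</b>')
print(spans)
print(converted)
```

    Because only single characters are swapped, the string length never changes, so the positions found up front stay valid while the replacements are made.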

    Author Comment

    @jonathanmelnick and SBennett:
    Thanks, perfect solutions!
