remove CSS and HTML tags

Posted on 2011-04-22
Last Modified: 2012-05-11
Hello. Can anyone suggest an effective way to remove CSS and HTML tags from pages as they are being scraped?
Girl in the Bath, Peninsula Hotel, Tokyo | Surly Bastard
div#fancy_inner {border-color:#BBBBBB}
div#fancy_close {right:-15px;top:-12px}
div#fancy_bg {background-color:#FFFFFF}
.wp-rotator-wrap {
padding: 0; margin: 0;
.wp-rotator-wrap .pane {
height: 300px;
width: 400px;
overflow: hidden;
position: relative;
padding: 0px;
margin: 0px;
.wp-rotator-wrap .elements {
height: 300px;
padding: 0px;
margin: 0px;
.wp-rotator-wrap .featured-cell {
width: 400px;
height: 300px;
display: block;
position: absolute;
top: 0;
left: 0;
margin: 0px;
padding: 0px;
.wp-rotator-wrap .featured-cell .image {
position: absolute;
top: 0;
left: 0;
.wp-rotator-wrap .featured-cell .info {
position: absolute;
left: 0;
bottom: 0px;
width: 400px;
height: 50px;
padding: 8px 8px;
overflow: hidden;
background: url( transparent;
color: #ddd;  
.wp-rotator-wrap .featured-cell .info h1 {
margin: 0;
padding: 0;
font-size: 15px;
color: #CCD;
.wp-rotator-wrap .current-cell { z-index: 500; }
Surly Bastard<br>
You annoy me.
Girl in the Bath, Peninsula Hotel, Tokyo<br>
I forget who she was.
Recent entries
Me by Andreas<br>
For Japan<br>
Lil Bastard<br>
Log in<br>
Recent comments
Suggest Ideas<br>
Support Forum<br>
WordPress Blog<br>
WordPress Planet<br>
My Shit<br> (89)
Shit I like<br> (10)
Things I didnt create<br> (10)
Things that dont suck<br> (8)
Select month
April 2011  (3)
March 2011  (8)
February 2011  (5)
January 2011  (29)
December 2010  (26)
November 2010  (28)
Tag Cloud
All photos © Jim O'Connell, unless otherwise noted. 


Question by:onyourmark

    Accepted Solution

    strip_tags() is your friend.

    If you want to post a sample of the inputs and your expected outputs, we might be able to give you more concrete assistance.  Best regards, ~Ray
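As a rough illustration of what strip_tags() does on its own, here is a minimal sketch. The input string is made up for this example, borrowing a rule and some text from the sample in the question. It shows that the tags are removed while the text between them is kept, which is why CSS rules inside a style element end up in the output.

```php
<?php
// Made-up input for illustration: a style block followed by visible markup.
$html = '<style>div#fancy_inner {border-color:#BBBBBB}</style>'
      . '<h1>Surly Bastard</h1><p>You annoy me.</p>';

// strip_tags() drops the tags but keeps everything between them,
// so the CSS rule survives as plain text and adjacent words get joined.
$text = strip_tags($html);

echo $text, "\n";
// div#fancy_inner {border-color:#BBBBBB}Surly BastardYou annoy me.
```

If the style content has to go as well, it needs to be removed before strip_tags() is called.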

    Assisted Solution

    by:Mohamed Abowarda
    strip_tags() is PHP's built-in HTML tag stripper. However, it can't strip some of the tags, so this enhanced version, strip_html_tags(), removes more HTML elements.

    How to strip HTML tags, scripts, and styles from a web page:

    /**
     * Remove HTML tags, including invisible text such as style and
     * script code, and embedded objects.  Add line breaks around
     * block-level tags to prevent word joining after tag removal.
     */
    function strip_html_tags( $text )
    {
        $text = preg_replace(
            array(
              // Remove invisible content
              // (nine patterns, missing from the post)
              // Add line breaks before and after blocks
              // (eight patterns, missing from the post)
            ),
            array(
                ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
                "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
                "\n\$0", "\n\$0",
            ),
            $text );
        return strip_tags( $text );
    }
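The regular expressions in the function above did not come through in the post, so here is a minimal sketch of the same idea. The function name strip_html_tags_sketch, the patterns, and the sample input are all assumptions made for illustration, not the original list. It blanks out elements whose contents are never displayed, puts a newline in front of common block-level tags, and then hands the result to strip_tags().

```php
<?php
// Hypothetical sketch: these patterns are assumptions, not the ones from the post.
function strip_html_tags_sketch(string $text): string
{
    $clean = preg_replace(
        array(
            // Remove elements whose contents are not visible page text.
            '@<head\b[^>]*>.*?</head>@si',
            '@<style\b[^>]*>.*?</style>@si',
            '@<script\b[^>]*>.*?</script>@si',
            '@<object\b[^>]*>.*?</object>@si',
            '@<noscript\b[^>]*>.*?</noscript>@si',
            // Put a newline in front of common block-level tags so that
            // words are not joined once the tags are gone.
            '@</?(?:div|p|h[1-6]|li|ul|ol|table|tr|td|th|blockquote|pre|br)\b@i',
        ),
        array(' ', ' ', ' ', ' ', ' ', "\n\$0"),
        $text
    );

    // preg_replace() returns null on a regex error; fall back to the raw input.
    if ($clean === null) {
        $clean = $text;
    }

    return trim(strip_tags($clean));
}

// Made-up input resembling the page in the question.
$html = '<html><head><title>T</title>'
      . '<style>div#fancy_inner {border-color:#BBBBBB}</style></head>'
      . '<body><h1>Surly Bastard</h1><p>You annoy me.</p></body></html>';

$clean = strip_html_tags_sketch($html);
echo $clean, "\n";
```

With this input the CSS rule and the page title are dropped along with the head element, and the heading and paragraph text end up on separate lines. Regular expressions like these assume the closing tags are present; a parser-based approach may be more robust on messy markup.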



    Assisted Solution

    by:Ray Paseur
    "can't strip some of the tags"

    Which ones?  Have you got an example showing how it fails?  While it's a fairly brittle function, I think it works well if you have valid HTML.
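To make the "brittle" point concrete, here is a small sketch with made-up inputs. It compares text whose angle brackets are escaped as entities (valid HTML) with text that contains raw angle brackets. strip_tags() does not validate the markup, so anything that looks like a tag is removed.

```php
<?php
// Made-up inputs for illustration.
$valid   = '<p>2 &lt; 3 and 4 &gt; 1</p>';  // angle brackets escaped as entities
$invalid = '<p>2 <3 and 4> 1</p>';          // raw angle brackets inside the text

// Entities are left untouched, so the comparison text survives.
$fromValid = strip_tags($valid);

// "<3 and 4>" looks like a tag, so it is removed along with the real tags.
$fromInvalid = strip_tags($invalid);

echo html_entity_decode($fromValid), "\n";
echo $fromInvalid, "\n";
```

The first line of output keeps the whole comparison; the second loses the part between the raw brackets.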
