Strip Out HTML Tags?

Posted on 2007-08-02
Last Modified: 2010-05-18
I'm looking for code that strips all HTML tags out of a text. Can anyone show me an example, please? Thanks.
Question by:phpdotnet
    LVL 43

    Assisted Solution

    Can you use a library?
    With pcrecpp, the regular expression is then:

    string s = "<B>yabba dabba<BR>doo</B>";
    pcrecpp::RE("<.+?>").GLobalReplace("", &s);

    Author Comment

    Uh oh, thanks, but can you provide another solution, please? :( I just want some simple code. Can I use a regex to get all attributes and remove them?
    LVL 43

    Assisted Solution

    If the library can be used, then the code is really simple (like one line), isn't it?
    LVL 43

    Expert Comment

    Oh yes, and that should be a lowercase "l" in GlobalReplace:
    pcrecpp::RE("<.+?>").GlobalReplace("", &s);
    Just tested: one line of code, and it works for the simple example.
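If pulling in PCRE is not an option, the same one-liner idea can be sketched with the standard library's <regex> header (C++11 and later). This is only an illustration under stated assumptions: the function name stripTagsRegex is made up here, and the pattern is "<[^>]*>" rather than the "<.+?>" used above, so that a tag containing a line break is still matched.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): removes everything that looks
// like <...> from a string held entirely in memory.
// [^>]* stops at the first '>', so neighbouring tags are removed one by one,
// and unlike '.', it also matches a newline inside a tag.
std::string stripTagsRegex(const std::string& html) {
  static const std::regex tag("<[^>]*>");
  return std::regex_replace(html, tag, "");
}
```

Called on the sample string from the earlier post, stripTagsRegex("<B>yabba dabba<BR>doo</B>") returns "yabba dabbadoo". Like the pcrecpp version, this is a pattern match and not an HTML parser, so a '>' inside an attribute value or a script block will confuse it.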
    LVL 86

    Accepted Solution

    //: C03:HTMLStripper.cpp {RunByHand}
    //{L} ReplaceAll
    // Filter to remove html tags and markers.
    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include "ReplaceAll.h"
    #include "../require.h"
    using namespace std;

    string& stripHTMLTags(string& s) {
      // Remembers across calls whether the previous line ended inside a tag.
      static bool inTag = false;
      bool done = false;
      while(!done) {
        if(inTag) {
          // The previous line started an HTML tag
          // but didn't finish. Must search for '>'.
          size_t rightPos = s.find('>');
          if(rightPos != string::npos) {
            inTag = false;
            s.erase(0, rightPos + 1);
          }
          else {
            // The whole line is still inside the tag.
            done = true;
            s.erase();
          }
        }
        else {
          // Look for start of tag:
          size_t leftPos = s.find('<');
          if(leftPos != string::npos) {
            // See if tag close is in this line
            // (search from leftPos so an earlier '>' is ignored):
            size_t rightPos = s.find('>', leftPos);
            if(rightPos == string::npos) {
              // The tag continues on the next line.
              inTag = done = true;
              s.erase(leftPos);
            }
            else
              s.erase(leftPos, rightPos - leftPos + 1);
          }
          else
            done = true;
        }
      }
      // Decode common HTML character entities
      // (&amp; last, so "&amp;nbsp;" yields "&nbsp;", not " ").
      replaceAll(s, "&lt;", "<");
      replaceAll(s, "&gt;", ">");
      replaceAll(s, "&nbsp;", " ");
      replaceAll(s, "&amp;", "&");
      // Etc...
      return s;
    }

    int main(int argc, char* argv[]) {
      requireArgs(argc, 1,
        "usage: HTMLStripper InputFile");
      ifstream in(argv[1]);
      assure(in, argv[1]);
      string s;
      while(getline(in, s))
        if(!stripHTMLTags(s).empty())
          cout << s << endl;
    } ///:~

    This example will even strip HTML tags that span multiple lines. It originates from Bruce Eckel's "Thinking in C++", code taken from
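The listing depends on two helper headers from the book, ReplaceAll.h and require.h, neither of which is reproduced in this thread. To try the code without them, a stand-in for replaceAll might look like the sketch below. The signature is an assumption inferred from the calls in the listing (a string modified in place, plus the text to find and its replacement); it is not the book's actual source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the replaceAll helper the listing expects.
// Replaces every occurrence of `from` in `s` with `to`, in place,
// and returns a reference to `s`.
std::string& replaceAll(std::string& s, const std::string& from,
                        const std::string& to) {
  if (from.empty()) return s;  // nothing to search for; avoids an endless loop
  std::size_t pos = 0;
  while ((pos = s.find(from, pos)) != std::string::npos) {
    s.replace(pos, from.size(), to);
    pos += to.size();  // resume after the inserted text, never inside it
  }
  return s;
}
```

Resuming the search after the inserted text matters when the replacement contains the search string, for example replacing "a" with "aa". As far as the listing shows, requireArgs and assure only validate the argument count and that the input file opened, so plain if checks on argc and on the ifstream should serve the same purpose.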

    Author Comment

    Really, it's nice. Thank you very much for your answer. I needed it urgently :)
