  • Status: Solved
  • Priority: Medium
  • Security: Public

Strip Out HTML Tags?

I'm looking for code that strips all HTML tags out of a text. Can anyone show me an example, please? Thanks.
Asked by: phpdotnet
3 Solutions
 
ravenpl Commented:
Can you use the PCRE library (http://www.pcre.org/)?
The regular expression and its replacement are then
"/<.+?>/", ""

string s = "<B>yabba dabba<BR>doo</B>";
pcrecpp::RE("<.+?>").GLobalReplace("", &s);
 
phpdotnet (Author) Commented:
Uh oh, thanks, but can you provide another solution, please? :( I just want some simple code. May I use a regex to get all the attributes and remove them?
 
ravenpl Commented:
If the library can be used, then the code is really simple (about one line), isn't it?
 
ravenpl Commented:
Oh yes, and the L in GlobalReplace should be lowercase:
pcrecpp::RE("<.+?>").GlobalReplace("", &s);
I just tested it: one line of code, and it works for the simple example.
 
jkr Commented:
//: C03:HTMLStripper.cpp {RunByHand}
//{L} ReplaceAll
// Filter to remove html tags and markers.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include "ReplaceAll.h"
#include "../require.h"
using namespace std;
 
string& stripHTMLTags(string& s) {
  static bool inTag = false;
  bool done = false;
  while(!done) {
    if(inTag) {
      // The previous line started an HTML tag
      // but didn't finish. Must search for '>'.
      size_t rightPos = s.find('>');
      if(rightPos != string::npos) {
        inTag = false;
        s.erase(0, rightPos + 1);
      }
      else {
        done = true;
        s.erase();
      }
    }
    else {
      // Look for start of tag:
      size_t leftPos = s.find('<');
      if(leftPos != string::npos) {
        // See if tag close is in this line. Search from the
        // '<' so that an earlier stray '>' is ignored:
        size_t rightPos = s.find('>', leftPos);
        if(rightPos == string::npos) {
          inTag = done = true;
          s.erase(leftPos);
        }
        else
          s.erase(leftPos, rightPos - leftPos + 1);
      }
      else
        done = true;
    }
  }
  // Replace common HTML character entities with plain characters
  replaceAll(s, "&lt;", "<");
  replaceAll(s, "&gt;", ">");
  replaceAll(s, "&amp;", "&");
  replaceAll(s, "&nbsp;", " ");
  // Etc...
  return s;
}
 
int main(int argc, char* argv[]) {
  requireArgs(argc, 1,
    "usage: HTMLStripper InputFile");
  ifstream in(argv[1]);
  assure(in, argv[1]);
  string s;
  while(getline(in, s))
    if(!stripHTMLTags(s).empty())
      cout << s << endl;
} ///:~
 

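This example will even strip HTML tags that span multiple lines. It originates from Bruce Eckel's "Thinking in C++"; the code is taken from http://web.mit.edu/merolish/ticpp/TicV2.html. Note that ReplaceAll.h and require.h are helper headers from the book's source code, so you need them as well to compile it.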
 
phpdotnet (Author) Commented:
Really, it's nice. Thank you very much for your answer. I'm in a hurry :)