Strip HTML tags reliably

Hi,

I am making a little freeware app that needs to work with the plain text of an HTML page. I've found some code that does simple tag stripping (the same approach I had in mind: just remove everything between '<' and '>'), but I fear this may not be very reliable.

So my question is: should I just take out everything between tag openers and closers, or is there a more intelligent way to strip all the markup from an HTML page and leave only the readable text?

Any ideas are welcome. Thank you.
Asked by Esopo.
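
The character-scanning approach from the question can be made noticeably sturdier without a full parser. Below is a minimal sketch in Delphi/Free Pascal, assuming the whole page is already in a string; `StripTags` and `StartsTag` are hypothetical helper names, not part of any library. It drops comments and the contents of `<script>` and `<style>`, ignores a `>` inside quoted attribute values, and treats a `<` that cannot start a tag (as in `a < b`) as text. It does not decode entities such as `&amp;`, and it can still go wrong on malformed markup that a browser would recover from, which is why the DOM-based answer in this thread is more reliable.

```pascal
program StripTagsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

{ True if Lower (already lower-cased) has '<Name' at index I, followed by
  whitespace, '>' or '/', so '<scripts' is not mistaken for '<script'. }
function StartsTag(const Lower: string; I: Integer; const Name: string): Boolean;
var
  K: Integer;
begin
  Result := Copy(Lower, I, Length(Name) + 1) = '<' + Name;
  if Result then
  begin
    K := I + Length(Name) + 1;
    Result := (K > Length(Lower)) or (Pos(Lower[K], ' >/'#9#10#13) > 0);
  end;
end;

{ Hypothetical helper: a sketch of tag stripping, not a full HTML parser. }
function StripTags(const Html: string): string;
var
  Lower, CloseTag: string;
  I, J, N: Integer;
  Quote: Char;
begin
  Result := '';
  Lower := LowerCase(Html);
  N := Length(Html);
  I := 1;
  while I <= N do
  begin
    // Plain text, or a '<' that cannot start a tag (e.g. "a < b"): keep it
    if (Html[I] <> '<') or (I = N) or
       (Pos(Lower[I + 1], 'abcdefghijklmnopqrstuvwxyz/!?') = 0) then
    begin
      Result := Result + Html[I];
      Inc(I);
      Continue;
    end;
    // Comment: drop everything up to the closing '-->'
    if Copy(Html, I, 4) = '<!--' then
    begin
      J := PosEx('-->', Html, I + 4);
      if J = 0 then Break;              // unterminated comment: drop the rest
      I := J + 3;
      Continue;
    end;
    // Tag: find the '>' that is not inside a quoted attribute value
    Quote := #0;
    J := I + 1;
    while J <= N do
    begin
      if Quote <> #0 then
      begin
        if Html[J] = Quote then Quote := #0;
      end
      else if (Html[J] = '"') or (Html[J] = '''') then
        Quote := Html[J]
      else if Html[J] = '>' then
        Break;
      Inc(J);
    end;
    // <script> and <style>: their content is code, not readable text
    CloseTag := '';
    if StartsTag(Lower, I, 'script') then CloseTag := '</script'
    else if StartsTag(Lower, I, 'style') then CloseTag := '</style';
    if CloseTag <> '' then
    begin
      J := PosEx(CloseTag, Lower, J + 1);
      if J > 0 then J := PosEx('>', Lower, J);
      if J = 0 then Break;              // unterminated element: drop the rest
    end;
    I := J + 1;
  end;
end;

begin
  WriteLn(StripTags('<p title="a>b">Hello <b>world</b></p>'));  // prints: Hello world
  WriteLn(StripTags('<!-- note --><script>if (x < 1) f("</p>");</script>a < b'));  // prints: a < b
end.
```

Building the result one character at a time keeps the sketch short; for large pages a TStringBuilder or a preallocated buffer would likely be faster.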
 
esoftbg commented:
JEDI JVCL 3.0 contains an HTML parser component, JvHTMLParser.
 
Mohammed Nasman (Software Developer) commented:
 
Eddie Shipman (All-around developer) commented:
Use the browser's own parser instead: load the HTML into an MSHTML document (IHTMLDocument2) and read the body's innerText, like this:

uses ..., MSHTML, ActiveX, ComObj, Variants; { Variants provides VarArrayCreate in Delphi 6 and later }

procedure TForm1.Button1Click(Sender: TObject);
var
  IDoc:      IHTMLDocument2;
  Strl:      TStringList;
  sHTMLFile: String;
  v:         Variant;
begin
  if OpenDialog1.Execute then
  begin
    sHTMLFile := OpenDialog1.FileName;
    Strl := TStringList.Create;
    try
      Strl.LoadFromFile(sHTMLFile);
      // Create an MSHTML document object; no browser window is needed
      IDoc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
      try
        // Design mode is commonly used here to keep the page's scripts from running
        IDoc.designMode := 'on';
        while IDoc.readyState <> 'complete' do
          Application.ProcessMessages;
        // IHTMLDocument2.write expects a SAFEARRAY of variants
        v := VarArrayCreate([0, 0], varVariant);
        v[0] := Strl.Text;
        IDoc.write(PSafeArray(System.TVarData(v).VArray));
        IDoc.close;  // finish writing so the document can finish loading
        IDoc.designMode := 'off';
        while IDoc.readyState <> 'complete' do
          Application.ProcessMessages;
        // innerText is the rendered text: tags removed, entities decoded
        Memo1.Lines.Text := IDoc.body.innerText;
      finally
        IDoc := nil;
      end;
    finally
      Strl.Free;
    end;
  end;
end;
 
Esopo (author) commented: