  • Status: Solved

Strip HTML tags - reliably

Hi,

I am making a little freeware app that needs to work with the plain text of an HTML page. I've found some code that does simple tag stripping (the same approach I had in mind: just remove everything between '<' and '>'), but I fear this may not be very reliable.

So my question is: should I just take out everything between tag openers and closers, or is there a more intelligent solution for stripping all code from an HTML page and leaving only the readable text?

Any ideas are welcome. Thank you.
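
For reference, the simple approach described above might look roughly like the sketch below. NaiveStripTags is a hypothetical helper written for illustration, not a function from the RTL or any library, and the sample strings are made up. It is meant to show where plain '<' ... '>' removal falls short, not to be a recommended solution.

```pascal
program NaiveStripDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (an assumption for illustration): drops every
// character from '<' up to and including the next '>' and keeps the rest.
function NaiveStripTags(const Html: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(Html) do
  begin
    if Html[i] = '<' then
      InTag := True
    else if Html[i] = '>' then
      InTag := False
    else if not InTag then
      Result := Result + Html[i];
  end;
end;

begin
  // Simple markup: only the visible text remains.
  Writeln(NaiveStripTags('<p>Hello <b>world</b></p>'));
  // The script body is not a tag, so it leaks into the output,
  // and the entity &amp; is left undecoded.
  Writeln(NaiveStripTags('<script>var x = 1;</script>Fish &amp; chips'));
end.
```

Comments, style blocks, and a literal '<' or '>' inside text or scripts cause similar trouble, which is why a real HTML parser tends to be the more reliable route.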
Asked: Esopo

3 Solutions
 
Mohammed Nasman (Software Developer) commented:
 
esoftbg commented:
JEDI JVCL 3.0 contains JvHTMLParser, a component for parsing HTML.
 
Eddie Shipman (All-around developer) commented:
Use the MSHTML DOM to parse the page and read the body's innerText, like this:

// Variants provides VarArrayCreate in Delphi 6 and later.
uses  ..., MSHTML, ActiveX, ComObj, Variants;

procedure TForm1.Button1Click(Sender: TObject);
var
  IDoc:      IHTMLDocument2;
  Strl:      TStringList;
  sHTMLFile: String;
  v:         Variant;
begin
  if OpenDialog1.Execute then
  begin
    sHTMLFile := OpenDialog1.FileName;
    Strl := TStringList.Create;
    try
      // Read the raw HTML source from disk.
      Strl.LoadFromFile(sHTMLFile);
      // Create an in-memory MSHTML document; no visible browser control is needed.
      IDoc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
      try
        IDoc.designMode := 'on';
        while IDoc.readyState <> 'complete' do
          Application.ProcessMessages;
        // IHTMLDocument2.write expects a SAFEARRAY of variants.
        v := VarArrayCreate([0, 0], varVariant);
        v[0] := Strl.Text;
        IDoc.write(PSafeArray(System.TVarData(v).VArray));
        IDoc.designMode := 'off';
        while IDoc.readyState <> 'complete' do
          Application.ProcessMessages;
        // innerText is the text of the body with all markup removed.
        Memo1.Lines.Text := IDoc.body.innerText;
      finally
        IDoc := nil;
      end;
    finally
      Strl.Free;
    end;
  end;
end;
 
Esopo (Author) commented:
