Download webpage and strip out everything except text

Hi,

I'm looking for some code that will download the HTML of a webpage (no images) and then strip out everything except the text. By text I mean the sentences.

Thanks
Asked by zattz

TName commented:
Hi,
here is a very simple example using TWebBrowser. It writes the page text to C:\Test.txt:


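uses {...} Classes, SysUtils, SHDocVw, mshtml;

{Main form declaration section}
private
  procedure DocComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);


{...}

procedure TForm1.Button1Click(Sender: TObject);
var
  wb: TWebBrowser;
begin
  wb := TWebBrowser.Create(nil);
  try
    with wb do begin
      OnDocumentComplete := DocComplete;
      ParentWindow := Self.Handle;  // the control needs a parent window to load pages
      Navigate('http://www.google.com');
    end;
    // Pump messages until the browser has finished loading (DocComplete runs meanwhile)
    while wb.Busy do
      Application.ProcessMessages;
  finally
    wb.Free;
  end;
end;

procedure TForm1.DocComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: IHTMLDocument2;
  sl: TStringList;
begin
  // Document is only an IDispatch, so query for IHTMLDocument2 instead of hard-casting
  if not Supports(TWebBrowser(Sender).Document, IHTMLDocument2, Doc) then
    Exit;
  if Doc.body = nil then
    Exit;
  sl := TStringList.Create;
  try
    // innerText is the rendered text of <body>, without the HTML tags
    sl.Text := Doc.body.innerText;
    // Let TStringList write the file: writing Length(aText) bytes from a pointer
    // only saves half the text once string is Unicode (Delphi 2009 and later)
    sl.SaveToFile('C:\Test.txt');
  finally
    sl.Free;
  end;
end;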
 
TName commented:
And if you don't want the TWebBrowser to show up at all, you can write:

with wb do begin
  OnDocumentComplete := DocComplete;
  ParentWindow := Self.Handle;
  Left := -500;  // Just an example: moves the control off-screen. Not so nice, but it works...
  Navigate('http://www.google.com');
end;
 
zattz (author) commented:
Or Visible := False ;)

Thanks for the help
 
zattz (author) commented:
By the way,

do you know if there is a way to filter out all the links before saving the text?
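A possible approach, sketched under assumptions (same TForm1 and TWebBrowser setup as in the first answer, and the mshtml interfaces of Internet Explorer): because innerText is read from the live DOM, the link elements can be deleted from the document before the text is extracted. This version of DocComplete walks IHTMLDocument2.links backwards, removes each element with IHTMLDOMNode.removeNode, and then saves what is left. Note that this also drops the link text itself, so a sentence containing an inline link will lose those words.

```pascal
uses {...} Classes, SysUtils, SHDocVw, mshtml;

// Sketch only: a replacement for TForm1.DocComplete that removes links first
procedure TForm1.DocComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: IHTMLDocument2;
  Links: IHTMLElementCollection;
  Node: IHTMLDOMNode;
  i: Integer;
  sl: TStringList;
begin
  if not Supports(TWebBrowser(Sender).Document, IHTMLDocument2, Doc) then
    Exit;
  if Doc.body = nil then
    Exit;

  // document.links holds the <a href> and <area> elements; walk it backwards
  // because the collection is live and shrinks as elements are removed
  Links := Doc.links;
  for i := Links.length - 1 downto 0 do
    if Supports(Links.item(i, 0), IHTMLDOMNode, Node) then
      Node.removeNode(True);  // True = remove the element together with its children

  sl := TStringList.Create;
  try
    sl.Text := Doc.body.innerText;  // body text without the removed links
    sl.SaveToFile('C:\Test.txt');
  finally
    sl.Free;
  end;
end;
```

If you would rather keep the words but drop only the hyperlinks, replacing each link with its own text (for example through IHTMLElement.outerText) instead of removing the node is one alternative to try.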
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.