Solved

Download webpage and strip out everything except text

Posted on 2006-11-21
5
181 Views
Last Modified: 2010-04-05
Hi,

Im looking for some code that will download the html of webpage (no images), and then strip out everything except the text. By text I mean the sentences.

Thanks
0
Comment
Question by:zattz
  • 3
  • 2
5 Comments
 
LVL 28

Accepted Solution

by:
TName earned 500 total points
ID: 17985952
Hi,
a very simple example using TWebBrowser. Will write the text to C:\Test.txt:


uses {...}  SHDocVw, mshtml;

{Main form declaration section}  
private
 procedure DocComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);


{...}

procedure TForm1.Button1Click(Sender: TObject);
var
wb:TWebBrowser;
begin
  wb:= TWebBrowser.Create(nil);
  with wb do begin
     OnDocumentComplete:=DocComplete;
     ParentWindow:=Self.Handle;
     Navigate('www.google.com');
   end;
     while wb.Busy do
        Application.ProcessMessages;
   wb.Free;
end;

procedure TForm1.DocComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
 aText:String;
 fs:TFileStream;
 p:Pointer;
begin
  aText:=IHTMLDocument2(TWebBrowser(Sender).Document).Body.innerText;
  fs:=TFileStream.Create('C:\Test.txt',fmCreate);
  p:=pointer(aText);
  fs.Write(p^, Length(aText));
  fs.Free;
end;
0
 
LVL 28

Expert Comment

by:TName
ID: 17985965
And if you don't want the webbrowser to show up at all, you can say:

with wb do begin
     OnDocumentComplete:=DocComplete;
     ParentWindow:=Self.Handle;
     Left:=-500; //<-------------------Just an example. Not so nice, but it works...
0
 

Author Comment

by:zattz
ID: 17986172
or visible:=false ;)

Thanks for the help
0
 

Author Comment

by:zattz
ID: 18005842
By the way,

do you know if there is a way to filter out all the links before saving the text?
0
 

Author Comment

by:zattz
ID: 18005851
0

Featured Post

Ransomware: The New Cyber Threat & How to Stop It

This infographic explains ransomware, type of malware that blocks access to your files or your systems and holds them hostage until a ransom is paid. It also examines the different types of ransomware and explains what you can do to thwart this sinister online threat.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

A lot of questions regard threads in Delphi.   One of the more specific questions is how to show progress of the thread.   Updating a progressbar from inside a thread is a mistake. A solution to this would be to send a synchronized message to the…
In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question