Solved

Download webpage and strip out everything except text

Posted on 2006-11-21
Last Modified: 2010-04-05
Hi,

I'm looking for some code that will download the HTML of a webpage (no images) and then strip out everything except the text. By text I mean the sentences.

Thanks
Question by:zattz
LVL 28

Accepted Solution

by: TName earned 500 total points
ID: 17985952
Hi,
Here is a very simple example using TWebBrowser. It writes the page text to C:\Test.txt:


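uses {...}  SHDocVw, mshtml;

{Main form declaration section}
private
  procedure DocComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);


{...}

procedure TForm1.Button1Click(Sender: TObject);
var
  wb: TWebBrowser;
begin
  wb := TWebBrowser.Create(nil);
  try
    with wb do begin
      OnDocumentComplete := DocComplete;
      ParentWindow := Self.Handle;
      Navigate('www.google.com');
    end;
    // Busy may still be False right after Navigate, so poll ReadyState instead.
    while wb.ReadyState <> READYSTATE_COMPLETE do
      Application.ProcessMessages;
  finally
    wb.Free;
  end;
end;

procedure TForm1.DocComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: IHTMLDocument2;
  aText: string;
  fs: TFileStream;
begin
  // Query for the interface instead of hard-casting the IDispatch.
  if not Supports(TWebBrowser(Sender).Document, IHTMLDocument2, Doc) then Exit;
  if Doc.body = nil then Exit;
  aText := Doc.body.innerText;
  fs := TFileStream.Create('C:\Test.txt', fmCreate);
  try
    // Length counts characters, so scale by SizeOf(Char) to get bytes.
    if aText <> '' then
      fs.WriteBuffer(Pointer(aText)^, Length(aText) * SizeOf(Char));
  finally
    fs.Free;
  end;
end;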
LVL 28

Expert Comment

by:TName
ID: 17985965
And if you don't want the web browser to show up at all, you can move it off-screen:

with wb do begin
     OnDocumentComplete:=DocComplete;
     ParentWindow:=Self.Handle;
     Left:=-500; // Just an example. Not so nice, but it works...

Author Comment

by:zattz
ID: 17986172
Or Visible := False ;)

Thanks for the help

Author Comment

by:zattz
ID: 18005842
By the way, do you know if there is a way to filter out all the links before saving the text?
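
One possible approach, sketched here under assumptions rather than taken from the thread: remove every element in the document's links collection from the DOM, then read innerText as before. The unit name LinkFilter and the function BodyTextWithoutLinks are hypothetical; the sketch assumes the mshtml import unit exposes IHTMLDocument2.links and IHTMLDOMNode.removeNode, and it has not been tested against a particular Delphi version.

```pascal
unit LinkFilter; // hypothetical unit name, for illustration only

interface

uses
  SysUtils, mshtml;

// Hypothetical helper: returns the body text after removing all link elements.
function BodyTextWithoutLinks(const Doc: IHTMLDocument2): string;

implementation

function BodyTextWithoutLinks(const Doc: IHTMLDocument2): string;
var
  Links: IHTMLElementCollection;
  Node: IHTMLDOMNode;
  i: Integer;
begin
  Result := '';
  if (Doc = nil) or (Doc.body = nil) then
    Exit;
  Links := Doc.links; // A and AREA elements that carry an href
  // Walk backwards so removing a node never shifts the indexes still to visit.
  for i := Links.length - 1 downto 0 do
    if Supports(Links.item(i, 0), IHTMLDOMNode, Node) then
      Node.removeNode(True); // True also removes the link's own text
  Result := Doc.body.innerText;
end;

end.
```

In DocComplete, the document obtained from the browser could be passed to this function in place of reading Body.innerText directly. Note that this drops the visible link text too; keeping the text while discarding only the URLs would need a different approach.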