Download webpage and strip out everything except text

Posted on 2006-11-21
Last Modified: 2010-04-05
Hi,

I'm looking for some code that will download the HTML of a webpage (no images) and then strip out everything except the text. By text I mean the sentences.

Thanks
Question by:zattz

Accepted Solution

by:TName (earned 2000 total points)
ID: 17985952
Hi,
Here is a very simple example using TWebBrowser. It writes the page text to C:\Test.txt:


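// Supports needs SysUtils and TFileStream needs Classes (normally already in the form's uses clause).
uses {...}  SHDocVw, mshtml;

{Main form declaration section}
private
  // Signature as in older Delphi versions; newer SHDocVw units declare URL as const.
  procedure DocComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);


{...}

procedure TForm1.Button1Click(Sender: TObject);
var
  wb: TWebBrowser;
begin
  wb := TWebBrowser.Create(nil);
  try
    wb.OnDocumentComplete := DocComplete;
    wb.ParentWindow := Self.Handle;
    wb.Navigate('www.google.com');
    // Wait on ReadyState; it is usually a more reliable completion signal than Busy.
    while wb.ReadyState <> READYSTATE_COMPLETE do
      Application.ProcessMessages;
  finally
    wb.Free;
  end;
end;

procedure TForm1.DocComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  doc: IHTMLDocument2;
  aText: AnsiString;
  fs: TFileStream;
begin
  // Document is an IDispatch; query for IHTMLDocument2 instead of hard-casting.
  if not Supports(TWebBrowser(Sender).Document, IHTMLDocument2, doc) then
    Exit;
  if doc.body = nil then
    Exit;
  // innerText is a WideString; convert to ANSI so Length equals the byte count.
  aText := AnsiString(doc.body.innerText);
  fs := TFileStream.Create('C:\Test.txt', fmCreate);
  try
    if aText <> '' then
      fs.WriteBuffer(aText[1], Length(aText));
  finally
    fs.Free;
  end;
end;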

Expert Comment

by:TName
ID: 17985965
And if you don't want the web browser to show up at all, you can move it off-screen:

with wb do begin
  OnDocumentComplete := DocComplete;
  ParentWindow := Self.Handle;
  Left := -500; // Just an example. Not so nice, but it works...
  Navigate('www.google.com');
end;

Author Comment

by:zattz
ID: 17986172
or visible:=false ;)

Thanks for the help

Author Comment

by:zattz
ID: 18005842
By the way,

do you know if there is a way to filter out all the links before saving the text?
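
One possible approach, as a rough sketch rather than a tested answer: remove the link elements from the loaded document before reading innerText. The helper name TextWithoutLinks is hypothetical and not from this thread. It assumes the page has finished loading (for example, it is called from DocComplete) and that the code lives in the form unit's implementation section. Note that the document's links collection only covers A and AREA elements that have an href.

```pascal
uses
  SysUtils, SHDocVw, mshtml;

// Hypothetical helper (not from the thread): returns the body text of the
// loaded page after removing every element in the document's links collection.
function TextWithoutLinks(wb: TWebBrowser): string;
var
  doc: IHTMLDocument2;
  links: IHTMLElementCollection;
  node: IHTMLDOMNode;
  i: Integer;
begin
  Result := '';
  // Document is an IDispatch; it is nil until a page has been loaded.
  if not Supports(wb.Document, IHTMLDocument2, doc) then
    Exit;
  if doc.body = nil then
    Exit;
  links := doc.links;
  // The collection is live, so walk it backwards while removing nodes.
  for i := links.length - 1 downto 0 do
    if Supports(links.item(i, 0), IHTMLDOMNode, node) then
      node.removeNode(True); // True also removes the link's children (its text)
  Result := doc.body.innerText;
end;
```

In DocComplete this could replace the direct innerText read, e.g. aText := TextWithoutLinks(TWebBrowser(Sender)). Passing False to removeNode should drop only the A element itself and leave the link text in place, if that is closer to what you want.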