reading web page data

When I display a page in Netscape 7, I can use "File/Save Page As..." to save the page contents as either an HTML file or a text file.

In Delphi 7, I can display a web page using the TWebBrowser control.

Does anyone know how to save the contents of the page displayed in the web browser control to a text file, preferably as plain text (i.e. dropping all the HTML tags)?

Alternatively, can I download an HTML file directly from a site using some other method within Delphi?

Having downloaded a file containing HTML code, can I strip it back to just the text that is actually displayed, without all the formatting tags?
Kymberley asked:
 
Ferruccio Accalai (Senior developer, analyst and customer assistance) commented:
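uses UrlMon; // add to the form unit's existing uses clause

procedure TForm1.Button1Click(Sender: TObject);
begin
  // URLDownloadToFile saves the page's raw HTML (tags included) to the local
  // file, whatever extension you give it; it returns S_OK (0) on success.
  if URLDownloadToFile(nil,
    'http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20817340.html',
    'c:\MyQuestion.txt', 0, nil) <> S_OK then
    MessageBox(Handle, 'An error occurred while downloading the file.',
      PChar(Application.Title), MB_ICONERROR or MB_OK);
end;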
 
Eddie Shipman (All-around developer) commented:
You can use this to get just the text from the HTML string:

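uses ..., Variants, ActiveX, MSHTML; // Variants: VarArrayCreate; ActiveX: PSafeArray


function RemoveHTMLFromString(const AHTML: string): string;
var
  vDocument: IHTMLDocument2;
  vHTML: OleVariant;
begin
  Result := AHTML;
  // Create an off-screen MSHTML document to parse the markup.
  vDocument := CoHTMLDocument.Create as IHTMLDocument2;
  // Design mode is commonly used here so that scripts in the markup do not run.
  vDocument.designMode := 'On';
  // IHTMLDocument2.write expects a SAFEARRAY of variants.
  vHTML := VarArrayCreate([0, 0], varVariant);
  vHTML[0] := Result;
  vDocument.Write(PSafeArray(TVarData(vHTML).VArray));
  vDocument.Close;
  // outerText of <body> is the rendered text with all tags removed.
  Result := vDocument.body.outerText;
  vDocument := nil;
end;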
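If MSHTML is not wanted (it relies on COM and Internet Explorer's components), a much cruder alternative is to keep only the characters that lie outside <...> pairs. This is a minimal sketch, not Eddie's method: the name StripTags is hypothetical, and it assumes there is no '>' inside attribute values, leaves entities such as &amp; undecoded, and keeps the contents of <script> and <style> blocks as text.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only matters when compiling with Free Pascal

// Hypothetical helper: returns only the characters that lie outside <...> tags.
// Assumes no '>' inside attribute values; entities such as &amp; are not
// decoded, and the contents of <script>/<style> blocks are kept as text.
function StripTags(const AHTML: string): string;
var
  I: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(AHTML) do
    if InTag then
    begin
      if AHTML[I] = '>' then
        InTag := False;              // end of the current tag
    end
    else if AHTML[I] = '<' then
      InTag := True                  // start of a tag
    else
      Result := Result + AHTML[I];   // visible text
end;
```

For example, StripTags('<p>Hello <b>world</b></p>') returns 'Hello world'. Unlike the MSHTML version, it does not collapse whitespace or turn block elements into line breaks.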
 
mgazza commented:
Why not just use WinInet instead of third-party components?
Add WinInet to the uses clause:
procedure HTTPDownload(const Remote: string; var Data: string);
var
  hSession, hUrl: HINTERNET;
  Buffer: array[0..1023] of Char;
  BytesRead: DWORD;
  Chunk: string;
begin
  Data := '';
  // Open a WinInet session that uses the system's proxy settings.
  hSession := InternetOpen('Mozilla/4.0 (compatible)',
    INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  if hSession = nil then
    Exit;
  try
    // INTERNET_FLAG_RELOAD fetches from the server rather than the local cache.
    hUrl := InternetOpenUrl(hSession, PChar(Remote), nil, 0,
      INTERNET_FLAG_RELOAD, 0);
    if hUrl = nil then
    begin
      MessageBox(0, 'Could not open the URL.', 'Error', MB_ICONERROR or MB_OK);
      Exit; // the finally block still closes hSession
    end;
    try
      // Append exactly the bytes read; a read of 0 bytes means end of data.
      while InternetReadFile(hUrl, @Buffer, SizeOf(Buffer), BytesRead)
        and (BytesRead > 0) do
      begin
        SetString(Chunk, PChar(@Buffer[0]), BytesRead);
        Data := Data + Chunk;
      end;
    finally
      InternetCloseHandle(hUrl);
    end;
  finally
    InternetCloseHandle(hSession);
  end;
end;
 
mgazza commented:
Input a fully valid HTTP address and you get the raw data back. If you point it at a script, it returns the script's output, not the script file itself.
 
Eddie Shipman (All-around developer) commented:
mgazza, it returns the HTML text, which is what idHTTP does for him anyway.
 
mgazza commented:
Yes, I know; that's all you need, with no web browser component.
 
Kymberley (Author) commented:
Thanks for your comments. The URLDownloadToFile method in the first response worked, so I already have whatever other components were required. I wrote my own HTML-stripping routine: since the data was all in HTML tables, I was able to build up rows from the cells in tab-delimited format.

BTW, it's her, not him.

Kymberley
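Kymberley's own routine is not shown in the thread. As a rough illustration of the idea (one output line per table row, a tab between cells), here is a minimal sketch; TableToTabDelimited is a hypothetical name, and it assumes well-formed markup with no nested tables and no '>' inside attribute values. Text outside <td>/<th> cells is dropped, cell text is not trimmed, and entities are not decoded.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only matters when compiling with Free Pascal
uses
  SysUtils; // LowerCase

// Hypothetical helper: turns the <tr>/<td> structure of an HTML table into
// tab-delimited lines, one line per </tr> and one tab between cells.
function TableToTabDelimited(const AHTML: string): string;
var
  I, Len, CellCount: Integer;
  TagName, Row: string;
  InCell: Boolean;
begin
  Result := '';
  Row := '';
  CellCount := 0;
  InCell := False;
  Len := Length(AHTML);
  I := 1;
  while I <= Len do
  begin
    if AHTML[I] = '<' then
    begin
      // Read the tag name (including a leading '/') up to whitespace or '>'.
      TagName := '';
      Inc(I);
      while (I <= Len) and (AHTML[I] <> '>') and (AHTML[I] > ' ') do
      begin
        TagName := TagName + AHTML[I];
        Inc(I);
      end;
      // Skip any attributes; I ends up on the closing '>'.
      while (I <= Len) and (AHTML[I] <> '>') do
        Inc(I);
      TagName := LowerCase(TagName);
      if (TagName = 'td') or (TagName = 'th') then
      begin
        if CellCount > 0 then
          Row := Row + #9;            // separate from the previous cell
        Inc(CellCount);
        InCell := True;
      end
      else if (TagName = '/td') or (TagName = '/th') then
        InCell := False
      else if TagName = '/tr' then
      begin
        Result := Result + Row + sLineBreak; // finish the row
        Row := '';
        CellCount := 0;
        InCell := False;
      end;
    end
    else if InCell then
      Row := Row + AHTML[I];          // visible cell text
    Inc(I);                           // step past the character or the '>'
  end;
end;
```

For example, '<tr><td>a</td><td>b</td></tr>' becomes the line a, tab, b. In Delphi 7 the function would go in a unit whose uses clause includes SysUtils.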
 
Eddie Shipman (All-around developer) commented:
Sorry for the confusion...
 
Eddie Shipman (All-around developer) commented:
I have code to get the data from table cells using the DOM if you want it.
 
Kymberley (Author) commented:
Thanks for the offer, Eddie, but I have already written that code, downloaded the data I was after from the Internet, and loaded it into my databases. The download method was the crucial hint I needed.
 
mgazza commented:
Good luck, all!