Solved

reading web page data

Posted on 2003-12-05
Last Modified: 2010-04-05
When I display a page in Netscape 7 I can use "File/Save Page As..." to save the page contents as either an HTML file or a text file.

In Delphi 7 I can display a web page using the WebBrowser control.

Does anyone know how to save the contents of the page displayed in a web browser to a text file, preferably in plain text format (i.e. dropping all the HTML tags)?

Alternatively, can I download an HTML file directly from a site using some other method within Delphi?

Having downloaded a file containing HTML code, can I strip that back to just the actual text displayed, without all the formatting tags?
Question by:Kymberley
11 Comments
 
LVL 22

Accepted Solution

by:
Ferruccio Accalai earned 150 total points
ID: 9881711
uses UrlMon;

procedure TForm1.Button1Click(Sender: TObject);
begin
  // URLDownloadToFile returns S_OK (0) on success and an error HRESULT otherwise.
  if URLDownloadToFile(nil,
    'http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20817340.html',
    'c:\MyQuestion.txt', 0, nil) <> S_OK then
    MessageBox(Handle, 'An error occurred while downloading the file.',
      PChar(Application.Title), MB_ICONERROR or MB_OK);
end;
 
LVL 26

Assisted Solution

by:EddieShipman
EddieShipman earned 50 total points
ID: 9884823
You can use this to get just the text from the HTML string:

uses ..., MSHTML, ActiveX, Variants;  // ActiveX for PSafeArray, Variants for VarArrayCreate


function RemoveHTMLFromString(const AHTML: string): string;
var
  vDocument : IHTMLDocument2;
  vHTML : OleVariant;
begin
  Result := AHTML;
  vDocument := CoHTMLDocument.Create as IHTMLDocument2;
  vDocument.designMode := 'On';
  vHTML := VarArrayCreate([0, 0], varVariant);
  vHTML[0] := Result;
  vDocument.Write(PSafeArray(TVarData(vHTML).VArray));
  vDocument.Close;
  Result := vDocument.body.outerText;
  vDocument := nil;
end;
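
To tie this to the download step from the accepted solution, here is a minimal sketch that downloads a page, strips the markup with RemoveHTMLFromString as posted above, and saves the result as plain text. SavePageAsText and its parameter names are hypothetical, made up for illustration; it assumes the page is small enough to hold in memory as a single string.

```pascal
uses
  Windows, SysUtils, Classes, UrlMon;

// Hypothetical helper (not from the thread): download AUrl to AHtmlFile,
// strip the HTML with RemoveHTMLFromString, and write the text to ATextFile.
procedure SavePageAsText(const AUrl, AHtmlFile, ATextFile: string);
var
  Lines: TStringList;
begin
  if URLDownloadToFile(nil, PChar(AUrl), PChar(AHtmlFile), 0, nil) <> S_OK then
    raise Exception.Create('Download failed: ' + AUrl);

  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(AHtmlFile);
    // RemoveHTMLFromString is the function shown above.
    Lines.Text := RemoveHTMLFromString(Lines.Text);
    Lines.SaveToFile(ATextFile);
  finally
    Lines.Free;
  end;
end;
```

It could be called as SavePageAsText('http://www.example.com/', 'c:\page.html', 'c:\page.txt'), where the URL and paths are placeholders.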
 
LVL 1

Assisted Solution

by:mgazza
mgazza earned 50 total points
ID: 9906642
Why don't we just use WinInet instead of stupid third-party components?
Oh, and declare WinInet in the uses clause.

uses Windows, WinInet;

procedure HTTPDownload(const Remote: string; var Data: string);
var
  Session, RemoteFile: HINTERNET;
  BytesRead: DWORD;
  Buffer: array[0..511] of AnsiChar;
  Chunk: AnsiString;
begin
  Data := '';
  Session := InternetOpen('Mozilla/4.0 (compatible)', INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  if Session = nil then
    Exit;
  try
    RemoteFile := InternetOpenUrl(Session, PChar(Remote), nil, 0, INTERNET_FLAG_RAW_DATA, 0);
    if RemoteFile = nil then
    begin
      MessageBox(0, 'Could not resolve host!', 'Error', 0);
      Exit;
    end;
    try
      // Append only the bytes actually read on each pass, until no more data arrives.
      while InternetReadFile(RemoteFile, @Buffer, SizeOf(Buffer), BytesRead) and (BytesRead > 0) do
      begin
        SetString(Chunk, PAnsiChar(@Buffer), BytesRead);
        Data := Data + string(Chunk);
      end;
    finally
      InternetCloseHandle(RemoteFile);
    end;
  finally
    // Close the session handle as well; the original leaked it.
    InternetCloseHandle(Session);
  end;
end;
 
LVL 1

Expert Comment

by:mgazza
ID: 9906666
Pass in a fully qualified HTTP address and you get the raw data back. If you point it at a script, it returns the script's output, not the script file!
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9906884
mgazza, it will return the HTML text and that is what idHTTP does for him, anyway.
 
LVL 1

Expert Comment

by:mgazza
ID: 9906937
Yes, I know. It's just that this is all you need, with no web browser component.
 

Author Comment

by:Kymberley
ID: 9909656
Thanks for your comments. The URLDownloadToFile method in the first response worked, so I have whatever other components were required. I wrote my own HTML-stripping routine: since the data was all in HTML tables, I was able to build up rows from cells in tab-delimited format.
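
For readers after the same result, the following is a rough sketch of that kind of routine, not the code actually written for this question. It assumes the HTML has explicit closing </td>, </th> and </tr> tags, drops any markup nested inside a cell, and does not decode entities such as &amp;nbsp;. The names TableToTabDelimited and IsTag are invented for the example.

```pascal
uses
  SysUtils;

// True if Tag (lower-cased text between '<' and '>') is Name,
// optionally followed by attributes, e.g. 'td align="right"'.
function IsTag(const Tag, Name: string): Boolean;
begin
  Result := (Tag = Name) or (Copy(Tag, 1, Length(Name) + 1) = Name + ' ');
end;

// Emits one line per table row, with cell texts separated by tabs.
function TableToTabDelimited(const AHTML: string): string;
var
  I, J, CellCount: Integer;
  Tag, Cell, Row: string;
  InCell: Boolean;
begin
  Result := '';
  Row := '';
  Cell := '';
  CellCount := 0;
  InCell := False;
  I := 1;
  while I <= Length(AHTML) do
  begin
    if AHTML[I] = '<' then
    begin
      // Locate the end of this tag and extract its lower-cased contents.
      J := I + 1;
      while (J <= Length(AHTML)) and (AHTML[J] <> '>') do
        Inc(J);
      Tag := LowerCase(Copy(AHTML, I + 1, J - I - 1));
      if IsTag(Tag, 'td') or IsTag(Tag, 'th') then
      begin
        InCell := True;
        Cell := '';
      end
      else if (Tag = '/td') or (Tag = '/th') then
      begin
        if InCell then
        begin
          if CellCount > 0 then
            Row := Row + #9;
          Row := Row + Trim(Cell);
          Inc(CellCount);
        end;
        InCell := False;
      end
      else if Tag = '/tr' then
      begin
        Result := Result + Row + sLineBreak;
        Row := '';
        CellCount := 0;
      end;
      I := J + 1;
    end
    else
    begin
      // Keep only text that sits inside a cell; line breaks become spaces.
      if InCell then
        if (AHTML[I] = #13) or (AHTML[I] = #10) then
          Cell := Cell + ' '
        else
          Cell := Cell + AHTML[I];
      Inc(I);
    end;
  end;
end;
```

The output can be written to disk with a TStringList or loaded straight into a spreadsheet or database import, since each row ends with a line break and cells are tab-separated.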

BTW - It's her not him

Kymberley
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912436
Sorry for the confusion...
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912441
I have code to get the data from table cells using the DOM if you want it.
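
As an illustration of the DOM approach (a sketch only, not the code offered above), the MSHTML interfaces can walk the tables of an already-loaded document. TablesToTabDelimited is a hypothetical name; ADoc is assumed to be something like WebBrowser1.Document as IHTMLDocument2 after the page has finished loading. Nested tables are not treated specially, so their text will appear both in the outer cell and as rows of their own.

```pascal
uses
  SysUtils, MSHTML;

// Hypothetical helper: returns every table row in ADoc as one line,
// with the visible text of each cell separated by tabs.
function TablesToTabDelimited(const ADoc: IHTMLDocument2): string;
var
  Tables, Rows, Cells: IHTMLElementCollection;
  T, R, C: Integer;
  Line: string;
begin
  Result := '';
  Tables := ADoc.all.tags('TABLE') as IHTMLElementCollection;
  for T := 0 to Tables.length - 1 do
  begin
    Rows := (Tables.item(T, 0) as IHTMLTable).rows;
    for R := 0 to Rows.length - 1 do
    begin
      Cells := (Rows.item(R, 0) as IHTMLTableRow).cells;
      Line := '';
      for C := 0 to Cells.length - 1 do
      begin
        if C > 0 then
          Line := Line + #9;
        // innerText gives the rendered text of the cell without markup.
        Line := Line + Trim((Cells.item(C, 0) as IHTMLElement).innerText);
      end;
      Result := Result + Line + sLineBreak;
    end;
  end;
end;
```

With a TWebBrowser on the form, (WebBrowser1.Document as IHTMLDocument2).body.innerText would similarly give the whole page as plain text, which is close to what the original question asked for.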
 

Author Comment

by:Kymberley
ID: 9912573
Thanks for the offer, Eddie, but I have already written that code, and have already downloaded the data I was after from the internet and loaded it into my databases. The download method was the crucial hint I needed.
 
LVL 1

Expert Comment

by:mgazza
ID: 9912911
good luck all!!!