Solved

reading web page data

Posted on 2003-12-05
Last Modified: 2010-04-05
When I display a page in Netscape 7 I can use "File/Save Page As..." to save the page contents as either an HTML file or a text file.

In Delphi 7 I can display a web page using the WebBrowser control.

Does anyone know how to save the contents of the page displayed in a web browser to a text file, preferably in plain text format (i.e. dropping all the HTML tags)?

Alternatively, can I download an HTML file directly from a site using some other method within Delphi?

Having downloaded a file containing HTML code, can I strip that back to just the actual text displayed, without all the formatting tags?
Question by:Kymberley
 
LVL 22

Accepted Solution

by:
Ferruccio Accalai earned 150 total points
ID: 9881711
uses UrlMon;

procedure TForm1.Button1Click(Sender: TObject);
begin
  // URLDownloadToFile returns S_OK (0) on success
  if URLDownloadToFile(nil,
    'http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20817340.html',
    'c:\MyQuestion.txt', 0, nil) <> 0 then
    MessageBox(Handle, 'An error occurred while downloading the file.',
      PChar(Application.Title), MB_ICONERROR or MB_OK);
end;
 
LVL 26

Assisted Solution

by:EddieShipman
EddieShipman earned 50 total points
ID: 9884823
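You can use this to get just the text from the HTML string:

// ActiveX supplies PSafeArray, Variants supplies VarArrayCreate
uses ..., MSHTML, ActiveX, Variants;


function RemoveHTMLFromString(const AHTML: string): string;
var
  vDocument : IHTMLDocument2;
  vHTML : OleVariant;
begin
  Result := AHTML;
  // Create an in-memory HTML document; no visible browser is needed
  vDocument := CoHTMLDocument.Create as IHTMLDocument2;
  vDocument.designMode := 'On';
  // IHTMLDocument2.write expects a SAFEARRAY of variants
  vHTML := VarArrayCreate([0, 0], varVariant);
  vHTML[0] := Result;
  vDocument.Write(PSafeArray(TVarData(vHTML).VArray));
  vDocument.Close;
  // outerText of the body is the rendered text without the tags
  Result := vDocument.body.outerText;
  vDocument := nil;
end;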
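
For the "save it as a text file" part of the question, the function above could be combined with a downloaded file roughly as follows. This is only a sketch: SaveHtmlFileAsText and both file names are hypothetical, and it assumes RemoveHTMLFromString from this post is declared in the same unit.

```pascal
uses Classes;

// Hypothetical helper: reads an HTML file, strips the tags with
// RemoveHTMLFromString (from the post above) and saves the plain text.
procedure SaveHtmlFileAsText(const HtmlFile, TextFile: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(HtmlFile);
    Lines.Text := RemoveHTMLFromString(Lines.Text);
    Lines.SaveToFile(TextFile);
  finally
    Lines.Free;
  end;
end;
```

A call such as SaveHtmlFileAsText('c:\MyQuestion.txt', 'c:\MyQuestionPlain.txt') would then turn the file fetched by URLDownloadToFile into plain text; the second file name is made up for the example.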
 
LVL 1

Assisted Solution

by:mgazza
mgazza earned 50 total points
ID: 9906642
Why not just use WinInet rather than third-party components?
Add WinInet to the uses clause.

procedure HTTPDownload(const Remote: string; var Data: string);
var
  Session, RemoteFile: HINTERNET;
  BytesRead: DWORD;
  Buffer: array[0..511] of Char;
  Chunk: string;
begin
  Session := InternetOpen('Mozilla/4.0 (compatible)', INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  RemoteFile := InternetOpenUrl(Session, PChar(Remote), nil, 0, INTERNET_FLAG_RAW_DATA, 0);
  if RemoteFile <> nil then
  begin
    repeat
      // BytesRead reports how many bytes this call actually returned
      if not InternetReadFile(RemoteFile, @Buffer, SizeOf(Buffer), BytesRead) then
        Break;
      // Append only the bytes received, not the whole buffer
      SetString(Chunk, Buffer, BytesRead);
      Data := Data + Chunk;
    until BytesRead = 0;
    InternetCloseHandle(RemoteFile);
  end
  else
    MessageBox(0, 'Could Not Resolve Host!', 'Error', 0);
  // Release the session handle as well
  InternetCloseHandle(Session);
end;
 
LVL 1

Expert Comment

by:mgazza
ID: 9906666
Input a fully valid HTTP address and you get the raw data back. If you point it at a script, it returns the script's output, not the script file itself!
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9906884
mgazza, it will return the HTML text and that is what idHTTP does for him, anyway.
 
LVL 1

Expert Comment

by:mgazza
ID: 9906937
Yes, I know. My point is just that this is all you need, with no web browser component.
 

Author Comment

by:Kymberley
ID: 9909656
Thanks for your comments - the URLDownloadToFile method in the first response worked, so I have whatever other components were required. I wrote my own HTML stripping routine: since the data was all in HTML tables, I was able to build up rows from cells in tab-delimited format.

BTW - It's her not him

Kymberley
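
The routine described in this comment was not posted. As an illustration only, a stripping routine of that kind might look like the sketch below; TableHtmlToTabText is a hypothetical name, and this is not the author's code. It assumes well-formed markup with explicit closing tags, keeps whitespace that sits between tags on a line, and does not decode entities such as &amp;.

```pascal
uses SysUtils;

// Hypothetical sketch: drops every tag, emits a tab after each closing
// cell tag and a line break after each closing row tag.
function TableHtmlToTabText(const Html: string): string;
var
  i, TagStart: Integer;
  Tag: string;
begin
  Result := '';
  i := 1;
  while i <= Length(Html) do
  begin
    if Html[i] = '<' then
    begin
      // Skip to the closing '>' and look at what the tag was
      TagStart := i + 1;
      while (i <= Length(Html)) and (Html[i] <> '>') do
        Inc(i);
      Tag := LowerCase(Trim(Copy(Html, TagStart, i - TagStart)));
      if (Tag = '/td') or (Tag = '/th') then
        Result := Result + #9
      else if Tag = '/tr' then
        Result := Result + sLineBreak;
    end
    else if not (Html[i] in [#10, #13]) then
      // Ordinary text; line breaks in the source are dropped
      Result := Result + Html[i];
    Inc(i);
  end;
end;
```

Each output row ends with a trailing tab, because a tab is written after every cell, including the last one in the row.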
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912436
Sorry for the confusion...
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912441
I have code to get the data from table cells using the DOM if you want it.
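
The code offered here was never posted in the thread. For readers wondering what a DOM-based approach could look like, here is a rough sketch using the MSHTML interfaces. FirstTableAsTabText is a hypothetical name, it only looks at the first table in the document, and it assumes the caller already holds a loaded IHTMLDocument2 (for example from a WebBrowser control).

```pascal
uses MSHTML, SysUtils;

// Hypothetical sketch: returns the first table of a loaded document as
// tab-delimited text, one line per table row.
function FirstTableAsTabText(const Doc: IHTMLDocument2): string;
var
  Tables, Rows, Cells: IHTMLElementCollection;
  Table: IHTMLTable;
  Row: IHTMLTableRow;
  r, c: Integer;
  Line: string;
begin
  Result := '';
  Tables := Doc.all.tags('table') as IHTMLElementCollection;
  if (Tables = nil) or (Tables.length = 0) then
    Exit;
  Table := Tables.item(0, 0) as IHTMLTable;
  Rows := Table.rows;
  for r := 0 to Rows.length - 1 do
  begin
    Row := Rows.item(r, 0) as IHTMLTableRow;
    Cells := Row.cells;
    Line := '';
    for c := 0 to Cells.length - 1 do
    begin
      // Separate cells with a tab, but do not end the row with one
      if c > 0 then
        Line := Line + #9;
      Line := Line + Trim((Cells.item(c, 0) as IHTMLElement).innerText);
    end;
    Result := Result + Line + sLineBreak;
  end;
end;
```

Letting the DOM do the parsing avoids hand-written tag scanning, at the cost of depending on the Internet Explorer components being installed.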
 

Author Comment

by:Kymberley
ID: 9912573
Thanks for the offer Eddie, but I have already written that code, and have already downloaded the data I was after from the internet and loaded it into my databases. The download method was the crucial hint I needed.
 
LVL 1

Expert Comment

by:mgazza
ID: 9912911
good luck all!!!