Solved

Reading web page data

Posted on 2003-12-05
Last Modified: 2010-04-05
When I display a program in Netscape 7 I can use "File/Save Page As..." to save the page contents as either an HTML file or a text file.

In Delphi 7 I can display a web page using the WebBrowser control.

Does anyone know how to save the contents of the page displayed in a web browser to a text file, preferably in plain text format (i.e. dropping all the HTML tags)?

Alternatively, can I download an HTML file directly from a site using some other method within Delphi?

Having downloaded a file containing HTML code, can I strip that back to just the actual text displayed, without all the formatting tags?
Question by:Kymberley
 
LVL 22

Accepted Solution

by:
Ferruccio Accalai earned 150 total points
ID: 9881711
uses UrlMon;

procedure TForm1.Button1Click(Sender: TObject);
begin
  // URLDownloadToFile returns S_OK (0) on success.
  if URLDownloadToFile(nil,
    'http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20817340.html',
    'c:\MyQuestion.txt', 0, nil) <> S_OK then
    MessageBox(Handle, 'An error occurred while downloading the file.',
      PChar(Application.Title), MB_ICONERROR or MB_OK);
end;
 
LVL 26

Assisted Solution

by:EddieShipman
EddieShipman earned 50 total points
ID: 9884823
You can use this to get just the text from the HTML string:

uses ..., ActiveX, Variants, MSHTML;  // ActiveX declares PSafeArray, Variants declares VarArrayCreate


function RemoveHTMLFromString(const AHTML: string): string;
var
  vDocument : IHTMLDocument2;
  vHTML : OleVariant;
begin
  Result := AHTML;
  vDocument := CoHTMLDocument.Create as IHTMLDocument2;
  vDocument.designMode := 'On';
  vHTML := VarArrayCreate([0, 0], varVariant);
  vHTML[0] := Result;
  vDocument.Write(PSafeArray(TVarData(vHTML).VArray));
  vDocument.Close;
  Result := vDocument.body.outerText;
  vDocument := nil;
end;
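
One possible way to combine this with the URLDownloadToFile answer above, as a minimal sketch only. It assumes the page was already saved to disk, that RemoveHTMLFromString is declared as shown above, that the file is ANSI-encoded, and that COM is already initialized (as it normally is in a VCL forms application). The name SavePageAsText and both file paths are hypothetical.

```pascal
uses Classes, SysUtils;

// Hypothetical helper: loads saved HTML, strips the tags with
// RemoveHTMLFromString (from the post above) and writes plain text.
procedure SavePageAsText(const HtmlFile, TextFile: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(HtmlFile);
    // Replace the raw markup with the rendered text of the document body.
    Lines.Text := RemoveHTMLFromString(Lines.Text);
    Lines.SaveToFile(TextFile);
  finally
    Lines.Free;
  end;
end;
```

It could then be called as, for example, SavePageAsText('c:\MyQuestion.txt', 'c:\MyQuestionPlain.txt').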
 
LVL 1

Assisted Solution

by:mgazza
mgazza earned 50 total points
ID: 9906642
Why don't we just use WinInet instead of third-party components?
Just declare WinInet in the uses clause.

uses Windows, WinInet;

procedure HTTPDownload(const Remote: string; var Data: string);
var
  hSession, hUrl: HINTERNET;
  BytesRead: DWORD;
  Buffer: array[0..511] of AnsiChar;
  Chunk: AnsiString;
begin
  Data := '';
  hSession := InternetOpen('Mozilla/4.0 (compatible)', INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  if hSession = nil then
    Exit;
  try
    hUrl := InternetOpenUrl(hSession, PChar(Remote), nil, 0, INTERNET_FLAG_RAW_DATA, 0);
    if hUrl <> nil then
    begin
      try
        // Read until the call fails or reports zero bytes (end of data),
        // appending only the bytes actually received.
        while InternetReadFile(hUrl, @Buffer, SizeOf(Buffer), BytesRead) and (BytesRead > 0) do
        begin
          SetString(Chunk, PAnsiChar(@Buffer[0]), BytesRead);
          Data := Data + Chunk;
        end;
      finally
        InternetCloseHandle(hUrl);
      end;
    end
    else
      MessageBox(0, 'Could Not Resolve Host!', 'Error', 0);
  finally
    InternetCloseHandle(hSession);
  end;
end;

 
LVL 1

Expert Comment

by:mgazza
ID: 9906666
Input a fully valid HTTP address and you get the raw data back. If you point it at a script, it returns the script's output, not the script file itself!
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9906884
mgazza, it will return the HTML text and that is what idHTTP does for him, anyway.
 
LVL 1

Expert Comment

by:mgazza
ID: 9906937
Yes, I know. It's just that this is all you need; no web browser component is required.
 

Author Comment

by:Kymberley
ID: 9909656
Thanks for your comments. The URLDownloadToFile method in the first response worked, so I have whatever other components were required. I wrote my own HTML stripping routine: since the data was all in HTML tables, I was able to build up rows from cells in tab-delimited format.

BTW - It's her not him

Kymberley
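
The stripping routine mentioned above was not posted. As an illustration only, a routine of that kind might look roughly like the sketch below. It is not the author's code: the name TableToTabDelimited is hypothetical, and it assumes simple, well-formed tr/td/th markup with no nested tables. It does not decode entities such as &amp;amp; and leaves line breaks inside a cell as they are.

```pascal
uses SysUtils, Classes;

// Hypothetical sketch: converts table markup into tab-delimited rows,
// one string per <tr>. All other tags are simply dropped.
procedure TableToTabDelimited(const Html: string; Rows: TStrings);
var
  i, TagEnd, CellCount: Integer;
  TagName, Cell, Row: string;
begin
  Rows.Clear;
  Cell := '';
  Row := '';
  CellCount := 0;
  i := 1;
  while i <= Length(Html) do
  begin
    if Html[i] = '<' then
    begin
      // Find the closing '>' of this tag; stop if the markup is truncated.
      TagEnd := i + 1;
      while (TagEnd <= Length(Html)) and (Html[TagEnd] <> '>') do
        Inc(TagEnd);
      if TagEnd > Length(Html) then
        Break;
      TagName := LowerCase(Copy(Html, i + 1, TagEnd - i - 1));
      if (TagName = '/td') or (TagName = '/th') then
      begin
        // A cell ended: append its text, tab-separated from earlier cells.
        if CellCount > 0 then
          Row := Row + #9;
        Row := Row + Trim(Cell);
        Inc(CellCount);
        Cell := '';
      end
      else if TagName = '/tr' then
      begin
        // A row ended: emit it and start a fresh one.
        Rows.Add(Row);
        Row := '';
        Cell := '';
        CellCount := 0;
      end
      else if (Copy(TagName, 1, 2) = 'td') or (Copy(TagName, 1, 2) = 'th') then
        Cell := '';  // a new cell starts: discard text collected between cells
      i := TagEnd + 1;
    end
    else
    begin
      Cell := Cell + Html[i];
      Inc(i);
    end;
  end;
end;
```

The rows could be collected in a TStringList and written out with SaveToFile, which gives a file that spreadsheet or database import tools can usually read as tab-delimited text.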
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912436
Sorry for the confusion...
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9912441
I have code to get the data from table cells using the DOM if you want it.
 

Author Comment

by:Kymberley
ID: 9912573
Thanks for the offer, Eddie, but I have already written that code, and I have already downloaded the data I was after from the internet and loaded it into my databases. The download method was the crucial hint I needed.
 
LVL 1

Expert Comment

by:mgazza
ID: 9912911
good luck all!!!