anthonyfaunt

Accessing Text in a TWebBrowser Document

I am using a TWebBrowser control in Delphi 5 Enterprise.
Once I have navigated to a Web site, how can I access the text in the downloaded HTML document?
Win32 Developer's Reference Help advises that:
For more information about this interface, see the document object section of Microsoft’s Dynamic HTML reference.
I have not had any luck following this suggestion.
rwilson032697

listening...

ASKER

Hi rwilson, do you want more info?
Edited text of question.
Listening...

Generally means that the expert is interested in following the question, but does not have much to contribute at this point in time.

Unfortunately not many of us have D5 Enterprise, so I for one can't help, I'm afraid.
Rob ;-)
Following...®
Hi
what do you want to do with the text?
Regards Barry
Hi Barry
I want to extract data from the text.  I am presently doing this by saving the HTML Source as a text file and then processing the text file to extract the HTML formatting code and leave the data that the page displays.
By being able to access the text in the Document (property of the TWebBrowser control) I will be able to extract the data more seamlessly.
Regards,
Tony
Hi
Maybe these examples will give you some hints on using the DOM ...
 
procedure TWebForm.DumpHtmlToBrowser(HTML: String);
{this routine writes a html-string directly to the WebBrowser}
var
  Document: IHtmlDocument2;
  v:Variant;
begin
  Document := WebBrowser1.Document as IHtmlDocument2;
  if (Assigned(Document)) then begin
      v := VarArrayCreate([0, 0], varVariant);
      v[0] := HTML;
      Document.Write(PSafeArray(TVarData(v).VArray));
      Document.Close;
  end;
end;
 
procedure TWebForm.ShowHtmlButtonClick(Sender: TObject);
{this routine shows the html-code of the body of the document which is active in the WebBrowser. }
var
  Document: IHtmlDocument2;
  HtmlStr: string;
begin
  Document := WebBrowser1.Document as IHtmlDocument2;
  {the document is only read here, so no Document.Close is needed}
  if Assigned(Document) and Assigned(Document.body) then
    HtmlStr := Document.body.outerHTML;
  ShowMessage(HtmlStr);
end;
 
The IHtmlDocument2 interface has many properties and methods for reading and writing the content of the active document through the DOM. You have to include SHDocVw_TLB, MSHTML_TLB and ActiveX in the uses clause of the form.
It may help to initialise the WebBrowser with a dummy HTML page before you try to write to or read from the document.
Note: The IHtmlDocument2 interface seems to redeclare some of Delphi's data types, so you may have to typecast some variables (to Integer/Boolean) to make it work properly...
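
A minimal sketch of the dummy-page suggestion above, assuming the form is the TWebForm from the examples and that loading about:blank is an acceptable dummy page. LoadBlankPage is a hypothetical method name (it would need declaring in the form class), and the busy-wait loop is just one simple way to wait for the control:

```pascal
{ Assumed uses clause: Forms, SHDocVw_TLB (for READYSTATE_COMPLETE) }
procedure TWebForm.LoadBlankPage;  { hypothetical helper, not from the thread }
begin
  { Late-bound call, the same style used elsewhere in this thread }
  WebBrowser1.OleObject.Navigate('about:blank');
  { Keep the message loop running until the control reports the page as loaded }
  while WebBrowser1.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;
end;
```

Calling LoadBlankPage once (for example from FormCreate) before DumpHtmlToBrowser should mean WebBrowser1.Document is already assigned when the first write happens.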
 
You can get some more information here:
http://msdn.microsoft.com/workshop//browser/mshtml/reference/ifaces/interface.asp
http://www.egroups.com/group/delphi-dhtmledit/
http://www.egroups.com/group/delphi-webbrowser/
the latter newsgroup deals specifically with using the TWebbrowser.
good luck
Regards Barry
ASKER CERTIFIED SOLUTION
inthe

Hi Barry,
Thank you kindly for your help.
All of your advice was very helpful.
I had trouble with IHtmlDocument2 and don't know why.
I changed the Document variable declaration to type Variant
and this worked.  Below is the code I use to show the text of the Document when I show a form.

procedure TfrmCaptureData.FormShow(Sender: TObject);
var
  Document : Variant;
begin
  document := MainForm.webbrowser1.document;
  Memo2.lines.Clear;
  memo2.lines.add(trim(document.body.innertext));
end;

I am very grateful for your help.
Regards,
Tony
Barry - Just thought I'd give this a go... Do you know if this only works in D5? I have used just this code in D4 and the document var is not assigned when I do this:

var
  Document : Variant;
....
  document := webbrowser.document;

Cheers,

Raymond.
Raymond,
Yes, this does work in D5.  I am doing it this way as a consequence of Barry's advice and a problem with the following:
 
var
Document: IHtmlDocument2;
....
document := webbrowser1.document;

D5 did not like IHtmlDocument2 and so I thought I would try changing the declaration to Variant and it worked.

Regards,
Tony
Hi all,
I'm sorry, I thought I had added a comment here about this. I must have closed the browser before it was properly posted, so I am posting again to clear it up ;-)
It works fine in D4. The secret is to add MSHTML_TLB to the uses section. If you don't have MSHTML_TLB.pas in the Imports directory, then import the ActiveX component DHtmlEdit.
Then, using Doc.body, you can get many things from the browser: HTML, text, innerText, etc. A D4 example looks like this:
 
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,StdCtrls, OleCtrls,SHDocVw_TLB,MSHTML_TLB;

type
  TForm1 = class(TForm)
    Button1: TButton;
    WebBrowser1: TWebBrowser;
    Memo1: TMemo;
    Button2: TButton;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: IHtmlDocument2;
begin
  Doc := WebBrowser1.Document as IHtmlDocument2;
  Memo1.Clear;
  Memo1.Lines.Add(Trim(Doc.body.innerText));
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
Webbrowser1.OleObject.Navigate('www.experts-exchange.com');
end;

end.


Sorry for not posting that properly the other day. I hope it helped.
Regards Barry
Note:
 adding MSHTML_TLB should then also work for D5.
Cheers.
Barry:OK - I have it working in D4 now...

One more thing (I'll flick you some points...), how do you get ALL the HTML, rather than just the BODY HTML code...

I've have had a wee browse through the MSHTML_TLB file (wow, is it big!) and haven't seen anything obvious (except for all, which is a list of elements which is non-obvious as to its use...)

Cheers,

Raymond.
Hi Ray,
add ActiveX to the uses section, then try this:


unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,StdCtrls, OleCtrls, SHDocVw_TLB,ActiveX;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Button2: TButton;
    WebBrowser1: TWebBrowser;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

const
  SOURCE = 2;

implementation

{$R *.DFM}

procedure TForm1.Button1Click(Sender: TObject);
begin
  WebBrowser1.OleObject.Navigate('www.experts-exchange.com');
end;

procedure TForm1.Button2Click(Sender: TObject);
const
  {command group understood by the browser's document object}
  CGID_WebBrowser: TGUID = '{ED016940-BD5B-11cf-BA4E-00C04FD70816}';
var
  CmdTarget: IOleCommandTarget;
  vaIn, vaOut: OleVariant;
begin
  if WebBrowser1.Document <> nil then
    try
      WebBrowser1.Document.QueryInterface(IOleCommandTarget, CmdTarget);
      {CmdTarget is reference-counted by Delphi, so it must not be released by hand}
      if CmdTarget <> nil then
        CmdTarget.Exec(@CGID_WebBrowser, SOURCE, 0, vaIn, vaOut);
    except
      {errors are deliberately ignored}
    end;
end;

end.

Yes, it is just a call to "View Source", but I haven't worked this into a memo yet (god knows I tried ;-)
You can cheat and copy the text from the Notepad window to the memo, but it looks bad. I'll have another look Friday(ish) when I get some free time.
Regards Barry
Change the SOURCE constant to 1 for the "Find" dialog.
Barry: Hmm.. Really wanted it to give it back to me like body.innerhtml (amazing that it doesn't appear to do this. How typically M$)

I want this for SiteSplicer, my Web Site creation tool. It has a TWebBrowser (I know, using MS stuff again :-( ) showing a view of the current page. There is a memo showing the actual HTML for that page which gets filled in in the OnDocumentComplete event. When I've got this, and a few other bits and bobs done, it'll be ready for its first venture out into the wild...

Looking forward to Friday...

Cheers,

Raymond.
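
A rough sketch of filling a memo from the browser's document-complete event, as Raymond describes. It assumes the event is TWebBrowser's OnDocumentComplete, a memo named Memo1, and the late-bound Variant access Tony used earlier in the thread. The parameter list below follows the Delphi 5 declaration and may differ slightly in an imported SHDocVw_TLB, so use whatever the IDE generates. It only returns the body HTML, not the whole page:

```pascal
{ Hypothetical handler for WebBrowser1.OnDocumentComplete }
procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: Variant;
begin
  { Nothing to read until a document has actually been loaded }
  if WebBrowser1.Document = nil then
    Exit;
  Doc := WebBrowser1.Document;
  { Late-bound access: resolved at run time, so no MSHTML_TLB is needed }
  Memo1.Lines.Text := Doc.body.innerHTML;
end;
```

Reading the document inside this event, rather than straight after Navigate, may also avoid the unassigned-document problem mentioned earlier in the thread. Note that the event can fire more than once for pages that contain frames.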
Ahh, of course. I remember using this about 50 times: you have to save it to a file. Yes, this is strange, but even MS FrontPage 2000 has to do this as well; if you look in the temp directory while using FrontPage, it saves many files there.
You save it using IPersistFile (declared in ActiveX). This is the same as using Save or Save As on a custom right-click popup menu.


procedure TForm1.Button2Click(Sender: TObject);
var
  Persist: IPersistFile;
  Document: IHTMLDocument2;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Persist := Document as IPersistFile;
  {False = save a copy without making it the document's current file}
  Persist.Save('C:\temp.htm', False);
  Memo1.Lines.LoadFromFile('C:\temp.htm');
end;

Regards Barry
ps. scotland on sunday eh...surely be another win  ;-)
Barry: That works just fine. Had my first run-in with PWideChars trying to give IPersistFile.Save a filename in a variable...

I've posted a Q with some points for you...

Cheers,

Raymond.
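
For the PWideChar issue Raymond mentions, one possible approach is to copy the file name into a WideString and cast that. This is only a sketch: SaveDocumentTo is a hypothetical method name (it would need declaring in the form class), and Memo1 and WebBrowser1 are assumed to exist on the form as in the examples above:

```pascal
{ Assumed uses clause: ActiveX (for IPersistFile) }
procedure TForm1.SaveDocumentTo(const FileName: string);  { hypothetical helper }
var
  Persist: IPersistFile;
  WideName: WideString;
begin
  if WebBrowser1.Document = nil then
    Exit;
  Persist := WebBrowser1.Document as IPersistFile;
  { IPersistFile.Save expects a wide-character string, so convert first }
  WideName := FileName;
  { False = save a copy without making it the document's current file }
  Persist.Save(PWideChar(WideName), False);
  Memo1.Lines.LoadFromFile(FileName);
end;
```

A call such as SaveDocumentTo('C:\temp.htm') would then behave like Barry's Button2Click, but with the file name supplied by the caller.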