anthonyfaunt
asked on
Accessing Text in a TWebBrowser Document
I am using a TWebBrowser control in Delphi 5 Enterprise.
Once I have navigated to a Web site, how can I access the text in the downloaded HTML document?
The Win32 Developer's Reference help advises:
"For more information about this interface, see the document object section of Microsoft's Dynamic HTML reference."
I have not had any luck following this suggestion.
listening...
ASKER
Hi rwilson, do you want more info?
ASKER
Edited text of question.
Listening...
Generally means that the expert is interested in following the question, but does not have much to contribute at this point in time.
Unfortunately not many of us have D5 Enterprise, so I for one can't help, I'm afraid.
Rob ;-)
Following...
Hi
What do you want to do with the text?
Regards Barry
ASKER
Hi Barry
I want to extract data from the text. At present I do this by saving the HTML source as a text file and then processing that file to strip out the HTML formatting code, leaving the data that the page displays.
Being able to access the text in the Document property of the TWebBrowser control would let me extract the data more seamlessly.
Regards,
Tony
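For comparison, the file-based approach Tony describes comes down to dropping everything between a "<" and the next ">". The sketch below is a hypothetical StripTags function (not from the thread) wrapped in a small console program. It is deliberately naive: it does not decode entities such as &amp;, keeps script/style contents as text, and a ">" inside an attribute value ends the tag early. Those gaps are part of why asking the browser's already-parsed DOM for the text is the more robust route.

```pascal
program StripTagsDemo;
{$APPTYPE CONSOLE}
{$H+}

{ Hypothetical helper: returns Html with every <...> tag removed.
  Naive by design: entities such as &amp; are left as-is, <script> and
  <style> contents are kept as text, and a '>' inside an attribute value
  ends the tag early. }
function StripTags(const Html: string): string;
var
  I: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(Html) do
    if InTag then
    begin
      if Html[I] = '>' then
        InTag := False;  { end of tag: resume copying text }
    end
    else if Html[I] = '<' then
      InTag := True      { start of tag: stop copying }
    else
      Result := Result + Html[I];
end;

begin
  WriteLn(StripTags('<p>Hello <b>world</b></p>'));  { prints: Hello world }
end.
```

A ">" outside a tag is kept as ordinary text, so "a > b" passes through unchanged.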
Hi
Maybe these examples will give you some hints on using the DOM...
procedure TWebForm.DumpHtmlToBrowser(HTML: String);
{this routine writes an HTML string directly to the WebBrowser}
var
  Document: IHtmlDocument2;
  v: Variant;
begin
  Document := WebBrowser1.Document as IHtmlDocument2;
  if Assigned(Document) then
  begin
    {IHTMLDocument2.write expects a SAFEARRAY of VARIANTs,
     so wrap the string in a one-element variant array}
    v := VarArrayCreate([0, 0], varVariant);
    v[0] := HTML;
    Document.Write(PSafeArray(TVarData(v).VArray));
    Document.Close;
  end;
end;
procedure TWebForm.ShowHtmlButtonClick(Sender: TObject);
{this routine shows the HTML code of the body of the document which is active in the WebBrowser}
var
  Document: IHtmlDocument2;
  HtmlStr: string;
begin
  Document := WebBrowser1.Document as IHtmlDocument2;
  {no Document.Close here: close only ends an output stream opened by write}
  if Assigned(Document) then
    HtmlStr := Document.Get_body.Get_outerHTML;
  ShowMessage(HtmlStr);
end;
The IHtmlDocument2 interface has a lot of properties/methods to get/put the content of the active document using the DOM. You have to include SHDocVw_TLB, MSHTML_TLB and ActiveX in the uses clause of the form.
You should probably initialise the WebBrowser with a dummy HTML page before you try to write/get any information from the document.
Note: it seems the IHtmlDocument2 interface redeclares some of the data types in Delphi, so you may have to typecast some variables (to Integer/Boolean) to make it work properly...
You can get some more information here:
http://msdn.microsoft.com/workshop//browser/mshtml/reference/ifaces/interface.asp
http://www.egroups.com/group/delphi-dhtmledit/
http://www.egroups.com/group/delphi-webbrowser/
The latter group deals specifically with using TWebBrowser.
Good luck
Regards Barry
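Barry's suggestion to initialise the browser with a dummy page before touching Document could be sketched like this. It assumes D5's SHDocVw unit, whose TWebBrowser (as far as we know) offers a one-argument Navigate overload and a ReadyState property with the READYSTATE_COMPLETE constant; in D4 use SHDocVw_TLB and, if the single-argument Navigate is not available, Browser.OleObject.Navigate as in the examples in this thread. The unit and procedure names are hypothetical.

```pascal
unit BrowserInit;  { hypothetical unit name }

interface

uses
  Forms, SHDocVw;  { D4: SHDocVw_TLB instead of SHDocVw }

procedure LoadBlankPage(Browser: TWebBrowser);

implementation

procedure LoadBlankPage(Browser: TWebBrowser);
begin
  { about:blank gives an empty but fully created HTML document }
  Browser.Navigate('about:blank');
  { Wait for the load to finish; ProcessMessages lets the control do its work.
    Note: this loops forever if the page never reaches READYSTATE_COMPLETE. }
  while Browser.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;
end;

end.
```

Calling LoadBlankPage(WebBrowser1) once, for example in FormCreate, should leave WebBrowser1.Document assigned before DumpHtmlToBrowser writes into it.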
ASKER CERTIFIED SOLUTION
ASKER
Hi Barry,
Thank you kindly for your help.
All of your advice was very helpful.
I had trouble with IHtmlDocument2 and don't know why.
I changed the Document variable declaration to type Variant
and this worked. Below is the code I use to show the text of the Document when I show a form.
procedure TfrmCaptureData.FormShow(Sender: TObject);
var
  Document: Variant;
begin
  Document := MainForm.WebBrowser1.Document;
  Memo2.Lines.Clear;
  Memo2.Lines.Add(Trim(Document.body.innerText));
end;
I am very grateful for your help.
Regards,
Tony
Barry: Just thought I'd give this a go... Do you know if this only works in D5? I have used just this code in D4 and the Document variable is not assigned when I do this:
var
Document : Variant;
....
document := webbrowser.document;
Cheers,
Raymond.
ASKER
Raymond,
Yes, this does work in D5. I am doing it this way as a consequence of Barry's advice and a problem with the following:
var
Document: IHtmlDocument2;
....
document := webbrowser1.document;
D5 did not like IHtmlDocument2 and so I thought I would try changing the declaration to Variant and it worked.
Regards,
Tony
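Why the Variant version works without MSHTML_TLB: storing WebBrowser.Document (an IDispatch) in a Variant gives late binding, so names like body and innerText are looked up through IDispatch at run time instead of being checked by the compiler against an imported type library. A minimal sketch as a standalone unit follows; the unit and function names are hypothetical. ComObj is listed because, in Delphi, late-bound calls on Variants rely on support that unit installs.

```pascal
unit BrowserText;  { hypothetical unit name }

interface

uses
  ComObj, SHDocVw;  { ComObj enables late-bound Variant calls; D4: SHDocVw_TLB }

function DocumentText(Browser: TWebBrowser): string;

implementation

function DocumentText(Browser: TWebBrowser): string;
var
  Doc: OleVariant;
begin
  Result := '';
  { Document is nil until a page has been loaded }
  if Browser.Document = nil then
    Exit;
  { Late binding: "body" and "innerText" are resolved by name at run time,
    so a misspelt member name only fails when this line executes }
  Doc := Browser.Document;
  Result := Doc.body.innerText;
end;

end.
```

The trade-off is that the early-bound IHtmlDocument2 route (with MSHTML_TLB in the uses clause) catches such mistakes at compile time and is faster per call.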
Hi all,
I'm sorry, I thought I had added a comment here about this; I must have closed the browser before it was properly posted, so I'm posting again to clear it up ;-)
It works fine in D4; the secret is to add MSHTML_TLB to the uses section. If you don't have MSHTML_TLB.pas in the Imports directory, then import the ActiveX component DHtmlEdit.
Then, through doc.body, you can get many things from the browser (HTML, text, innerText, etc.). A D4 example:
unit Unit1;
interface
uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, OleCtrls, SHDocVw_TLB, MSHTML_TLB;
type
  TForm1 = class(TForm)
    Button1: TButton;
    WebBrowser1: TWebBrowser;
    Memo1: TMemo;
    Button2: TButton;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;
var
  Form1: TForm1;
implementation
{$R *.DFM}
procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: IHtmlDocument2;
begin
  Doc := WebBrowser1.Document as IHtmlDocument2;
  Memo1.Clear;
  { Document is nil until a page has been loaded }
  if Assigned(Doc) then
    Memo1.Lines.Add(Trim(Doc.body.innerText));
end;
procedure TForm1.Button2Click(Sender: TObject);
begin
  WebBrowser1.OleObject.Navigate('www.experts-exchange.com');
end;
end.
Sorry for not posting that properly the other day; hope it helps.
Regards Barry
Note:
adding MSHTML_TLB should then also make it work in D5.
Cheers.
Barry: OK, I have it working in D4 now...
One more thing (I'll flick you some points...): how do you get ALL the HTML, rather than just the BODY HTML code?
I've had a wee browse through the MSHTML_TLB file (wow, is it big!) and haven't seen anything obvious (except for all, which is a list of elements whose use is not obvious...)
Cheers,
Raymond.
Hi Ray,
Add ActiveX to the uses section, then try this:
unit Unit1;
interface
uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, OleCtrls, SHDocVw_TLB, ActiveX;
type
  TForm1 = class(TForm)
    Button1: TButton;
    Button2: TButton;
    WebBrowser1: TWebBrowser;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;
var
  Form1: TForm1;
const
  SOURCE = 2;  { "View Source" command ID in the CGID_WebBrowser group }
implementation
{$R *.DFM}
procedure TForm1.Button1Click(Sender: TObject);
begin
  WebBrowser1.OleObject.Navigate('www.experts-exchange.com');
end;
procedure TForm1.Button2Click(Sender: TObject);
const
  CGID_WebBrowser: TGUID = '{ED016940-BD5B-11cf-BA4E-00C04FD70816}';
var
  CmdTarget: IOleCommandTarget;
  vaIn, vaOut: OleVariant;
begin
  if WebBrowser1.Document <> nil then
  try
    WebBrowser1.Document.QueryInterface(IOleCommandTarget, CmdTarget);
    { CmdTarget is reference-counted and released automatically by Delphi;
      calling CmdTarget._Release by hand would release it twice }
    if CmdTarget <> nil then
      CmdTarget.Exec(@CGID_WebBrowser, SOURCE, 0, vaIn, vaOut);
  except
    // Nothing
  end;
end;
end.
Yes, it's just a call to "View Source", but I haven't managed to get the source into a memo yet (god knows I tried ;-)
You can cheat and copy the text from the Notepad window to the memo, but it looks bad. I'll have another look Friday(ish) when I get some free time.
Regards Barry
Change the SOURCE constant to 1 for the "Find" dialog.
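Rather than editing a bare SOURCE constant, the command IDs can be named. The sketch below assumes the values of the Platform SDK's mshtmhst.h as we understand them (HTMLID_FIND = 1, HTMLID_VIEWSOURCE = 2, HTMLID_OPTIONS = 3); the unit and the ExecBrowserCommand helper are hypothetical, while IOleCommandTarget comes from Delphi's ActiveX unit.

```pascal
unit BrowserCommands;  { hypothetical unit name }

interface

uses
  Windows, ActiveX, SHDocVw;  { D4: SHDocVw_TLB }

const
  CGID_WebBrowser: TGUID = '{ED016940-BD5B-11cf-BA4E-00C04FD70816}';
  { Command IDs in the CGID_WebBrowser group (assumed from mshtmhst.h) }
  HTMLID_FIND       = 1;
  HTMLID_VIEWSOURCE = 2;
  HTMLID_OPTIONS    = 3;

function ExecBrowserCommand(Browser: TWebBrowser; CmdID: DWORD): Boolean;

implementation

{ Hypothetical helper: runs one CGID_WebBrowser command against the current
  document and reports whether Exec succeeded. }
function ExecBrowserCommand(Browser: TWebBrowser; CmdID: DWORD): Boolean;
var
  CmdTarget: IOleCommandTarget;
  vaIn, vaOut: OleVariant;
begin
  Result := False;
  if Browser.Document = nil then
    Exit;
  if Succeeded(Browser.Document.QueryInterface(IOleCommandTarget, CmdTarget)) then
    Result := Succeeded(CmdTarget.Exec(@CGID_WebBrowser, CmdID, 0, vaIn, vaOut));
  { No _Release call: CmdTarget is released automatically when it goes out of scope }
end;

end.
```

With this, ExecBrowserCommand(WebBrowser1, HTMLID_FIND) would open the Find dialog and ExecBrowserCommand(WebBrowser1, HTMLID_VIEWSOURCE) the source view, matching the constants used above.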
Barry: Hmm... I really wanted it to give the source back to me the way body.innerHTML does (amazing that it doesn't appear to do this. How typically M$).
I want this for SiteSplicer, my Web site creation tool. It has a TWebBrowser (I know, using MS stuff again :-( ) showing a view of the current page. There is a memo showing the actual HTML for that page, which gets filled in in the OnDocumentComplete event. When I've got this and a few other bits and bobs done, it'll be ready for its first venture out into the wild...
Looking forward to Friday...
Cheers,
Raymond.
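One way to get closer to "all the HTML" without a file is to ask the DOM for the root element's markup. The sketch assumes IE5 or later (where document.documentElement exists) and uses late binding so it does not depend on which interfaces your MSHTML_TLB import contains. It also assumes the D5 SHDocVw TWebBrowser exposes ControlInterface; the unit and function names are hypothetical. Note that outerHTML is the parsed DOM serialised again, so it can differ from the original file (tag case, attribute quoting, whitespace) and omits the DOCTYPE line.

```pascal
unit BrowserHtml;  { hypothetical unit name }

interface

uses
  ComObj, SHDocVw;  { ComObj enables late-bound Variant calls; D4: SHDocVw_TLB }

function IsTopLevelDocument(Browser: TWebBrowser; pDisp: IDispatch): Boolean;
function DocumentHtml(Browser: TWebBrowser): string;

implementation

function IsTopLevelDocument(Browser: TWebBrowser; pDisp: IDispatch): Boolean;
begin
  { DocumentComplete fires once per frame; comparing IUnknown pointers
    (COM identity) picks out the browser's own top-level document }
  Result := (pDisp as IUnknown) = (Browser.ControlInterface as IUnknown);
end;

function DocumentHtml(Browser: TWebBrowser): string;
var
  Doc: OleVariant;
begin
  Result := '';
  if Browser.Document = nil then
    Exit;
  Doc := Browser.Document;
  { documentElement is the <html> element; outerHTML re-serialises it }
  Result := Doc.documentElement.outerHTML;
end;

end.
```

In an OnDocumentComplete handler (let the IDE generate the stub, since its parameter list varies between versions) the body would then be: if IsTopLevelDocument(WebBrowser1, pDisp) then Memo1.Lines.Text := DocumentHtml(WebBrowser1);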
Ahh, of course, I remember using this about 50 times: you have to save it to a file. Yes, this is strange, but even MS FrontPage 2000 has to do this too; if you look in the temp directory while using FrontPage, it saves many files there.
You save it using IPersistFile (from the ActiveX unit); this is the same as using Save or Save As on a custom right-click popup menu:
procedure TForm1.Button2Click(Sender: TObject);
var
  Persist: IPersistFile;
  Document: IHTMLDocument2;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  if Assigned(Document) then
  begin
    Persist := Document as IPersistFile;
    { fRemember = False: save a copy without making C:\temp.htm the document's current file }
    Persist.Save('C:\temp.htm', False);
    Memo1.Lines.LoadFromFile('C:\temp.htm');
  end;
end;
Regards Barry
PS. Scotland on Sunday, eh... surely another win ;-)
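If the temporary file is the only obstacle, the document can probably also be saved to memory. This sketch swaps in IPersistStreamInit (declared in Delphi's ActiveX unit) for IPersistFile and wraps a TStringStream in a TStreamAdapter so MSHTML can write into it as a COM IStream. It assumes your Delphi version's Classes unit provides TStreamAdapter (later versions do; check yours), and the unit and function names are hypothetical. The result is the raw bytes in the page's own character set.

```pascal
unit BrowserSource;  { hypothetical unit name }

interface

uses
  Windows, Classes, ActiveX, SHDocVw;  { D4: SHDocVw_TLB }

function DocumentSource(Browser: TWebBrowser): string;

implementation

function DocumentSource(Browser: TWebBrowser): string;
var
  PersistStream: IPersistStreamInit;
  Buffer: TStringStream;
  Adapter: IStream;
begin
  Result := '';
  if Browser.Document = nil then
    Exit;
  if not Succeeded(Browser.Document.QueryInterface(IPersistStreamInit, PersistStream)) then
    Exit;
  Buffer := TStringStream.Create('');
  try
    { TStreamAdapter exposes the Delphi stream as a COM IStream; with
      soReference it does not free Buffer, so we free it ourselves below.
      Holding it in an IStream variable keeps its reference count correct. }
    Adapter := TStreamAdapter.Create(Buffer, soReference);
    { fClearDirty = False: leave the document's dirty flag untouched }
    PersistStream.Save(Adapter, False);
    Adapter := nil;
    Result := Buffer.DataString;
  finally
    Buffer.Free;
  end;
end;

end.
```

Usage would then be Memo1.Lines.Text := DocumentSource(WebBrowser1);, with no file left behind in C:\.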
Barry: That works just fine. Had my first run-in with PWideChars trying to give IPersistFile.Save a filename in a variable...
I've posted a Q with some points for you...
Cheers,
Raymond.
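For the PWideChar problem: IPersistFile.Save in Delphi's ActiveX unit takes a POleStr (a PWideChar), so a string variable has to be converted to wide characters first. A minimal sketch, assuming a hypothetical SaveDocumentAs helper: assigning the string to a local WideString lets Delphi do the conversion and keeps the wide buffer alive for the whole call, and PWideChar(WideName) then points at it.

```pascal
unit BrowserSave;  { hypothetical unit name }

interface

uses
  ActiveX, ComObj, SHDocVw;  { D4: SHDocVw_TLB }

procedure SaveDocumentAs(Browser: TWebBrowser; const FileName: string);

implementation

procedure SaveDocumentAs(Browser: TWebBrowser; const FileName: string);
var
  Persist: IPersistFile;
  WideName: WideString;
begin
  if Browser.Document = nil then
    Exit;
  Persist := Browser.Document as IPersistFile;
  { Convert the ANSI string to a WideString held in a local variable,
    so the buffer outlives the Save call, then pass a PWideChar to it }
  WideName := FileName;
  { OleCheck raises an EOleSysError if Save returns a failure HResult;
    fRemember = False saves a copy without changing the document's current file }
  OleCheck(Persist.Save(PWideChar(WideName), False));
end;

end.
```

Barry's example would then become SaveDocumentAs(WebBrowser1, SomeFileName) followed by Memo1.Lines.LoadFromFile(SomeFileName), where SomeFileName is any string variable.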