Look for words in a TWebBrowser once the download has completed

I need to look for specific words in a TWebBrowser once it has completed its download of a website. I need to wait for the component to finish downloading all the frames from the website before I start looking for specific words in the web source of any and all frames.

The key is that the component must trigger this when it is in a rest state, after all frames have been downloaded.
 
irishbuddha commented:
Something along the lines of the following should do the trick:


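   oToBrowser.Navigate('my web doc');
   //wait until the browser reports the whole document as loaded,
   //not merely an interactive (partially loaded) state
   while oToBrowser.ReadyState <> READYSTATE_COMPLETE do
      begin
         Application.ProcessMessages;
      end;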
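
An event-driven alternative to polling, as a minimal sketch: OnDocumentComplete fires once per frame and then once more for the top-level browser, so comparing the pDisp argument against the browser's own interface tells you when the outermost document is done. The unit name and the helper IsTopLevelDocument are hypothetical, and the sketch assumes your Delphi version exposes TWebBrowser.DefaultInterface (some older versions call it ControlInterface). Pages that keep loading content from script after this event are not covered.

```pascal
unit BrowserDoneSketch; // hypothetical unit name

interface

uses
  SHDocVw;

// Hypothetical helper: True when pDisp (as passed to OnDocumentComplete)
// refers to the top-level browser rather than to one of its frames.
function IsTopLevelDocument(Browser: TWebBrowser; const pDisp: IDispatch): Boolean;

implementation

function IsTopLevelDocument(Browser: TWebBrowser; const pDisp: IDispatch): Boolean;
begin
  // COM identity rule: two references point at the same object
  // exactly when their IUnknown pointers are equal.
  // Assumption: DefaultInterface is available on this TWebBrowser version.
  Result := Assigned(pDisp) and
    ((pDisp as IUnknown) = (Browser.DefaultInterface as IUnknown));
end;

end.
```

Inside an OnDocumentComplete handler you would call IsTopLevelDocument with the browser and the handler's pDisp argument, and only start searching the source when it returns True.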

 
irishbuddha commented:
Sorry, left out the portion for digging into the actual document's HTML. The following will give you the full document in a simple string, which you can then parse/search or perform whatever action you are after.
//requires SHDocVw and MSHTML in the uses clause
function GetHTMLFromBrowser(oFromBrowser: TWebBrowser): string;
var iMyHTML : IHTMLElement;
begin
   result := '';
   if Assigned(oFromBrowser.Document) then
      begin
         iMyHTML := (oFromBrowser.Document AS IHTMLDocument2).body;
         //body can still be nil while the document is empty, so bail out early
         if iMyHTML = nil then
            Exit;
         //now, back up to the parent/full document
         while iMyHTML.parentElement <> nil do
            begin
               iMyHTML := iMyHTML.parentElement;
            end;
         result := iMyHTML.outerHTML;
      end;
end;

 
Eddie Shipman (All-around developer) commented:
Try something like this. This highlights all the keywords from Edit1.Text when you click on the Find button.
Be aware that MSHTML.DLL has a bug that will cause an 800A025E exception if the element you are trying to select is hidden. I believe I've captured the exception, but if you try double-clicking on an item in the ListBox and it doesn't scroll into view, it is one of the ones that was hidden.
unit Unit1;
 
interface
 
uses
  Windows, SysUtils, Forms, Graphics, Controls, Dialogs, ComCtrls, ExtCtrls, Classes,
  OleCtrls, SHDocVw, StdCtrls, MSHTML;
 
type
  TForm1 = class(TForm)
    Button1: TButton;
    Edit1: TEdit;
    Button2: TButton;
    Edit2: TEdit;
    ListBox1: TListBox;
    Panel1: TPanel;
    WB: TWebBrowser;
    procedure FormCreate(Sender: TObject);
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure ListBox1DblClick(Sender: TObject);
    procedure WBDocumentComplete(Sender: TObject; const pDisp: IDispatch;
      var URL: OleVariant);
  private
    { Private declarations }
  public
    { Public declarations }
    TextRange: IHTMLTxtRange;
    ilist:     TInterfaceList;
    procedure WBLocateHighlight(WB: TWebBrowser; Text: string);
  end;
 
var
  Form1: TForm1;
 
implementation
 
{$R *.dfm}
 
procedure TForm1.FormCreate(Sender: TObject);
begin
  WB.Navigate('about:blank');
  ilist:=TInterfaceList.Create;
end;
 
procedure TForm1.WBLocateHighlight(WB: TWebBrowser; Text: string);
const
   prefix = '<span style="color:white; background-color: red;">';
   suffix = '</span>';
var
   tr: IHTMLTxtRange;
begin
   if Assigned(WB.Document) then
   begin
     tr := ((wb.Document AS IHTMLDocument2).body AS IHTMLBodyElement).createTextRange;
     while tr.findText(Text, 1, 0) do
     begin
       // This try..except nested in try..finally keeps us from getting the 800A025E error
       // that occurs due to a bug in MSHTML.DLL when the element is hidden
       try
         try
           tr.select;
         except
           // deliberately ignored: a hidden element cannot be selected
         end;
       finally
         ilist.Add(tr.parentElement);
         ListBox1.Items.Add(tr.Text);
         tr.pasteHTML(prefix + tr.htmlText + suffix);
         tr.scrollIntoView(True);
       end;
     end;
   end;
end;
 
procedure TForm1.Button1Click(Sender: TObject);
begin
  ListBox1.Clear;
  ilist.Clear; // keep the element list in step with the list box between searches
  WBLocateHighlight(WB, Edit1.Text);
  TextRange := ((WB.Document as IHTMLDocument2).Body As IHTMLBodyElement).CreateTextRange;
end;
 
procedure TForm1.Button2Click(Sender: TObject);
begin
  WB.navigate(Edit2.Text);
end;
 
procedure TForm1.FormDestroy(Sender: TObject);
begin
  ilist.Free;
end;
 
procedure TForm1.ListBox1DblClick(Sender: TObject);
var
  pe: IHTMLElement;
begin
  if ListBox1.ItemIndex < 0 then
    Exit; // nothing is selected, so there is no element to look up
  pe := (ilist[ListBox1.ItemIndex] as IHTMLElement);
  if pe <> nil then
  begin
    TextRange.moveToElementText(pe);
    TextRange.findText(ListBox1.Items[ListBox1.ItemIndex], 1, 0);
    // This try..except nested in try..finally keeps us from getting the 800A025E error
    // that occurs due to a bug in MSHTML.DLL when the element is hidden
    try
      try
        TextRange.select;
      except
        // deliberately ignored: a hidden element cannot be selected
      end;
    finally
      TextRange.scrollIntoView(True);
    end;
  end;
end;
 
procedure TForm1.WBDocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  Button1.Enabled := True;
  TextRange := ((WB.Document as IHTMLDocument2).Body As IHTMLBodyElement).CreateTextRange;
end;
 
end.
 
{DFM}
object Form1: TForm1
  Left = 304
  Top = 193
  Width = 779
  Height = 652
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 8
    Top = 64
    Width = 75
    Height = 25
    Caption = 'Find'
    TabOrder = 0
    OnClick = Button1Click
  end
  object Edit1: TEdit
    Left = 104
    Top = 64
    Width = 121
    Height = 21
    TabOrder = 1
    Text = 'expert'
  end
  object Button2: TButton
    Left = 8
    Top = 24
    Width = 75
    Height = 25
    Caption = 'Navigate'
    TabOrder = 2
    OnClick = Button2Click
  end
  object Edit2: TEdit
    Left = 104
    Top = 24
    Width = 193
    Height = 21
    TabOrder = 3
    Text = 'http://www.experts-exchange.com'
  end
  object ListBox1: TListBox
    Left = 32
    Top = 112
    Width = 185
    Height = 433
    ItemHeight = 13
    TabOrder = 4
    OnDblClick = ListBox1DblClick
  end
  object Panel1: TPanel
    Left = 304
    Top = 0
    Width = 467
    Height = 625
    Align = alRight
    Anchors = [akLeft, akTop, akRight, akBottom]
    BevelOuter = bvNone
    Caption = 'Panel1'
    TabOrder = 5
    object WB: TWebBrowser
      Left = 0
      Top = 0
      Width = 467
      Height = 625
      Align = alClient
      TabOrder = 0
      OnDocumentComplete = WBDocumentComplete
      ControlData = {
        4C00000044300000984000000000000000000000000000000000000000000000
        000000004C000000000000000000000001000000E0D057007335CF11AE690800
        2B2E126208000000000000004C0000000114020000000000C000000000000046
        8000000000000000000000000000000000000000000000000000000000000000
        00000000000000000100000000000000000000000000000000000000}
    end
  end
end

 
Eddie Shipman (All-around developer) commented:
Oh, by the way, I'm not sure this will work with frames/iframes, and I don't think the other code will either, because you can't get the source of any cross-domain frames/iframes.
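
For frames that the host page is allowed to read, a rough sketch of walking them could look like the following. It gathers the source of a document and then recurses into each entry of its frames collection, skipping any frame that raises an OLE error (which is what typically happens for cross-domain content). The unit name and CollectFrameHTML are hypothetical names for illustration, and the exact MSHTML import signatures can differ slightly between Delphi versions, so treat this as untested against your setup.

```pascal
unit FrameSourceSketch; // hypothetical unit name

interface

uses
  SysUtils, ComObj, MSHTML;

// Hypothetical helper: returns the HTML of Doc followed by the HTML of
// every frame that can be read, one block per line break.
function CollectFrameHTML(const Doc: IHTMLDocument2): string;

implementation

function CollectFrameHTML(const Doc: IHTMLDocument2): string;
var
  Doc3: IHTMLDocument3;
  FrameWin: IHTMLWindow2;
  Index, FrameDisp: OleVariant;
  i: Integer;
begin
  Result := '';
  if Doc = nil then
    Exit;
  // documentElement is the root <html> element of this document
  if Supports(Doc, IHTMLDocument3, Doc3) and (Doc3.documentElement <> nil) then
    Result := Doc3.documentElement.outerHTML;
  for i := 0 to Doc.frames.length - 1 do
  begin
    Index := i;
    try
      // each item is the window object of one frame
      FrameDisp := Doc.frames.item(Index);
      if Supports(IDispatch(FrameDisp), IHTMLWindow2, FrameWin) then
        Result := Result + sLineBreak + CollectFrameHTML(FrameWin.document);
    except
      on EOleError do
      begin
        // frame is not readable (typically cross-domain): skip it
      end;
    end;
  end;
end;

end.
```

You would pass it the browser's document cast to IHTMLDocument2 once loading has finished, then search the returned string.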
 
Code2009 (author) commented:
Thanks so far ... what if I want the word or sentence next to the word that I found?

If I am looking for the word name in the source code and it is found ... I would like to read the word or sentence next to that word. I will give double points for this one ... can I even do that?
 
irishbuddha commented:
You can accomplish that several different ways, one of which is a simple Copy() as seen below. Beyond this, you could parse the document and pull out what you are after, but you'll need to work out the logic that identifies where the next 'sentence' or string you want to extract ends, so that you can extract the correct piece:


procedure TForm1.BitBtn1Click(Sender: TObject);
begin
   //
   ShowMessage('Found: "' + ExtractMyString('<html><body><h1>Simple String Extraction</h1></body></html>','<h1>','</h1>') + '"');
end;
 
//requires StrUtils in the uses clause (for PosEx)
function TForm1.ExtractMyString(cSource           : string;
                                cExtractStartText : string;
                                cExtractToText    : string): string;
var nStartPos : integer;
    nStopPos : integer;
begin
   result := '';
   //first, identify where you want to start from
   //Pos() gives the position of cExtractStartText within cSource, or 0 if it is missing
   nStartPos := Pos(cExtractStartText,cSource);
   if nStartPos = 0 then
      Exit;
   //the text we want begins right after cExtractStartText
   nStartPos := nStartPos + Length(cExtractStartText);
   //Pos() is Case-Sensitive, just a heads up
   //if you want it to be case-insensitive for locating the string:
   //   Pos(UpperCase(cExtractStartText),UpperCase(cSource))
   //for the ending position, look for cExtractToText after the start position
   nStopPos  := PosEx(cExtractToText,cSource,nStartPos);
   if nStopPos = 0 then
      Exit;
   //next, copy out the string that is in the middle of the Start/End pieces you passed in
   result := Copy(cSource,              //source to copy from
                  nStartPos,            //starting position
                  nStopPos - nStartPos);//how many characters to copy
end;
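
For the "word next to the word I found" case specifically, here is a small sketch that needs no end marker. It finds the keyword case-insensitively, skips any separators after it, and returns the next run of characters up to whitespace or the start of a tag. WordAfter and IsBlank are hypothetical helper names, and the choice of separators (colon, equals sign, whitespace) is an assumption you would adjust to the pages you are scanning.

```pascal
program WordAfterSketch; // hypothetical program name

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// True for the whitespace characters treated as word boundaries
function IsBlank(C: Char): Boolean;
begin
  Result := (C = ' ') or (C = #9) or (C = #10) or (C = #13);
end;

// Hypothetical helper: returns the first word that follows Keyword in
// Source, or '' when Keyword does not occur.
function WordAfter(const Source, Keyword: string): string;
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  if Keyword = '' then
    Exit;
  // case-insensitive search for the keyword
  StartPos := Pos(UpperCase(Keyword), UpperCase(Source));
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(Keyword));
  // skip separators between the keyword and the next word (assumed set)
  while (StartPos <= Length(Source)) and
        (IsBlank(Source[StartPos]) or (Source[StartPos] = ':') or
         (Source[StartPos] = '=')) do
    Inc(StartPos);
  // the word ends at whitespace or at the start of an HTML tag
  EndPos := StartPos;
  while (EndPos <= Length(Source)) and
        not (IsBlank(Source[EndPos]) or (Source[EndPos] = '<')) do
    Inc(EndPos);
  Result := Copy(Source, StartPos, EndPos - StartPos);
end;

begin
  Writeln(WordAfter('Name: Alice Smith', 'name')); // prints: Alice
end.
```

To grab a whole sentence instead of one word, the second loop would stop at a period or a closing tag rather than at whitespace.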

 
Code2009 (author) commented:
Thank you. You are a star :)
Question has a verified solution.
