Solved

Look for words in a TWebBrowser once the download has completed

Posted on 2009-05-19
Last Modified: 2012-05-07
I need to look for specific words in a TWebBrowser once it has completed its download of a website. I need to wait for the component to finish downloading all the frames of the website before I start looking for specific words in the web source of any and all frames.

The key is that the component must trigger this when it is in a rest state, after all frames have been downloaded.
Question by:Code2009
7 Comments
 
LVL 4

Expert Comment

by:irishbuddha
ID: 24423657
Something along the lines of the following should do the trick:


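   oToBrowser.Navigate('my web doc');
   //READYSTATE_INTERACTIVE is reached before everything has arrived,
   //so wait for READYSTATE_COMPLETE before searching the document
   while oToBrowser.ReadyState <> READYSTATE_COMPLETE do
      begin
         Application.ProcessMessages;
      end;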
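A polling loop like this keeps the CPU busy while it waits. An event-driven alternative relies on the documented behaviour of `OnDocumentComplete`: it fires once for each frame and, last of all, once for the top-level document, whose `pDisp` is the WebBrowser control itself. The sketch below assumes a form with a TWebBrowser named `WB` wired to this handler, and a hypothetical `SearchDocument` method that does the actual word search; newer Delphi versions declare `URL` as `const`, so match the signature your IDE generates.

```pascal
// Sketch: assumes SHDocVw and MSHTML in the uses clause, a TWebBrowser named
// WB whose OnDocumentComplete is this handler, and a hypothetical
// SearchDocument(Doc: IHTMLDocument2) method declared on TForm1.
procedure TForm1.WBDocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  // OnDocumentComplete fires once per frame and, last of all, once for the
  // top-level document, whose pDisp is the WebBrowser control itself.
  // Comparing IUnknown pointers is the reliable COM identity test.
  if (pDisp as IUnknown) = (WB.ControlInterface as IUnknown) then
    SearchDocument(WB.Document as IHTMLDocument2);
end;
```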

 
LVL 4

Expert Comment

by:irishbuddha
ID: 24423716
Sorry, I left out the portion for digging into the actual document's HTML. The following gives you the HTML of the top-level document as a simple string, which you can then parse, search or use however you need. Note that frames are separate documents, so their contents are not included in this string.
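//requires MSHTML (IHTMLDocument2, IHTMLElement) and SHDocVw (TWebBrowser) in the uses clause
function GetHTMLFromBrowser(oFromBrowser: TWebBrowser): string;
var iMyHTML : IHTMLElement;
begin
   result := '';
   if Assigned(oFromBrowser.Document) then
      begin
         iMyHTML := (oFromBrowser.Document AS IHTMLDocument2).body;
         //body can still be nil if the document has not finished loading
         if iMyHTML = nil then
            Exit;
         //now, back up to the parent/full document (the <html> element)
         while iMyHTML.parentElement <> nil do
            begin
               iMyHTML := iMyHTML.parentElement;
            end;
         result := iMyHTML.outerHTML;
      end;
end;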
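Once the HTML is in a string, looking for a word is ordinary string work. A minimal sketch (`CountWord` is a hypothetical name, not part of the VCL) counts case-insensitive, non-overlapping matches using `PosEx` from StrUtils. Because it scans raw markup, a word that appears inside a tag name or attribute is counted too.

```pascal
program FindWordsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, StrUtils;

// Hypothetical helper: number of case-insensitive, non-overlapping
// occurrences of Word in Source (Source may be raw HTML).
function CountWord(const Source, Word: string): Integer;
var
  UpSource, UpWord: string;
  P: Integer;
begin
  Result := 0;
  if Word = '' then
    Exit;
  UpSource := AnsiUpperCase(Source);
  UpWord := AnsiUpperCase(Word);
  P := PosEx(UpWord, UpSource, 1);
  while P > 0 do
  begin
    Inc(Result);
    // continue searching just past the current match
    P := PosEx(UpWord, UpSource, P + Length(UpWord));
  end;
end;

begin
  WriteLn(CountWord('<p>Expert help from experts</p>', 'expert')); // prints 2
end.
```

In the form, `Source` would be the result of `GetHTMLFromBrowser(WB)`.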

 
LVL 26

Expert Comment

by:EddieShipman
ID: 24426359
Try something like this. This highlights all the keywords from Edit1.Text when you click on the Find button.
Be aware that MSHTML.DLL has a bug that raises an 800A025E exception when the element you are trying to select is hidden. I believe I've trapped that exception, but if you double-click an item in the ListBox and it doesn't scroll into view, it is one of the hidden ones.
unit Unit1;
 
interface
 
uses
  Windows, SysUtils, Forms, Graphics, Controls, Dialogs, ComCtrls, ExtCtrls, Classes,
  OleCtrls, SHDocVw, StdCtrls, MSHTML;
 
type
  TForm1 = class(TForm)
    Button1: TButton;
    Edit1: TEdit;
    Button2: TButton;
    Edit2: TEdit;
    ListBox1: TListBox;
    Panel1: TPanel;
    WB: TWebBrowser;
    procedure FormCreate(Sender: TObject);
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure ListBox1DblClick(Sender: TObject);
    procedure WBDocumentComplete(Sender: TObject; const pDisp: IDispatch;
      var URL: OleVariant);
  private
    { Private declarations }
  public
    { Public declarations }
    TextRange: IHTMLTxtRange;
    ilist:     TInterfaceList;
    procedure WBLocateHighlight(WB: TWebBrowser; Text: string);
  end;
 
var
  Form1: TForm1;
 
implementation
 
{$R *.dfm}
 
procedure TForm1.FormCreate(Sender: TObject);
begin
  WB.Navigate('about:blank');
  ilist:=TInterfaceList.Create;
end;
 
procedure TForm1.WBLocateHighlight(WB: TWebBrowser; Text: string);
const
   prefix = '<span style="color:white; background-color: red;">';
   suffix = '</span>';
var
   tr: IHTMLTxtRange;
begin
   if Assigned(WB.Document) then
   begin
     tr := ((wb.Document AS IHTMLDocument2).body AS IHTMLBodyElement).createTextRange;
     while tr.findText(Text, 1, 0) do
     begin
       // this try..except..finally block keeps us from getting the 800A025E error
       // that MSHTML.DLL raises (a bug) when the element is hidden
       try try
       tr.select;
       except
       end;
       finally
         ilist.Add(tr.parentElement);
         ListBox1.Items.Add(tr.Text);
         tr.pasteHTML(prefix + tr.htmlText + suffix);
         tr.scrollIntoView(True);
       end;
     end;
   end;
end;
 
procedure TForm1.Button1Click(Sender: TObject);
begin
  ListBox1.Clear;
  WBLocateHighlight(WB, Edit1.Text);
  TextRange := ((WB.Document as IHTMLDocument2).Body As IHTMLBodyElement).CreateTextRange;
end;
 
procedure TForm1.Button2Click(Sender: TObject);
begin
  WB.navigate(Edit2.Text);
end;
 
procedure TForm1.FormDestroy(Sender: TObject);
begin
  ilist.Free;
end;
 
procedure TForm1.ListBox1DblClick(Sender: TObject);
var
  pe: IHTMLElement;
begin
  if ListBox1.ItemIndex < 0 then Exit; // nothing selected in the ListBox
  pe := (ilist[ListBox1.ItemIndex] as IHTMLElement);
  if pe <> nil then
  begin
    TextRange.moveToElementText(pe);
    TextRange.findText(ListBox1.Items[ListBox1.ItemIndex], 1, 0);
    // this try..except..finally block keeps us from getting the 800A025E error
    // that MSHTML.DLL raises (a bug) when the element is hidden
    try try
      TextRange.select;
    except
    end;
    finally
      TextRange.scrollIntoView(True);
    end;
  end;
end;
 
procedure TForm1.WBDocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  Button1.Enabled := True;
  TextRange := ((WB.Document as IHTMLDocument2).Body As IHTMLBodyElement).CreateTextRange;
end;
 
end.
 
{DFM}
object Form1: TForm1
  Left = 304
  Top = 193
  Width = 779
  Height = 652
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 8
    Top = 64
    Width = 75
    Height = 25
    Caption = 'Find'
    TabOrder = 0
    OnClick = Button1Click
  end
  object Edit1: TEdit
    Left = 104
    Top = 64
    Width = 121
    Height = 21
    TabOrder = 1
    Text = 'expert'
  end
  object Button2: TButton
    Left = 8
    Top = 24
    Width = 75
    Height = 25
    Caption = 'Navigate'
    TabOrder = 2
    OnClick = Button2Click
  end
  object Edit2: TEdit
    Left = 104
    Top = 24
    Width = 193
    Height = 21
    TabOrder = 3
    Text = 'http://www.experts-exchange.com'
  end
  object ListBox1: TListBox
    Left = 32
    Top = 112
    Width = 185
    Height = 433
    ItemHeight = 13
    TabOrder = 4
    OnDblClick = ListBox1DblClick
  end
  object Panel1: TPanel
    Left = 304
    Top = 0
    Width = 467
    Height = 625
    Align = alRight
    Anchors = [akLeft, akTop, akRight, akBottom]
    BevelOuter = bvNone
    Caption = 'Panel1'
    TabOrder = 5
    object WB: TWebBrowser
      Left = 0
      Top = 0
      Width = 467
      Height = 625
      Align = alClient
      TabOrder = 0
      OnDocumentComplete = WBDocumentComplete
      ControlData = {
        4C00000044300000984000000000000000000000000000000000000000000000
        000000004C000000000000000000000001000000E0D057007335CF11AE690800
        2B2E126208000000000000004C0000000114020000000000C000000000000046
        8000000000000000000000000000000000000000000000000000000000000000
        00000000000000000100000000000000000000000000000000000000}
    end
  end
end


 
LVL 26

Expert Comment

by:EddieShipman
ID: 24426377
Oh, BTW, I'm not sure it will work with frames/iframes, and I don't think the other code will either, because you can't get the source of any cross-domain frames/iframes.
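For same-domain frames, one way to reach their source is to walk the `frames` collection of each `IHTMLDocument2` recursively and wrap every frame access in try..except, so that cross-domain frames, which fail with an access-denied error, are simply skipped. This is a sketch under stated assumptions: `CollectFrameHTML` is a hypothetical helper, and the uses clause is assumed to contain MSHTML, ComObj (which supplies `EOleSysError` and turns safecall failures into exceptions) and Classes.

```pascal
// Sketch: assumes MSHTML, ComObj and Classes in the uses clause.
// CollectFrameHTML is a hypothetical helper, not part of any library.
procedure CollectFrameHTML(const Doc: IHTMLDocument2; Pages: TStrings);
var
  I: Integer;
  Index: OleVariant;
  FrameWin: IHTMLWindow2;
  Root: IHTMLElement;
begin
  if (Doc = nil) or (Doc.body = nil) then
    Exit;
  // the parent of <body> is the <html> element
  Root := Doc.body.parentElement;
  if Root <> nil then
    Pages.Add(Root.outerHTML);
  // frames/iframes are separate documents, so visit each one in turn
  for I := 0 to Doc.frames.length - 1 do
  begin
    Index := I;
    try
      FrameWin := IDispatch(Doc.frames.item(Index)) as IHTMLWindow2;
      CollectFrameHTML(FrameWin.document, Pages);
    except
      // a cross-domain frame raises "access denied": skip it
      on EOleSysError do ;
    end;
  end;
end;
```

Called from the top-level `OnDocumentComplete`, e.g. `CollectFrameHTML(WB.Document as IHTMLDocument2, Memo1.Lines)` (with `Memo1` a hypothetical TMemo), each reachable document ends up as one entry that can then be searched.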
 

Author Comment

by:Code2009
ID: 24431607
Thanks so far... what if I want the word or sentence next to the word that I found?

If I am looking for the word "name" in the source code and it is found, I would like to read the word or sentence next to that word. I will give double points for this one... can I even do that?
 
LVL 4

Accepted Solution

by:
irishbuddha earned 500 total points
ID: 24431970
You can accomplish that in several different ways, one of which is a simple Copy(), as seen below. Beyond this, you could parse the document and pull out what you are after, but you'll need to decide on some logic that identifies where the next 'sentence' or string you want to extract ends, so that you can extract the correct piece:


procedure TForm1.BitBtn1Click(Sender: TObject);
begin
   //
   ShowMessage('Found: "' + ExtractMyString('<html><body><h1>Simple String Extraction</h1></body></html>','<h1>','</h1>') + '"');
end;
 
function TForm1.ExtractMyString(cSource           : string;
                                cExtractStartText : string;
                                cExtractToText    : string): string;
//requires StrUtils in the uses clause (for PosEx)
var nStartPos : integer;
    nStopPos : integer;
begin
   result := '';
   //first, find the start marker; Pos() returns 0 if it is not there
   nStartPos := Pos(cExtractStartText,cSource);
   if nStartPos = 0 then
      Exit;
   //the text we want begins right after the start marker
   nStartPos := nStartPos + Length(cExtractStartText);
   //Pos()/PosEx() are case-sensitive, just a heads up
   //if you want case-insensitive matching, search UpperCase() copies:
   //   Pos(UpperCase(cExtractStartText),UpperCase(cSource))
   //look for the end marker only AFTER the start marker, so an earlier
   //occurrence of cExtractToText is not picked up by mistake
   nStopPos  := PosEx(cExtractToText,cSource,nStartPos);
   if nStopPos = 0 then
      Exit;
   //next, copy out the string that is in the middle of the Start/End pieces you passed in
   result := Copy(cSource,              //source to copy from
                  nStartPos,            //starting position
                  nStopPos - nStartPos);//how many characters to copy
end;
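If you only want the word that follows your search term, rather than everything up to a fixed end marker, another option is to scan forward from the match. This is a sketch: `NextWordAfter` and `IsWordBreak` are hypothetical names, and what counts as a word break is an assumption you should adjust. It works best on plain text, for example `(WB.Document as IHTMLDocument2).body.innerText`, because in raw HTML the "next word" would often be a tag.

```pascal
program NextWordDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: which characters end a "word"; adjust to taste.
function IsWordBreak(C: Char): Boolean;
begin
  case C of
    ' ', #9, #10, #13, ':', ';', ',', '.':
      Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: returns the first word after the first
// case-insensitive occurrence of Keyword in Text, or '' if there is none.
function NextWordAfter(const Text, Keyword: string): string;
var
  P, StartPos: Integer;
begin
  Result := '';
  if Keyword = '' then
    Exit;
  P := Pos(UpperCase(Keyword), UpperCase(Text));
  if P = 0 then
    Exit;
  P := P + Length(Keyword);
  // skip the separators between the keyword and the next word
  while (P <= Length(Text)) and IsWordBreak(Text[P]) do
    Inc(P);
  StartPos := P;
  // collect characters up to the next separator
  while (P <= Length(Text)) and not IsWordBreak(Text[P]) do
    Inc(P);
  Result := Copy(Text, StartPos, P - StartPos);
end;

begin
  WriteLn(NextWordAfter('Customer name: John Smith, age 42', 'name')); // prints John
end.
```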

 

Author Closing Comment

by:Code2009
ID: 31582917
Thank you. You are a star :)
