Solved

View source of internet explorer page with frames

Posted on 2006-11-06
19
322 Views
Last Modified: 2010-04-05
Hi.

I know this is probably a really, really long shot, but does anyone know if it is possible to capture the source of a frame within Internet Explorer.  I can specifically name the window by it's caption (this will always be the same for my purposes), and I can also specifically name the frame via it's HTML object name.  I "just" want to capture the source within Delphi so that I can parse it and do other stuff with it.

This question is definitely about this specific scenario, and therefore a "workround solution" will not suffice.

Thanks in advance,

John.
0
Comment
Question by:Jaymol
19 Comments
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17896034
Hi John.

Sorry you had to wait this long for an answer.

I have put together a sample app for you which lists all IE instances and shows the relevant source. You say you can specifically name the window by its caption, so you could then get the handle for the window using the FindWindow API function. You can find your window by comparing this handle with the HWND property of the IWebbrowser2 interface. Simply modify the btnGetBrowsersClick method with something like
...
  for i:= 0 to sw.Count-1 do
  begin
    browser:= sw.Item(i) as IWebBrowser2;
    if browser.HWND <> TheHandleYouWant then Continue;
...

Let me know if you need more help.

Regards
Pierre

PAS File
=================================================================
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, ShDocVw, StdCtrls, mshtml, xpman, ExtCtrls;

type
  THtmlSrcObj = class(TObject)
    private
      FHtmlSource: TStrings;
      WBrowser: IWebBrowser2;
      HtmlDoc: IHTMLDocument2;
      function GetHtmlSource: TStrings;
      function GetHtmlTitle: string;
    public
      constructor Create; virtual;
      destructor Destroy; override;
      property HTMLSource: TStrings read GetHtmlSource;
      property DocTitle: string read GetHtmlTitle;
  end;


  TForm1 = class(TForm)
    Memo1: TMemo;
    lbBrowsers: TListBox;
    btnGetBrowsers: TButton;
    Panel1: TPanel;
    Panel2: TPanel;
    Splitter1: TSplitter;
    Panel3: TPanel;
    Panel4: TPanel;
    cbWrap: TCheckBox;
    procedure btnGetBrowsersClick(Sender: TObject);
    procedure lbBrowsersClick(Sender: TObject);
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure cbWrapClick(Sender: TObject);
  private
    HtmlSourceObjects: TList;
  end;

var
  Form1: TForm1;

implementation

uses VarUtils;

{$R *.dfm}

{ THtmlSrcObj }

constructor THtmlSrcObj.Create;
begin
  FHtmlSource:= TStringList.Create;
end;

destructor THtmlSrcObj.Destroy;
begin
  FHtmlSource.Free;
end;

function THtmlSrcObj.GetHtmlSource: TStrings;
var element: IHTMLElement;
    frames: IHTMLFramesCollection2;
    frameWindow: IHTMLWindow2;
    i: integer;
    FrameIndex: OLEVariant;
    FrameObj: IDispatch;
    FrameDoc: IHTMLDocument2;
begin
  result:= FHtmlSource;
  FHtmlSource.Clear;
  try
    if WBrowser.Document.QueryInterface(IID_IHTMLDocument2, HtmlDoc) = S_OK then
    begin
      element:= htmlDoc.all.item(0, 0) as IHTMLElement;
      if Assigned(Element)
        then FHtmlSource.Text:= element.outerHTML;

      frames:= htmlDoc.frames;
      for i:= 0 to frames.length-1 do
      begin
        FHtmlSource.Add('');
        FHtmlSource.Add('<!--'+StringOfChar('*',5)+' '
                        +'Frame '+IntToStr(i+1)
                        +StringOfChar('*',5)+' -->');
        try
          FrameIndex:= i;
          FrameObj:= frames.item(FrameIndex);
          frameWindow:= frameObj AS IHTMLWindow2;
          FrameDoc:= (frameWindow.Document AS IHTMLDocument2);
          FHtmlSource.Add('<!--'
                        +StringOfChar('*',5)+' '
                        +'Title: '+FrameDoc.title
                        +' URL: '+FrameDoc.url
                        +StringOfChar('*',5)+' -->');

          element:= FrameDoc.all.item(0, 0) as IHTMLElement;
          if Assigned(Element)
            then FHtmlSource.Add(element.outerHTML);
            FHtmlSource.Add('');
        except on e: exception
          do FHtmlSource.Add(e.Message);
        end;
      end;
    end;
  except on e: exception
    do Result.Add(e.Message);
  end;
end;

function THtmlSrcObj.GetHtmlTitle: string;
begin
  result:= '';
  if Assigned(HtmlDoc)
    then result:= HtmlDoc.title;
end;

procedure TForm1.btnGetBrowsersClick(Sender: TObject);
var sw: IShellWindows;
    browser: IWebBrowser2;
    i: integer;
    htmlDoc: IHTMLDocument2;
    obj: THtmlSrcObj;
begin
  lbBrowsers.Clear;
  sw := CoShellWindows.Create;
  for i:= 0 to sw.Count-1 do
  begin
    browser:= sw.Item(i) as IWebBrowser2;
    if browser.Document.QueryInterface(IID_IHTMLDocument2, htmlDoc) = S_OK then
    begin
      obj:= THtmlSrcObj.Create;
      HtmlSourceObjects.Add(obj);
      obj.WBrowser:= browser;
      obj.HtmlDoc:=htmlDoc;

      lbBrowsers.AddItem(browser.LocationName, obj);
    end;
  end;
end;

procedure TForm1.lbBrowsersClick(Sender: TObject);
var obj: TObject;
begin
  if lbBrowsers.ItemIndex = -1 then exit;
  obj:= lbBrowsers.Items.Objects[lbBrowsers.ItemIndex];
  if obj is THtmlSrcObj
    then Memo1.Text:= (obj as THtmlSrcObj).HTMLSource.Text;
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  HtmlSourceObjects:= Tlist.Create;
end;

procedure TForm1.FormDestroy(Sender: TObject);
var obj: TObject;
begin
  While HtmlSourceObjects.Count > 0 do
  begin
    obj:= HtmlSourceObjects.Items[0];
    if Assigned(obj) AND (obj is THtmlSrcObj)
      then (obj as THtmlSrcObj).Free;
    HtmlSourceObjects.Delete(0);
  end;
  HtmlSourceObjects.Free;
end;

procedure TForm1.cbWrapClick(Sender: TObject);
begin
  Memo1.WordWrap:= cbWrap.Checked;
end;

end.



DFM File
================================================================
object Form1: TForm1
  Left = 192
  Top = 114
  Width = 696
  Height = 480
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object Splitter1: TSplitter
    Left = 241
    Top = 0
    Width = 6
    Height = 446
    Color = clBackground
    ParentColor = False
  end
  object Panel1: TPanel
    Left = 0
    Top = 0
    Width = 241
    Height = 446
    Align = alLeft
    BevelOuter = bvNone
    Caption = 'Panel1'
    TabOrder = 0
    object Panel2: TPanel
      Left = 0
      Top = 0
      Width = 241
      Height = 33
      Align = alTop
      BevelOuter = bvNone
      Caption = 'Panel2'
      TabOrder = 0
      DesignSize = (
        241
        33)
      object btnGetBrowsers: TButton
        Left = 8
        Top = 4
        Width = 227
        Height = 25
        Anchors = [akLeft, akTop, akRight]
        Caption = 'List open browsers'
        TabOrder = 0
        OnClick = btnGetBrowsersClick
      end
    end
    object lbBrowsers: TListBox
      Left = 0
      Top = 33
      Width = 241
      Height = 413
      Align = alClient
      BorderStyle = bsNone
      ItemHeight = 13
      TabOrder = 1
      OnClick = lbBrowsersClick
    end
  end
  object Panel3: TPanel
    Left = 247
    Top = 0
    Width = 441
    Height = 446
    Align = alClient
    BevelOuter = bvNone
    Caption = 'Panel3'
    TabOrder = 1
    object Memo1: TMemo
      Left = 0
      Top = 29
      Width = 441
      Height = 417
      Align = alClient
      BevelInner = bvNone
      BevelOuter = bvNone
      BorderStyle = bsNone
      TabOrder = 0
      WordWrap = False
    end
    object Panel4: TPanel
      Left = 0
      Top = 0
      Width = 441
      Height = 29
      Align = alTop
      BevelOuter = bvNone
      TabOrder = 1
      object cbWrap: TCheckBox
        Left = 16
        Top = 8
        Width = 97
        Height = 17
        Caption = '&Wrap text'
        TabOrder = 0
        OnClick = cbWrapClick
      end
    end
  end
end
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17905232
Did you come right?
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17905236
Hi Pierre.

Sorry for the delay.  I'm literally just about to try that so I'll get back to you shortly.

Thanks for your help so far.  It's greatly appreciated.

Thanks,

John.
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17905682
Hi Pierre,

That's a great bit of code, but I'm afraid it doesn't fully do what I want.

It goes as far as getting the HTML from the selected instance of IE, but it doesn't get me the code in the specific frame, which is what I need to do.

Can you help me out with that?

Thanks,

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17906915
The code does get the frame source as well, I tested it.

See the THtmlSrcObj.GetHtmlSource method. It does, however, dump it all in the memo control so I can see why you thought so. What is the URL you are trying to get the source for, let me try it on my side as well...

Regards
Pierre
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17906987
Hi Pierre,

Sorry - you're absolutely right.  It's another frame, within a frame, within a frame, that I'm looking for.

I'm afraid I can't give you the URL that I'm working on as it's a password protected area on a web site that I obviously can't give out access to.

The perfect solution to this would be to make the code search frames recursively so that it finds all frames within all frames within all frames etc..

Would it be difficult to modify it to do that?

Thanks,

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17912434
Sure. Just change to the following:

function THtmlSrcObj.GetHtmlSource: TStrings;

  procedure AddDocSource(ADoc: IHTMLDocument2);
  var fi: integer;
      FrameIndex: OLEVariant;
      FrameObj: IDispatch;
      FrameDoc: IHTMLDocument2;
      element: IHTMLElement;
      frames: IHTMLFramesCollection2;
      frameWindow: IHTMLWindow2;
  begin
    element:= ADoc.all.item(0, 0) as IHTMLElement;
    if Assigned(Element)
      then FHtmlSource.Add(element.outerHTML);

    frames:= ADoc.frames;
    for fi:= 0 to frames.length-1 do
    begin
      FHtmlSource.Add('');
      FHtmlSource.Add('<!--'+StringOfChar('*',5)+' '
                      +'Frame '+IntToStr(fi+1)
                      +StringOfChar('*',5)+' -->');
      try
        FrameIndex:= fi;
        FrameObj:= frames.item(FrameIndex);
        frameWindow:= frameObj AS IHTMLWindow2;
        FrameDoc:= (frameWindow.Document AS IHTMLDocument2);
        FHtmlSource.Add('<!--'
                        +StringOfChar('*',5)+' '
                        +'Title: '+FrameDoc.title
                        +' URL: '+FrameDoc.url
                        +StringOfChar('*',5)+' -->');
        AddDocSource(FrameDoc);
        //element:= FrameDoc.all.item(0, 0) as IHTMLElement;
        //if Assigned(Element)
        //  then FHtmlSource.Add(element.outerHTML);
        FHtmlSource.Add('');
      except on e: exception
        do FHtmlSource.Add(e.Message);
      end;
    end;
  end;

begin
  result:= FHtmlSource;
  FHtmlSource.Clear;
  try
    if WBrowser.Document.QueryInterface(IID_IHTMLDocument2, HtmlDoc) = S_OK then
    begin
      AddDocSource(htmlDoc);
    end;
  except on e: exception
    do Result.Add(e.Message);
  end;
end;


Regards
Pierre
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17913022
Thanks Pierre.

I've tried that and it just crashes out.  After continuing execution a couple of times when the CPU Debug window pops up, I get the error message "Not enough storage is available to complete this operation."

Any ideas?

Thanks,

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17913212
Try stepping through the code and let me know where the error occurs. Let me know if you need help with this...
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 6

Author Comment

by:Jaymol
ID: 17913791
Hi Pierre,

Well I've stepped through the code and found that it falls over pretty much immediately, on the following line of code :

                 sw := CoShellWindows.Create;

I don't know if it'll make any difference to this particular issue, but I'm running Delphi 5 (unfortunately).  I did have to change a couple of small things to make it work, but nothing to do with the actual browser interfacing code.

Thanks,

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 17914085
Not sure if this may help but try it anyway. Change the get browser button event handler as follows and let me know if the same error happens:


procedure TForm1.btnGetBrowsersClick(Sender: TObject);
var sw: TShellWindows;
    browser: IWebBrowser2;
    i: integer;
    htmlDoc: IHTMLDocument2;
    obj: THtmlSrcObj;
begin
  lbBrowsers.Clear;
  sw := TShellWindows.Create(nil);
  try
    for i:= 0 to sw.Count-1 do
    begin
      browser:= sw.Item(i) as IWebBrowser2;
      if browser.Document.QueryInterface(IID_IHTMLDocument2, htmlDoc) = S_OK then
      begin
        obj:= THtmlSrcObj.Create;
        HtmlSourceObjects.Add(obj);
        obj.WBrowser:= browser;
        obj.HtmlDoc:=htmlDoc;

        lbBrowsers.AddItem(browser.LocationName, obj);
      end;
    end;
  finally
    sw.Free;
  end;
end;
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17929895
Hi Pierre,

Yes, I still get the same error.  It's quite strange because it worked with the original code and the bit that causes the error hasn't changed!

Thanks,

John.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 17938481
To be honest with you, using the DOM to retrieve the source for frames is bound to be frought with
problems due to M$'s new security model. If the frame is cross-domain, which means that the frame's
source is not on the same domain as the hosting code, then you won't be able to get it via the DOM.

My suggestion would be to parse the HTML and get each frame's code using idHTTP, parse that frame's
code and get the src of each contained frame in a recursive manner.

I have some code to do this posted on Delphi Pages but it is currently down for maint.
I will see if I can find it for you.
0
 
LVL 6

Author Comment

by:Jaymol
ID: 17938503
Thanks Pierre.

I await your code!

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 18006265
I believe a solution was provided.
0
 
LVL 6

Author Comment

by:Jaymol
ID: 18006558
Hi Pierre,

None of the above solutions worked concisely.  They gave methods for partially solving my problem, but none were complete and therefore not solutions.

Thanks for the help all the same, but I was clear about looking for a 100% concise solution.

John.
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 18008221
I spent a lot of time responding to your question and feel it was anwered. You asked how to capture the source and my solution does that. it recursively lists the source for all frames and frames within frames. I tested it on my side and it worked fine. You also stated in a comment that it did i.e. "...it worked with the original code ...".

I don't understand why you say that my suggestions partially solve your problem and that they were incomplete and can not be regarded as solutions when you asked how to get the source and my solution does just that. Please explain why you say so. Understand my frustration as we are volunteers (i.e. we don't get paid for this service) here at EE who answer questions and sometimes spend a lot of time on providing them.

I'm not interested to get into an argument here, so if you really feel unsatisfied then fine I suggest the moderator refund your points but suggest this still gets saved as a PAQ as I believe there is value in these posts for future reference.
0
 
LVL 1

Accepted Solution

by:
DarthMod earned 0 total points
ID: 18025215
PAQd, 500 points refunded.

DarthMod
CS Moderator
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Introduction The parallel port is a very commonly known port, it was widely used to connect a printer to the PC, if you look at the back of your computer, for those who don't have newer computers, there will be a port with 25 pins and a small print…
Creating an auto free TStringList The TStringList is a basic and frequently used object in Delphi. On many occasions, you may want to create a temporary list, process some items in the list and be done with the list. In such cases, you have to…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now