• Status: Solved

D2005 VCL TWebBrowser - listing all links in page

Hi all,
I'm using D2005 and I want to be able to list all the links within any frames within any page of an online HTML document; i.e. basically all the links in the finished HTML document. Because I want to list links once any JavaScript etc. has run (and added any HTML), I think I need to use the TWebBrowser to do this? With 2005, do I still need mshtml_tlb.pas to use the TWebBrowser, and if so, how do I get it and import it into Delphi? I would be very grateful for help on this, and for code showing how to use the TWebBrowser to list the links (if indeed this is what I should use).
Thanks a lot,
P :)
Asked by: Pandora
 
Pandora (Author) commented:
hmmm - erm, did I say something wrong? Or is it expert day off?
Surely it's not *toooooo* easy? Gosh I feel a bit embarrassed.
Ok I'll start to answer my own question...
Part 1:
Yes, you do need mshtml_tlb.pas, and you get it via Component, Import Component, ActiveX, then selecting the MS Scripting library (i.e. mshtml_tlb.dll) and creating the unit.
Type library documentation is at http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/ifaces/anchorelement/anchorelement.asp
Part 2:
I'll get back to you... unless one of you lot wants to chip in?!
 
Eddie Shipman (All-around developer) commented:
Try something like this:

procedure TForm1.GetLinks(AURL: String; AURLList, ALinkTextList: TStrings);
var
  IDoc    : IHTMLDocument2;
  ovLinks : OleVariant;
  x       : Integer;
begin
  // AURL is not used in this version: the links are read from whatever
  // page is already loaded in WebBrowser1.
  IDoc := WebBrowser1.Document as IHTMLDocument2;
  if IDoc = nil then
    Exit; // nothing loaded yet
  try
    // Collect every <A> element from the parsed document
    ovLinks := IDoc.all.tags('A');
    if ovLinks.Length > 0 then
    begin
      for x := 0 to ovLinks.Length-1 do
      begin
        AURLList.Add(ovLinks.Item(x).href);
        ALinkTextList.Add(ovLinks.Item(x).innerText);
      end;
    end;
  finally
    IDoc := nil;
  end;
end;
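
Since the question asks for the links once the page (and its frames) has finished loading, one possible way to call GetLinks is from the browser's OnDocumentComplete event. This is only a sketch under assumptions: the two memo controls (MemoUrls, MemoTexts) are hypothetical names, and the handler signature shown is the one generated by the D2005-era SHDocVw import (later Delphi versions may declare URL as const). Scripts that add links on a timer after loading would still be missed.

```pascal
// Sketch only. Assumes a form with WebBrowser1 plus two TMemo controls
// named MemoUrls and MemoTexts (hypothetical names).
// uses SHDocVw, MSHTML, Variants;

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  // DocumentComplete fires once per frame. The event whose pDisp is the
  // browser control itself signals that the top-level document is done.
  if (pDisp as IUnknown) = (WebBrowser1.DefaultInterface as IUnknown) then
  begin
    MemoUrls.Clear;
    MemoTexts.Clear;
    GetLinks(VarToStr(URL), MemoUrls.Lines, MemoTexts.Lines);
  end;
end;
```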
 
JustinWillis commented:
Not sure if this is any help, but I use Indy's TIdHTTP.Get to retrieve the HTML, and I would then search through it for whatever I wanted, such as links.

It's a different approach, but that is how I like to do it. Sorry if I misunderstood the question.

As far as I know Eddie's code probably does the job nicely, but just in case, here is something else to try.

Regards,
Justin Willis.
 
Eddie Shipman (All-around developer) commented:
Here is the same code as above, but using idHTTP to get the source:

{both are from my code posted on http://www.delphipages.com}

uses  ..., ActiveX, COMObj, MSHTML, Variants, { Variants: VarArrayCreate }
      IdBaseComponent, IdComponent, IdTCPConnection,
      IdTCPClient, IdHTTP;

procedure TForm1.GetLinks(AURL: String; AURLList, ALinkTextList: TStrings);
var
  IDoc    : IHTMLDocument2;
  strHTML : String;
  v       : Variant;
  x       : Integer;
  ovLinks : OleVariant;
  idHTTP1 : TidHTTP;
begin
  Idoc:=CreateComObject(Class_HTMLDOcument) as IHTMLDocument2;
  idHTTP1 := TidHTTP.Create(Self);
  try
    IDoc.designMode:='on';
    while IDoc.readyState<>'complete' do
      Application.ProcessMessages;
    v:=VarArrayCreate([0,0],VarVariant);
    strHTML := idHTTP1.Get(AURL);
    v[0]:= strHTML;
    IDoc.write(PSafeArray(System.TVarData(v).VArray));
    IDoc.designMode:='off';
    while IDoc.readyState<>'complete' do
      Application.ProcessMessages;
    ovLinks := IDoc.all.tags('A');
    if ovLinks.Length > 0 then
    begin
      for x := 0 to ovLinks.Length-1 do
      begin
        AURLList.Add(ovLinks.Item(x).href);
        ALinkTextList.Add(ovLinks.Item(x).innerText);
      end;
    end;
  finally
    idHTTP1.Free;
    IDoc := nil;
  end;
end;
 
JustinWillis commented:
cool, u the man Eddie!

J
 
Eddie Shipman (All-around developer) commented:
The only problem with doing it this way is that the RELATIVE links would all be prefixed with about:blank.
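
A possible workaround, as a rough sketch: strip that prefix before resolving the link against the page's real URL. StripAboutPrefix is a hypothetical helper name, and the exact prefixes handled ("about:blank" and plain "about:") are assumptions based on the remark above, so check what your own pages produce. Resolving the remaining relative path against the base URL is left to the caller.

```pascal
program StripAboutDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: removes a leading "about:blank" (or plain "about:")
// from an href read out of a document that was written into a blank page.
// Absolute links such as http://... are returned unchanged.
function StripAboutPrefix(const AHref: string): string;
const
  cAboutBlank = 'about:blank';
  cAbout      = 'about:';
begin
  Result := AHref;
  if SameText(Copy(Result, 1, Length(cAboutBlank)), cAboutBlank) then
    Delete(Result, 1, Length(cAboutBlank))
  else if SameText(Copy(Result, 1, Length(cAbout)), cAbout) then
    Delete(Result, 1, Length(cAbout));
end;

begin
  // Relative link as it might come back from the DOM
  WriteLn(StripAboutPrefix('about:blankpage2.html'));
  // Absolute link: left alone
  WriteLn(StripAboutPrefix('http://example.com/index.html'));
end.
```

In GetLinks this would be applied to each ovLinks.Item(x).href before adding it to AURLList.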


 
JustinWillis commented:
Really? I don't understand what is wrong with simple

    TRichEdit.lines.text := IDHTTP1.Get(TheNewURL);

and then find the links yourself or is there a huge flaw in this I am missing?

Cheers,
J
 
Pandora (Author) commented:
Ok - thanks Eddie & J - I'll have a play & see which works best & chase down the framesets using the same idea. Thanks for your help both of you, P.
 
Eddie Shipman (All-around developer) commented:
Getting code to the framesets isn't as straightforward as it seems. IE, on which TWebBrowser is based, adds some security for cross-domain access, and you cannot easily obtain HTML source code from a frame through the DOM.


JustinWillis - But you'd have to be able to parse the test correctly, whereas the DOM does it for you.
 
Eddie Shipman (All-around developer) commented:
I meant parse the text correctly, not test.
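
To illustrate what parsing the text yourself involves, here is a deliberately naive sketch of the find-the-links-yourself approach on raw HTML, such as the string returned by idHTTP Get. ExtractHrefs is a hypothetical helper, not anyone's posted code. It only handles quoted href values and ignores unquoted attributes, comments, scripts, BASE tags and entity decoding, which is the kind of detail the DOM handles for you.

```pascal
program NaiveHrefDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils, Classes;

// Hypothetical helper: adds every quoted href="..." or href='...' value
// found in AHtml to AList. Unquoted values are skipped.
procedure ExtractHrefs(const AHtml: string; AList: TStrings);
var
  Lower: string;
  P, Start, Stop: Integer;
  Quote: Char;
begin
  Lower := LowerCase(AHtml); // same length as AHtml, so indexes line up
  P := 1;
  while True do
  begin
    Start := PosEx('href=', Lower, P);
    if Start = 0 then
      Break;
    Start := Start + 5; // first character after "href="
    if Start > Length(AHtml) then
      Break;
    Quote := AHtml[Start];
    if (Quote = '"') or (Quote = '''') then
    begin
      Stop := PosEx(Quote, AHtml, Start + 1);
      if Stop = 0 then
        Break; // unterminated attribute value
      AList.Add(Copy(AHtml, Start + 1, Stop - Start - 1));
      P := Stop + 1;
    end
    else
      P := Start; // unquoted value: not handled by this sketch
  end;
end;

var
  Links: TStringList;
  i: Integer;
begin
  Links := TStringList.Create;
  try
    ExtractHrefs('<a href="a.html">A</a> <A HREF=''b.html''>B</A>', Links);
    // Writes each extracted URL on its own line
    for i := 0 to Links.Count - 1 do
      WriteLn(Links[i]);
  finally
    Links.Free;
  end;
end.
```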
 
Pandora (Author) commented:
Hi Eddie & J, I've just used your technique Eddie, but pulled out frames instead, grabbed the source, and then I just recurse it in a separate proc to iterate through any nested frames. For IFrames I've done the same. So it's worked a treat - I had a play with that DOM malarkey & it's not for me, very tedious!

Anyway thanks for the help,
P :)

procedure TForm1.GetFrames2(var AFrameNameList: TStringList; var AFrameSourceList: TStringList);
var
  IDoc    : IHTMLDocument2;
  ovLinks : OleVariant;
  i       : Integer;
begin
  IDoc := WebBrowser1.Document as IHTMLDocument2;
  try
    ovLinks := IDoc.all.tags('FRAME');
    if ovLinks.Length > 0 then
      for i := 0 to ovLinks.Length-1 do
      begin
        AFrameNameList.Add('Frame: '+ovLinks.Item(i).name);
        AFrameSourceList.Add(ovLinks.Item(i).src);
      end;
  finally
    IDoc := nil;
  end;
end;
 
Eddie Shipman (All-around developer) commented:
I was just saying that getting the HTML source of the frame is difficult if it is on another domain. What you are doing is getting the location (URL) of the frame's HTML source. Are you sure that's what you want, or are you getting the HTML source in another process using the AFrameSourceList?
 
Pandora (Author) commented:
Hi Eddie - exactly, so I retrieve the source link of the frame, then browse there directly and grab the source. This is fine for what I wanted to do, though I realise that if there was some script running etc., using the WB control you could end up with a different page when you load it. But that's not an issue for me!
Thanks, P