Solved

Delphi: IHTMLDocument2, Extract Link

Posted on 2014-01-21
Last Modified: 2014-01-21
Hi,

I'm trying to extract some information from HTML source, and I need to get the URL (link) out of a particular block of markup.

For example, I have this HTML structure:

<div class="browse-info">
<span class="info">
<span class="browseTitleLink"><a href="http://xxx.com/movie/xxx">xxx</a></span><br />
<span class="browseInfoList" ><b>Size:</b> 1.85 GB</span><br />
<span class="browseInfoList" ><b>Quality:</b> 1080p</span><br />
<span class="browseInfoList" ><b>Genre:</b> Crime | Drama</span><br />
<span class="browseInfoList" ><b>IMDB Rating:</b> 6.0/10</span><br />
<span class="browseSeeds">
<span class="peers"><b>Peers:</b> 1454</span>
<span class="seeds"><b>Seeds:</b> 3412</span>
</span>
</span>
<span class="links">
<a href="http://xxx" class="std-btn-small mright">View Info<span></span></a>
<a href="http://xxx" class="std-btn-small mleft downloadDwl" data-movieID="4502" data-downloadID="4694">Download<span></span></a>
</span>
</div>
</div>
<div class="divider"></div>
</div>



I'm using this code to get some info:

procedure TForm1.Button3Click(Sender: TObject);
Var
  Documento : OleVariant;
  Elementos : OleVariant;
  I         : Integer;
  Item : TListItem;
  Source : TMemoryStream;
  Memo : Tmemo;
  IdHttp : TidHttp;
  Qualidade : String;
begin
Listview1.Clear;
idHttp := TIdHttp.Create(Self);
idHttp.AllowCookies := True;
idHttp.HandleRedirects := True;
memo := Tmemo.Create(Self);
Memo.Visible := False;
memo.Parent := Form1;
Source := TMemoryStream.Create;
Qualidade := 'http://xxx';
if CheckBox1.Checked then
 Qualidade := 'http:/xxx';
if CheckBox2.Checked then
 Qualidade := 'http:/xxx';
if CheckBox1.Checked and Checkbox2.Checked then
 Qualidade := 'http://xxx';
if Edit1.Text <> '' then
 Qualidade := 'http://xxx';
 Try
  Try
   IdHTTP.Get(Qualidade, Source);
   Source.Position := 0;
  Except on E: Exception do
   Begin
    ShowMessage(e.Message);
    Exit; // the outer Finally block still runs and frees Source, memo and idHttp
   End;
  End;
  memo.Lines.LoadFromStream(Source);
  Documento := coHTMLDocument.Create as IHTMLDocument2;
   if Source.Size > 0 then
    Documento.write(memo.Lines.Text)
   else
    Begin
     ShowMessage('erro');
     Exit; // the outer Finally block still runs and frees Source, memo and idHttp
   End;
  Documento.close;
  Listview1.Items.BeginUpdate;
   for i := 0 to Documento.body.all.length - 1 do
    begin
     Elementos := Documento.body.all.item(i);
      if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseTitleLink') then
       Begin
        item := Listview1.Items.Add;
        Item.Caption := Elementos.innerText;
       End;
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseInfoList') then
         item.SubItems.Add(Elementos.innerText);
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseSeeds') then
         Item.SubItems.Add(Elementos.innerText);      
    end;
  ListView1.Items.EndUpdate;
 Finally
  Source.Free;
  memo.Free;
  idHttp.Free;
 End;
end;



If I call:

if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerText);



This gives me the text "View Info Download" and not the links.

What do I need to do? I need code for this, because I don't want to extract every link on the page, only the URL for each entry, in the same order as the other info I put into the ListView.

http://imageshack.com/a/img23/9100/htyy.png
Capture.PNG
Question by:Júlio
11 Comments
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 39798127
Have you tried elementos.innerHTML?
 

Author Comment

by:Júlio
ID: 39798143
Yes, but it doesn't work the way I want.

 if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerHTML);



It returns the tags and class names all mixed together.
 
LVL 31

Accepted Solution

by:
Marco Gasi earned 500 total points
ID: 39798147
No, try elementos.href instead. This should work if you fetch only the anchor tags:

Elementos := Documento.all.tags('A');

but perhaps it works with all.item as well.
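
The suggestion above can be sketched with the late-bound OleVariant document from the question. This is a minimal sketch, not code from the thread: the unit name AnchorHrefs and the helper CollectAnchorHrefs are hypothetical, and it assumes Documento already holds a parsed HTML document. Because tags('A') returns only anchor elements, every item has an href member, which avoids asking a SPAN for href.

```pascal
unit AnchorHrefs; // hypothetical unit name, for illustration only

interface

uses
  Classes;

// Hypothetical helper: adds the href of every <a> element in a
// late-bound (OleVariant) HTML document to Hrefs.
procedure CollectAnchorHrefs(const Documento: OleVariant; Hrefs: TStrings);

implementation

uses
  ComObj; // installs the dispatcher that late-bound variant calls rely on

procedure CollectAnchorHrefs(const Documento: OleVariant; Hrefs: TStrings);
var
  Anchors, Anchor: OleVariant;
  I: Integer;
begin
  // tags('A') narrows the "all" collection down to anchor elements
  Anchors := Documento.all.tags('A');
  for I := 0 to Anchors.length - 1 do
  begin
    Anchor := Anchors.item(I);
    // Safe here: every item in this collection is an <a> element
    Hrefs.Add(Anchor.href);
  end;
end;

end.
```

Calling CollectAnchorHrefs(Documento, Memo1.Lines) would list every link in the page, so a className check is still needed to keep only the links that belong to each ListView row.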

 

Author Comment

by:Júlio
ID: 39798161
if Documento.all.tags('A') <> 0 then
         Item.SubItems.Add(elementos.href);



Returns: "Member not found"
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 39798184
I'm sorry. Now I see you define Documento as OleVariant. What I suggested requires it be defined as IHTMLDocument2...
 

Author Comment

by:Júlio
ID: 39798187
But if I do that, I need to rewrite all the code, right?
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 39798201
I'm not sure but I think you don't. Give it a try within Button3Click event.
 

Author Comment

by:Júlio
ID: 39798614
Omg, I don't understand:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;

infoV := Documento.body.all as IHTMLElement;
for I := 0 to Documento.all.length -1  do
 Begin
  if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
   Begin
    Listitem := Listview1.Items.Add;
    ListItem.Caption := infoV.innerText;
   End;
 End;

Listview1.Items.EndUpdate;
end;



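I'm rewriting it.

The problem is between lines 33 and 41 of the code above (from the infoV := Documento.body.all assignment to the end of the for loop). What am I doing wrong? Please help me and show me how.

Error: "Interface not supported"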
 

Author Comment

by:Júlio
ID: 39798658
OK, I got it. Now I need to get the link:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
 ElCount   : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;
ElCount := Documento.all.length;
//infoV := Documento.body.all as IHTMLElement;
for I := 0 to Elcount -1  do
 Begin
  infoV := Documento.all.item(I, '') as IHTMLElement;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
    Begin
     Listitem := Listview1.Items.Add;
     ListItem.Caption := infoV.innerText;
    End;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseInfoList') then
    ListItem.SubItems.Add(infoV.innerText);
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseSeeds') then
    ListItem.SubItems.Add(infoV.innerText);
 End;

Listview1.Items.EndUpdate;
end;



If I add

Var
 LinkV     : IHTMLElement;
(...)
LinkV := Documento.links.item('', I) as IHTMLElement;
  ListItem.subitems.Add(LinkV.innerText);



That doesn't work either.


UPDATE:

So easy, I can't believe it:

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.getAttribute('href', 0));



TY!!!
 

Author Closing Comment

by:Júlio
ID: 39798779
He gave the solution, it took me a while to understand the concept.

Ty Marco Gasi!
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 39799108
I'm sorry I couldn't help you more, but I had gone away (to sleep!). I'm happy you solved your problem. I have never used that, but AFAIK it should have worked with

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.href);



but for this you would have to look only for <A> tags.

Thanks for points and good luck with your project.
Marco
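
For the early-bound version, one caveat is that IHTMLElement as declared in MSHTML.pas has no href property; href belongs to IHTMLAnchorElement. The following is a minimal sketch under that assumption, not code from the thread: the unit name ButtonLinks and the helper CollectButtonLinks are hypothetical, and the class name 'std-btn-small mright' is taken from the sample HTML in the question.

```pascal
unit ButtonLinks; // hypothetical unit name, for illustration only

interface

uses
  Classes, MSHTML;

// Hypothetical helper: adds the href of every "View Info" anchor
// (class "std-btn-small mright" in the sample HTML) to Hrefs.
procedure CollectButtonLinks(const Documento: IHTMLDocument2; Hrefs: TStrings);

implementation

uses
  SysUtils; // Supports()

procedure CollectButtonLinks(const Documento: IHTMLDocument2; Hrefs: TStrings);
var
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  Elements := Documento.all;
  for I := 0 to Elements.length - 1 do
    // item() returns IDispatch, so query for the interfaces we need;
    // only <a> elements answer the IHTMLAnchorElement query
    if Supports(Elements.item(I, 0), IHTMLElement, Element) and
       Supports(Element, IHTMLAnchorElement, Anchor) and
       (Element.className = 'std-btn-small mright') then
      Hrefs.Add(Anchor.href);
end;

end.
```

Inside the existing loop the same Supports check could replace the tagName = 'A' test, with ListItem.SubItems.Add(Anchor.href) in place of the getAttribute call. Either form should give the same URL for absolute links like the ones in the sample.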