Solved

Delphi: IHTMLDocument2, Extract Link

Posted on 2014-01-21
Last Modified: 2014-01-21
Hi,

I'm trying to extract some information from HTML source, and I need to get the URL (link) from one particular block of the markup.

For example, I have this HTML structure:

<div class="browse-info">
<span class="info">
<span class="browseTitleLink"><a href="http://xxx.com/movie/xxx">xxx</a></span><br />
<span class="browseInfoList" ><b>Size:</b> 1.85 GB</span><br />
<span class="browseInfoList" ><b>Quality:</b> 1080p</span><br />
<span class="browseInfoList" ><b>Genre:</b> Crime | Drama</span><br />
<span class="browseInfoList" ><b>IMDB Rating:</b> 6.0/10</span><br />
<span class="browseSeeds">
<span class="peers"><b>Peers:</b> 1454</span>
<span class="seeds"><b>Seeds:</b> 3412</span>
</span>
</span>
<span class="links">
<a href="http://xxx" class="std-btn-small mright">View Info<span></span></a>
<a href="http://xxx" class="std-btn-small mleft downloadDwl" data-movieID="4502" data-downloadID="4694">Download<span></span></a>
</span>
</div>
</div>
<div class="divider"></div>
</div>


I'm using this code to get some info:

procedure TForm1.Button3Click(Sender: TObject);
Var
  Documento : OleVariant;
  Elementos : OleVariant;
  I         : Integer;
  Item : TListItem;
  Source : TMemoryStream;
  Memo : Tmemo;
  IdHttp : TidHttp;
  Qualidade : String;
begin
Listview1.Clear;
idHttp := TIdHttp.Create(Self);
idHttp.AllowCookies := True;
idHttp.HandleRedirects := True;
memo := Tmemo.Create(Self);
Memo.Visible := False;
memo.Parent := Form1;
Source := TMemoryStream.Create;
Qualidade := 'http://xxx';
if CheckBox1.Checked then
 Qualidade := 'http:/xxx';
if CheckBox2.Checked then
 Qualidade := 'http:/xxx';
if CheckBox1.Checked and Checkbox2.Checked then
 Qualidade := 'http://xxx';
if Edit1.Text <> '' then
 Qualidade := 'http://xxx';
 Try
  Try
   IdHTTP.Get(Qualidade, Source);
   Source.Position := 0;
  Except on E: Exception do
   Begin
    ShowMessage(e.Message);
    Exit; // the outer Finally block frees Source, memo and idHttp
   End;
  End;
  memo.Lines.LoadFromStream(Source);
  Documento := coHTMLDocument.Create as IHTMLDocument2;
   if Source.Size > 0 then
    Documento.write(memo.Lines.Text)
   else
    Begin
     ShowMessage('erro');
     Exit; // the outer Finally block frees Source, memo and idHttp
    End;
  Documento.close;
  Listview1.Items.BeginUpdate;
   for i := 0 to Documento.body.all.length - 1 do
    begin
     Elementos := Documento.body.all.item(i);
      if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseTitleLink') then
       Begin
        item := Listview1.Items.Add;
        Item.Caption := Elementos.innerText;
       End;
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseInfoList') then
         item.SubItems.Add(Elementos.innerText);
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseSeeds') then
         Item.SubItems.Add(Elementos.innerText);      
    end;
  ListView1.Items.EndUpdate;
 Finally
  Source.Free;
  memo.Free;
  idHttp.Free;
 End;
end;


If I add:

if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerText);


This gives me the text "View Info Download", not the links.

What do I need to do? I'd appreciate sample code: I don't want to extract every link on the page, only the URL for each entry, in the same order as the info I put in the ListView.

http://imageshack.com/a/img23/9100/htyy.png
Question by:Júlio
11 Comments
 

Expert Comment

by:Marco Gasi
ID: 39798127
Have you tried elementos.innerHTML?

Author Comment

by:Júlio
ID: 39798143
Yes, and it doesn't work the way I want.

 if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerHTML);


It returns the tags and class names all mixed together.

Accepted Solution

by:
Marco Gasi earned 500 total points
ID: 39798147
No, try elementos.href instead. This should work if you fetch only the A tags:

Elementos := Documento.all.tags('A');

but perhaps it works even with all.item.
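
A minimal sketch of that idea, assuming an early-bound IHTMLDocument2 and a VCL application (so COM is already initialized). The helper name ExtractAnchorHrefs and its parameters are hypothetical, not from this thread. Note that tags('A') returns a collection rather than a single element, and that the href property is declared on IHTMLAnchorElement, so each item is queried for that interface:

```delphi
uses
  SysUtils, Classes, Variants, ActiveX, MSHTML;

// Hypothetical helper: parses Html with MSHTML and appends the href of
// every <a> element to Links, in document order.
procedure ExtractAnchorHrefs(const Html: string; Links: TStrings);
var
  Doc: IHTMLDocument2;
  Content: OleVariant;
  Anchors: IHTMLElementCollection;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  Doc := CoHTMLDocument.Create as IHTMLDocument2;

  // IHTMLDocument2.write expects a SAFEARRAY of variants.
  Content := VarArrayCreate([0, 0], varVariant);
  Content[0] := Html;
  Doc.write(PSafeArray(TVarData(Content).VArray));
  Doc.close;

  // tags('A') yields a collection of all anchor elements.
  Anchors := Doc.all.tags('A') as IHTMLElementCollection;
  for I := 0 to Anchors.length - 1 do
    if Supports(Anchors.item(I, 0), IHTMLAnchorElement, Anchor) then
      Links.Add(Anchor.href);
end;
```

This collects every link on the page, so it would still need a class-name filter to keep only the "View Info" anchors.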

Author Comment

by:Júlio
ID: 39798161
if Documento.all.tags('A') <> 0 then
         Item.SubItems.Add(elementos.href);


Returns: "Member not found"

Expert Comment

by:Marco Gasi
ID: 39798184
I'm sorry. Now I see you declared Documento as OleVariant. What I suggested requires it to be declared as IHTMLDocument2...

Author Comment

by:Júlio
ID: 39798187
But if I do that, I need to rewrite all the code, right?

Expert Comment

by:Marco Gasi
ID: 39798201
I'm not sure but I think you don't. Give it a try within Button3Click event.
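
For reference, a late-bound variant that keeps Documento as an OleVariant might look like the sketch below. It assumes the "Member not found" error came from reading href on an element that is not an anchor (a SPAN has no href member). The helper name AddRowLinks is hypothetical. For each SPAN with class "links" it takes the first anchor inside that span, so the URLs come out in the same order as the rows:

```delphi
uses
  Classes, Variants;

// Hypothetical helper: Documento is the late-bound (OleVariant) document
// already filled by write/close, as in Button3Click.
procedure AddRowLinks(const Documento: OleVariant; Links: TStrings);
var
  Elemento, Anchors: OleVariant;
  I: Integer;
begin
  for I := 0 to Documento.body.all.length - 1 do
  begin
    Elemento := Documento.body.all.item(I);
    if (Elemento.tagName = 'SPAN') and (Elemento.className = 'links') then
    begin
      // Only anchors nested in this span; the first one is "View Info"
      // in the sample markup.
      Anchors := Elemento.getElementsByTagName('A');
      if Anchors.length > 0 then
        Links.Add(Anchors.item(0).href);
    end;
  end;
end;
```

Late binding resolves every member at run time, so a misspelled member name only fails when the line executes.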

Author Comment

by:Júlio
ID: 39798614
I don't understand:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;

infoV := Documento.body.all as IHTMLElement;
for I := 0 to Documento.all.length -1  do
 Begin
  if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
   Begin
    Listitem := Listview1.Items.Add;
    ListItem.Caption := infoV.innerText;
   End;
 End;

Listview1.Items.EndUpdate;
end;


I'm rewriting it.

The problem is in the loop after Listview1.Items.BeginUpdate (lines 33 to 41 of the code above). What am I doing wrong? Please show me how to fix it.

Error: "Interface not supported"

Author Comment

by:Júlio
ID: 39798658
OK, I got it. Now I need to get the link:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
 ElCount   : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;
ElCount := Documento.all.length;
//infoV := Documento.body.all as IHTMLElement;
for I := 0 to Elcount -1  do
 Begin
  infoV := Documento.all.item(I, '') as IHTMLElement;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
    Begin
     Listitem := Listview1.Items.Add;
     ListItem.Caption := infoV.innerText;
    End;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseInfoList') then
    ListItem.SubItems.Add(infoV.innerText);
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseSeeds') then
    ListItem.SubItems.Add(infoV.innerText);
 End;

Listview1.Items.EndUpdate;
end;


If I add

Var
 LinkV     : IHTMLElement;
(...)
LinkV := Documento.links.item('', I) as IHTMLElement;
  ListItem.subitems.Add(LinkV.innerText);


That doesn't work either.


UPDATE:

So easy, I can't believe it:

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.getAttribute('href', 0));


TY!!!
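
A slightly more defensive take on the same check, as a sketch only. The helper names IsViewInfoAnchor and HrefOrEmpty are hypothetical. It assumes the "mright" class is what distinguishes the "View Info" link in the sample markup, and it relies on VarToStr returning an empty string when getAttribute yields Null for a missing attribute:

```delphi
uses
  SysUtils, Variants, MSHTML;

// Hypothetical helper: True for an <a> whose class list contains 'mright',
// regardless of the order or number of other classes.
function IsViewInfoAnchor(const Element: IHTMLElement): Boolean;
begin
  Result := SameText(Element.tagName, 'A') and
            (Pos('mright', Element.className) > 0);
end;

// Hypothetical helper: returns the href attribute, or '' when it is absent.
function HrefOrEmpty(const Element: IHTMLElement): string;
begin
  Result := VarToStr(Element.getAttribute('href', 0));
end;
```

Inside the loop this would read: if IsViewInfoAnchor(infoV) then ListItem.SubItems.Add(HrefOrEmpty(infoV));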

Author Closing Comment

by:Júlio
ID: 39798779
He gave the solution; it just took me a while to understand the concept.

Thank you, Marco Gasi!

Expert Comment

by:Marco Gasi
ID: 39799108
I'm sorry I couldn't help you more, but I had gone away (to sleep!). I'm happy you solved your problem. I've never used that, but AFAIK it should have worked with

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.href);


but for this you would have to look only at the A tags.

Thanks for points and good luck with your project.
Marco