Solved

Delphi: IHTMLDocument2, Extract Link

Posted on 2014-01-21
Last Modified: 2014-01-21
Hi,

I'm trying to extract some information from HTML source, and I need to get the URL (link) out of one block of the markup.

For example, I have this HTML structure:

<div class="browse-info">
<span class="info">
<span class="browseTitleLink"><a href="http://xxx.com/movie/xxx">xxx</a></span><br />
<span class="browseInfoList" ><b>Size:</b> 1.85 GB</span><br />
<span class="browseInfoList" ><b>Quality:</b> 1080p</span><br />
<span class="browseInfoList" ><b>Genre:</b> Crime | Drama</span><br />
<span class="browseInfoList" ><b>IMDB Rating:</b> 6.0/10</span><br />
<span class="browseSeeds">
<span class="peers"><b>Peers:</b> 1454</span>
<span class="seeds"><b>Seeds:</b> 3412</span>
</span>
</span>
<span class="links">
<a href="http://xxx" class="std-btn-small mright">View Info<span></span></a>
<a href="http://xxx" class="std-btn-small mleft downloadDwl" data-movieID="4502" data-downloadID="4694">Download<span></span></a>
</span>
</div>
</div>
<div class="divider"></div>
</div>

I'm using this code to get some info:

procedure TForm1.Button3Click(Sender: TObject);
Var
  Documento : OleVariant;
  Elementos : OleVariant;
  I         : Integer;
  Item : TListItem;
  Source : TMemoryStream;
  Memo : Tmemo;
  IdHttp : TidHttp;
  Qualidade : String;
begin
Listview1.Clear;
idHttp := TIdHttp.Create(Self);
idHttp.AllowCookies := True;
idHttp.HandleRedirects := True;
memo := Tmemo.Create(Self);
Memo.Visible := False;
memo.Parent := Form1;
Source := TMemoryStream.Create;
Qualidade := 'http://xxx';
if CheckBox1.Checked then
 Qualidade := 'http://xxx';
if CheckBox2.Checked then
 Qualidade := 'http://xxx';
if CheckBox1.Checked and Checkbox2.Checked then
 Qualidade := 'http://xxx';
if Edit1.Text <> '' then
 Qualidade := 'http://xxx';
 Try
  Try
   IdHTTP.Get(Qualidade, Source);
   Source.Position := 0;
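  Except on E: Exception do
   Begin
    // Exit still runs the outer Finally block, which frees Source, memo and idHttp;
    // freeing them here as well would free each object twice.
    ShowMessage(e.Message);
    Exit;
   End;
  End;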
  memo.Lines.LoadFromStream(Source);
  Documento := coHTMLDocument.Create as IHTMLDocument2;
   if Source.Size > 0 then
    Documento.write(memo.Lines.Text)
   else
    Begin
     // The outer Finally block frees Source, memo and idHttp when Exit is called.
     ShowMessage('erro');
     Exit;
   End;
  Documento.close;
  Listview1.Items.BeginUpdate;
   for i := 0 to Documento.body.all.length - 1 do
    begin
     Elementos := Documento.body.all.item(i);
      if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseTitleLink') then
       Begin
        item := Listview1.Items.Add;
        Item.Caption := Elementos.innerText;
       End;
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseInfoList') then
         item.SubItems.Add(Elementos.innerText);
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseSeeds') then
         Item.SubItems.Add(Elementos.innerText);      
    end;
  ListView1.Items.EndUpdate;
 Finally
  Source.Free;
  memo.Free;
  idHttp.Free;
 End;
end;

If I call:

if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerText);

This gives me the text "View Info Download" and not the links.

What do I need to do? I need code for this: I don't want to extract every link on the page, only the URLs, in the same order as the info I already extract, so I can put them in the ListView.

Screenshot (Capture.PNG): http://imageshack.com/a/img23/9100/htyy.png
Question by:Júlio
LVL 30

Expert Comment

by:Marco Gasi
ID: 39798127
Have you tried elementos.innerHTML?

Author Comment

by:Júlio
ID: 39798143
Yes, and it doesn't work the way I want.

 if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
         Item.SubItems.Add(elementos.innerHTML);

It returns the tags and class names all mixed together.
LVL 30

Accepted Solution

by:
Marco Gasi earned 500 total points
ID: 39798147
No, try elementos.href instead. This should work if you get all the anchor tags:

Elementos := Documento.all.tags('A');

but perhaps it works even with all.item
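
A minimal sketch of the tags('A') idea, assuming Documento stays an OleVariant (late binding) as in the question. The helper name CollectLinks and its parameters are hypothetical and not from the thread. It reads href only from the anchors whose class matches, in document order, so the results line up with the rows already added to the ListView.

```delphi
uses
  Classes, Variants, MSHTML;

// Hypothetical helper: adds the href of every <a> whose class attribute
// equals AClassName to Links, in document order, using late binding.
procedure CollectLinks(const Html, AClassName: string; Links: TStrings);
var
  Documento, Anchors, Anchor: OleVariant;
  I: Integer;
begin
  Documento := coHTMLDocument.Create as IHTMLDocument2;
  Documento.write(Html);
  Documento.close;

  // tags('A') returns a collection that contains only the anchor elements.
  Anchors := Documento.all.tags('A');
  for I := 0 to Anchors.length - 1 do
  begin
    Anchor := Anchors.item(I);
    // href exists on anchors; asking a SPAN for it raises "Member not found".
    if Anchor.className = AClassName then
      Links.Add(Anchor.href);
  end;
end;
```

Calling CollectLinks(memo.Lines.Text, 'std-btn-small mright', SomeStringList) would be one way to use it; the href values come back in the same order as the "View Info" buttons appear in the page.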

Author Comment

by:Júlio
ID: 39798161
if Documento.all.tags('A') <> 0 then
         Item.SubItems.Add(elementos.href);

Returns: "Member not found"
LVL 30

Expert Comment

by:Marco Gasi
ID: 39798184
I'm sorry. I see now that you declare Documento as OleVariant. What I suggested requires it to be declared as IHTMLDocument2...

Author Comment

by:Júlio
ID: 39798187
But if I do that, I need to rewrite all the code, right?
LVL 30

Expert Comment

by:Marco Gasi
ID: 39798201
I'm not sure, but I don't think you do. Give it a try within the Button3Click event.

Author Comment

by:Júlio
ID: 39798614
OMG, I don't understand:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;

infoV := Documento.body.all as IHTMLElement;
for I := 0 to Documento.all.length -1  do
 Begin
  if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
   Begin
    Listitem := Listview1.Items.Add;
    ListItem.Caption := infoV.innerText;
   End;
 End;

Listview1.Items.EndUpdate;
end;

I'm rewriting it.

The problem is between lines 33 and 41. What am I doing wrong? Please help me and show me how.

Error: "Interface not supported"

Author Comment

by:Júlio
ID: 39798658
OK, I got it. Now I need to get the link:

procedure TForm1.Button1Click(Sender: TObject);
Var
 Documento : IHTMLDocument2;
 ArrayV    : OleVariant;
 InfoV     : IHTMLElement;

 Buffer    : String;
 http      : TidHttp;
 ListItem  : TListItem;
 I         : Integer;
 ElCount   : Integer;
begin
http := TIdHttp.Create(Self);
http.AllowCookies := True;
http.HandleRedirects := True;

Try
 Buffer := http.Get('http://xxx');
Except on E: Exception do
 Begin
  ShowMessage(e.Message);
  Exit;
 End;
End;

Documento :=  coHTMLDocument.Create as IHTMLDocument2;
ArrayV := VarArrayCreate([0,0], varVariant);
ArrayV[0] := Buffer;
Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
Documento.Close;

Listview1.Items.BeginUpdate;
ElCount := Documento.all.length;
//infoV := Documento.body.all as IHTMLElement;
for I := 0 to Elcount -1  do
 Begin
  infoV := Documento.all.item(I, '') as IHTMLElement;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
    Begin
     Listitem := Listview1.Items.Add;
     ListItem.Caption := infoV.innerText;
    End;
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseInfoList') then
    ListItem.SubItems.Add(infoV.innerText);
   if (infoV.tagName = 'SPAN') and (infoV.className = 'browseSeeds') then
    ListItem.SubItems.Add(infoV.innerText);
 End;

Listview1.Items.EndUpdate;
end;

If I add

Var
 LinkV     : IHTMLElement;
(...)
LinkV := Documento.links.item('', I) as IHTMLElement;
  ListItem.subitems.Add(LinkV.innerText);

That doesn't work either.


UPDATE:

So easy, I can't believe it:

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.getAttribute('href', 0));

TY!!!

Author Closing Comment

by:Júlio
ID: 39798779
He gave me the solution; it just took me a while to understand the concept.

Ty Marco Gasi!
LVL 30

Expert Comment

by:Marco Gasi
ID: 39799108
I'm sorry I didn't help you more, but I had gone away (to sleep!). I'm happy you solved your problem. I have never used that, but AFAIK it should have worked with

  if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
    ListItem.SubItems.Add(InfoV.href);

but for this you would have to look only at the A tags.

Thanks for the points and good luck with your project.
Marco
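
For the typed (early-bound) version, note that href is not declared on IHTMLElement, so InfoV.href will not compile while InfoV is an IHTMLElement. The rough sketch below assumes the element is queried for IHTMLAnchorElement first; the helper name CollectAnchorHrefs and its parameters are hypothetical and not from the thread.

```delphi
uses
  Classes, SysUtils, Variants, ActiveX, MSHTML;

// Hypothetical helper: walks Documento.all in document order and adds the
// href of each anchor whose class attribute equals AClassName to Links.
procedure CollectAnchorHrefs(const Html, AClassName: string; Links: TStrings);
var
  Documento: IHTMLDocument2;
  ArrayV: OleVariant;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  Documento := coHTMLDocument.Create as IHTMLDocument2;

  // The typed write method expects a SAFEARRAY of variants.
  ArrayV := VarArrayCreate([0, 0], varVariant);
  ArrayV[0] := Html;
  Documento.write(PSafeArray(TVarData(ArrayV).VArray));
  Documento.close;

  Elements := Documento.all;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, 0) as IHTMLElement;
    // Supports returns False for non-anchor elements, so no exception is raised.
    if (Element.className = AClassName) and
       Supports(Element, IHTMLAnchorElement, Anchor) then
      Links.Add(Anchor.href);
  end;
end;
```

Because the loop runs over the same Documento.all collection as the ListView code above, the class check could equally be placed inside that existing loop, next to the SPAN checks, to keep each link in the row it belongs to.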