Júlio
Delphi: IHTMLDocument2, Extract Link
Hi,
I'm trying to extract some information from HTML code, and I need to extract the URL (link) from a structured block in the HTML source.
For example, I have this HTML structure:
<div class="browse-info">
<span class="info">
<span class="browseTitleLink"><a href="http://xxx.com/movie/xxx">xxx</a></span><br />
<span class="browseInfoList" ><b>Size:</b> 1.85 GB</span><br />
<span class="browseInfoList" ><b>Quality:</b> 1080p</span><br />
<span class="browseInfoList" ><b>Genre:</b> Crime | Drama</span><br />
<span class="browseInfoList" ><b>IMDB Rating:</b> 6.0/10</span><br />
<span class="browseSeeds">
<span class="peers"><b>Peers:</b> 1454</span>
<span class="seeds"><b>Seeds:</b> 3412</span>
</span>
</span>
<span class="links">
<a href="http://xxx" class="std-btn-small mright">View Info<span></span></a>
<a href="http://xxx" class="std-btn-small mleft downloadDwl" data-movieID="4502" data-downloadID="4694">Download<span></span></a>
</span>
</div>
</div>
<div class="divider"></div>
</div>
I'm using this code to get some info:
procedure TForm1.Button3Click(Sender: TObject);
Var
  Documento : OleVariant;   // late-bound; holds an IHTMLDocument2
  Elementos : OleVariant;
  I : Integer;
  Item : TListItem;
  Source : TMemoryStream;
  Memo : Tmemo;
  IdHttp : TidHttp;
  Qualidade : String;
begin
  Listview1.Clear;
  idHttp := TIdHttp.Create(Self);
  idHttp.AllowCookies := True;
  idHttp.HandleRedirects := True;
  // Hidden memo, used only to turn the downloaded stream into text
  memo := Tmemo.Create(Self);
  Memo.Visible := False;
  memo.Parent := Form1;
  Source := TMemoryStream.Create;
  // Pick the URL to download from the check boxes / edit box
  Qualidade := 'http://xxx';
  if CheckBox1.Checked then
    Qualidade := 'http:/xxx';
  if CheckBox2.Checked then
    Qualidade := 'http:/xxx';
  if CheckBox1.Checked and Checkbox2.Checked then
    Qualidade := 'http://xxx';
  if Edit1.Text <> '' then
    Qualidade := 'http://xxx';
  Item := nil;
  Try
    Try
      IdHTTP.Get(Qualidade, Source);
      Source.Position := 0;
    Except on E: Exception do
      Begin
        ShowMessage(e.Message);
        Exit;   // the outer Finally frees Source, memo and idHttp
      End;
    End;
    if Source.Size = 0 then
    Begin
      ShowMessage('erro');
      Exit;   // freeing here as well would free everything twice
    End;
    memo.Lines.LoadFromStream(Source);
    Documento := coHTMLDocument.Create as IHTMLDocument2;
    Documento.write(memo.Lines.Text);
    Documento.close;
    Listview1.Items.BeginUpdate;
    Try
      for i := 0 to Documento.body.all.length - 1 do
      begin
        Elementos := Documento.body.all.item(i);
        if (Elementos.tagName = 'SPAN') and (Elementos.className = 'browseTitleLink') then
        Begin
          item := Listview1.Items.Add;
          Item.Caption := Elementos.innerText;
        End
        // Details are only added once a title row exists
        else if Assigned(Item) and (Elementos.tagName = 'SPAN') and
          ((Elementos.className = 'browseInfoList') or (Elementos.className = 'browseSeeds')) then
          Item.SubItems.Add(Elementos.innerText);
      end;
    Finally
      ListView1.Items.EndUpdate;
    End;
  Finally
    Source.Free;
    memo.Free;
    idHttp.Free;
  End;
end;
If I call:
if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
Item.SubItems.Add(elementos.innerText);
This gives me the text "View Info Download", not the links.
What do I need to do? I need code for this, because I don't want to extract all the links, only these URLs, in the same order in which I extract the info to put in a ListView.
Screenshot (Capture.PNG): http://imageshack.com/a/img23/9100/htyy.png
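For the late-bound version of Button3Click above (Documento and Elementos declared as OleVariant), one way to keep the URLs in step with the ListView rows is to stop reading innerText from the `links` span and walk the `<a>` elements under it instead. This is a sketch under assumptions, not code from this thread: `AddLinkUrls` and the unit name are hypothetical, and each URL is read with `getAttribute('href', 0)`, the call the asker eventually settles on. `ComObj` is listed because Delphi needs it for late-bound OleVariant calls.

```pascal
unit LinkUrls;

interface

uses
  ComObj, Variants, ComCtrls;

function AddLinkUrls(const LinksSpan: OleVariant; Item: TListItem): Integer;

implementation

// Hypothetical helper (not from the thread). LinksSpan is the late-bound
// <span class="links"> element; the href of every <a> inside it is appended
// to Item's sub-items in document order. Returns how many URLs were added.
function AddLinkUrls(const LinksSpan: OleVariant; Item: TListItem): Integer;
var
  Anchors: OleVariant;
  Count, J: Integer;
begin
  Result := 0;
  if Item = nil then
    Exit;                               // no title row to attach URLs to
  Anchors := LinksSpan.all.tags('A');   // only the <a> tags under this span
  Count := Anchors.length;
  for J := 0 to Count - 1 do
  begin
    Item.SubItems.Add(Anchors.item(J).getAttribute('href', 0));
    Inc(Result);
  end;
end;

end.
```

Inside the loop, the `links` branch would then read `if (Elementos.tagName = 'SPAN') and (Elementos.className = 'links') then AddLinkUrls(Elementos, Item);`, so both the "View Info" and "Download" URLs land on the row whose title was added last.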
Have you tried elementos.innerHTML?
ASKER
Yes, and it doesn't work the way I want.
if (Elementos.tagName = 'SPAN') and (Elementos.classname = 'links') then
Item.SubItems.Add(elementos.innerHTML);
It returns the tags and class names all mixed together.
ASKER CERTIFIED SOLUTION
ASKER
if Documento.all.tags('A') <> 0 then
Item.SubItems.Add(elementos.href);
This fails with: "Member not found"
I'm sorry. Now I see you declared Documento as OleVariant. What I suggested requires it to be declared as IHTMLDocument2...
ASKER
But if I do that, I'll need to rewrite all the code, right?
I'm not sure, but I don't think you do. Give it a try in the Button3Click event.
ASKER
OMG, I don't understand.
I'm rewriting it.
The problem is between lines 33 and 41. What am I doing wrong? Please help me, show me how.
Error: "Interface not supported"
procedure TForm1.Button1Click(Sender: TObject);
Var
  Documento : IHTMLDocument2;
  ArrayV : OleVariant;
  InfoV : IHTMLElement;
  Buffer : String;
  http : TidHttp;
  ListItem : TListItem;
  I : Integer;
begin
  http := TIdHttp.Create(Self);
  http.AllowCookies := True;
  http.HandleRedirects := True;
  Try
    Buffer := http.Get('http://xxx');
  Except on E: Exception do
    Begin
      ShowMessage(e.Message);
      Exit;
    End;
  End;
  Documento := coHTMLDocument.Create as IHTMLDocument2;
  // IHTMLDocument2.write expects a SAFEARRAY of variants holding the HTML
  ArrayV := VarArrayCreate([0,0], varVariant);
  ArrayV[0] := Buffer;
  Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
  Documento.Close;
  Listview1.Items.BeginUpdate;
  // This line raises "Interface not supported": body.all is an element
  // collection, not an IHTMLElement, and infoV is never re-read inside the loop
  infoV := Documento.body.all as IHTMLElement;
  for I := 0 to Documento.all.length -1 do
  Begin
    if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
    Begin
      Listitem := Listview1.Items.Add;
      ListItem.Caption := infoV.innerText;
    End;
  End;
  Listview1.Items.EndUpdate;
end;
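Why that listing fails: with the typed MSHTML interfaces, `Documento.body.all` (like `Documento.all`) is an `IHTMLElementCollection`, so applying `as IHTMLElement` to the collection itself raises "Interface not supported". The cast only succeeds on an item taken out of the collection, which is what the working version in the asker's next post does with `Documento.all.item(I, '')`. A minimal sketch of that per-item access, assuming the `MSHTML` unit that ships with Delphi; `ElementAt` and the unit name are hypothetical.

```pascal
unit ElementAccess;

interface

uses
  MSHTML;

function ElementAt(const Doc: IHTMLDocument2; Index: Integer): IHTMLElement;

implementation

// Hypothetical helper: returns the Index-th entry of Doc.all as an IHTMLElement.
// The "as" cast is applied to the IDispatch returned by item(), not to the
// collection, so it does not raise "Interface not supported".
function ElementAt(const Doc: IHTMLDocument2; Index: Integer): IHTMLElement;
begin
  Result := nil;
  if Doc = nil then
    Exit;
  Result := Doc.all.item(Index, 0) as IHTMLElement;
end;

end.
```

The loop body would then start with `InfoV := ElementAt(Documento, I);`, so every iteration looks at a different element.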
ASKER
OK, I got it. Now I need to get the link:
procedure TForm1.Button1Click(Sender: TObject);
// Units needed in the uses clause: MSHTML, ActiveX (PSafeArray),
// Variants (VarArrayCreate), IdHTTP and ComCtrls (TListItem)
Var
  Documento : IHTMLDocument2;
  ArrayV : OleVariant;
  InfoV : IHTMLElement;
  Buffer : String;
  http : TidHttp;
  ListItem : TListItem;
  I : Integer;
  ElCount : Integer;
begin
  http := TIdHttp.Create(Self);
  Try
    http.AllowCookies := True;
    http.HandleRedirects := True;
    Try
      Buffer := http.Get('http://xxx');
    Except on E: Exception do
      Begin
        ShowMessage(e.Message);
        Exit;   // the Finally below still frees http
      End;
    End;
  Finally
    http.Free;
  End;
  // IHTMLDocument2.write expects a SAFEARRAY of variants holding the HTML
  Documento := coHTMLDocument.Create as IHTMLDocument2;
  ArrayV := VarArrayCreate([0,0], varVariant);
  ArrayV[0] := Buffer;
  Documento.Write(PSafeArray(TVarData(ArrayV).VArray));
  Documento.Close;
  ListItem := nil;
  Listview1.Items.BeginUpdate;
  Try
    ElCount := Documento.all.length;
    for I := 0 to ElCount - 1 do
    Begin
      // Cast each item, not the collection itself, to IHTMLElement
      infoV := Documento.all.item(I, '') as IHTMLElement;
      if (infoV.tagName = 'SPAN') and (infoV.className = 'browseTitleLink') then
      Begin
        Listitem := Listview1.Items.Add;
        ListItem.Caption := infoV.innerText;
      End
      // Details are only added once a title row exists
      else if Assigned(ListItem) and (infoV.tagName = 'SPAN') and
        ((infoV.className = 'browseInfoList') or (infoV.className = 'browseSeeds')) then
        ListItem.SubItems.Add(infoV.innerText);
    End;
  Finally
    Listview1.Items.EndUpdate;
  End;
end;
If I add:
Var
LinkV : IHTMLElement;
(...)
LinkV := Documento.links.item('', I) as IHTMLElement;
ListItem.subitems.Add(LinkV.innerText);
It doesn't work either.
UPDATE:
It was so easy, I can't believe it:
if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
ListItem.SubItems.Add(InfoV.getAttribute('href', 0));
TY!!!
ASKER
He gave the solution; it took me a while to understand the concept.
Ty Marco Gasi!
I'm sorry I didn't help you more, but I had gone away (to sleep!). I'm glad you solved your problem. I have never used it, but AFAIK it should have worked with:
if (infoV.tagName = 'A') and (infoV.className = 'std-btn-small mright') then
  ListItem.SubItems.Add((InfoV as IHTMLAnchorElement).href);
but for this you would have to look only at the A tags.
Thanks for the points, and good luck with your project.
Marco
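The `href` route can be sketched as a small typed check: `href` is a property of `IHTMLAnchorElement`, not of `IHTMLElement`, so the element is queried with `Supports` (from `SysUtils`) before the property is read. `AnchorHref` and the unit name are hypothetical, and the class string `'std-btn-small mright'` is simply the one from the question's HTML; other markup would need a different test.

```pascal
unit AnchorLinks;

interface

uses
  SysUtils, MSHTML;

function AnchorHref(const El: IHTMLElement; const AClass: string): string;

implementation

// Hypothetical helper: returns the href of El when it is an <a> tag whose
// class attribute equals AClass, otherwise ''. IHTMLElement has no href
// property, so the element is queried for IHTMLAnchorElement first.
function AnchorHref(const El: IHTMLElement; const AClass: string): string;
var
  Anchor: IHTMLAnchorElement;
begin
  Result := '';
  if (El <> nil) and SameText(El.tagName, 'A') and (El.className = AClass) and
     Supports(El, IHTMLAnchorElement, Anchor) then
    Result := Anchor.href;
end;

end.
```

In the working loop this could be used as `Link := AnchorHref(InfoV, 'std-btn-small mright'); if (Link <> '') and Assigned(ListItem) then ListItem.SubItems.Add(Link);`, where `Link` is a hypothetical string variable. The `getAttribute('href', 0)` call the asker used avoids the interface query altogether, since `getAttribute` is available on every `IHTMLElement`.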