  • Status: Solved
  • Priority: Medium
  • Security: Public

Retrieving IMDb Info

I'm developing a tool for my video store, and I want to retrieve the cast, plot info, year, genre, votes, title, and the "Written by" and "Directed by" credits from an IMDb URL like http://www.imdb.com/title/tt0145487/ ... Thanks a lot!

Asked by: andre_carnevale
 
Eddie Shipman (All-around developer) commented:
You are asking a lot. It will take some time to parse the results from IMDb pages
because they don't use a standardized format. Their HTML is junk: they use tables
for layout instead of CSS. I can do it for you, but it will take some time; or I can start you
off and you can work on the parsing yourself.
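As a rough starting point for that parsing, here is a minimal sketch that pulls the page title out of downloaded HTML and splits it into name and year. It uses only Pos/Copy-style string handling from SysUtils and should compile with Delphi or Free Pascal. It assumes, without having checked the live site, that an IMDb title page's `<title>` looks like `The Rookie (2002)`; `ExtractBetween` and `SplitTitleYear` are hypothetical helper names, and the HTML string is a stand-in for a real download. Cast, genre and votes would need the same tag-to-tag extraction written against the actual page markup.

```pascal
program ImdbTitleSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

{ Returns the trimmed text between the first StartTag and the next EndTag
  (case-insensitive), or '' if either tag is missing. }
function ExtractBetween(const Html, StartTag, EndTag: string): string;
var
  LowerHtml: string;
  StartPos, EndPos: Integer;
begin
  Result := '';
  LowerHtml := LowerCase(Html);
  StartPos := Pos(LowerCase(StartTag), LowerHtml);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(StartTag));
  // Look for the end tag only in the text that follows the start tag
  EndPos := Pos(LowerCase(EndTag), Copy(LowerHtml, StartPos, MaxInt));
  if EndPos = 0 then
    Exit;
  Result := Trim(Copy(Html, StartPos, EndPos - 1));
end;

{ Splits a title such as 'The Rookie (2002)' into name and year.
  ASSUMPTION: the year is the last '(YYYY)' group; returns False otherwise. }
function SplitTitleYear(const PageTitle: string; out Name: string;
  out Year: Integer): Boolean;
var
  OpenPos, ClosePos, I: Integer;
begin
  Result := False;
  Name := Trim(PageTitle);
  Year := 0;
  OpenPos := LastDelimiter('(', PageTitle);
  ClosePos := LastDelimiter(')', PageTitle);
  // Need exactly four characters between the last '(' and the last ')'
  if (OpenPos = 0) or (ClosePos <> OpenPos + 5) then
    Exit;
  // ...and all four must be digits
  for I := OpenPos + 1 to ClosePos - 1 do
    if (PageTitle[I] < '0') or (PageTitle[I] > '9') then
      Exit;
  Year := StrToInt(Copy(PageTitle, OpenPos + 1, 4));
  Name := Trim(Copy(PageTitle, 1, OpenPos - 1));
  Result := True;
end;

var
  Html, Name: string;
  Year: Integer;
begin
  // Stand-in for HTML downloaded with TIdHTTP or read from TWebBrowser
  Html := '<html><head><TITLE>The Rookie (2002)</TITLE></head><body></body></html>';
  if SplitTitleYear(ExtractBetween(Html, '<title>', '</title>'), Name, Year) then
    WriteLn(Name, ' / ', Year)   // prints "The Rookie / 2002"
  else
    WriteLn('Title not recognised');
end.
```

Searching case-insensitively matters here because older HTML often uses upper-case tags like `<TITLE>`.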

andre_carnevale (Author) commented:
Hmm, if I could just get an idea of how to do it, that would be great.

Eddie Shipman (All-around developer) commented:
Got a question: how are you going to get URLs like the one above?
I am working on a utility to search using their Tsearch facility (by title) and am
having problems figuring something out.
 
Eddie Shipman (All-around developer) commented:
OK, here's something to get you started. I used a component called extIEParser, available from the
Files section of the Delphi-WebBrowser Yahoo group. I'm not sure whether you have to be a member to download it;
if you can't get it, let me know via email and I'll send it to you.

The edit box holds the title to search for. (I am the catcher in the oil field scene in "The Rookie", BTW.)
Enter the title, click GO, and then click the entry you want to view in the list box.
I'm leaving the parsing of the results to you. I have included an example where I change the
'(more)' link text to 'Plot Summary' so you can see how to do it. If you have any more questions, let me know,
and good luck...

DFM:
object Form1: TForm1
  Left = 316
  Top = 143
  Width = 870
  Height = 640
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object ListBox1: TListBox
    Left = 0
    Top = 41
    Width = 288
    Height = 572
    Align = alLeft
    ItemHeight = 13
    TabOrder = 0
    OnClick = ListBox1Click
    OnMouseMove = ListBox1MouseMove
  end
  object Panel1: TPanel
    Left = 0
    Top = 0
    Width = 862
    Height = 41
    Align = alTop
    BevelOuter = bvNone
    TabOrder = 1
    object Edit1: TEdit
      Left = 16
      Top = 10
      Width = 201
      Height = 21
      TabOrder = 0
      Text = 'The Rookie'
    end
    object Button1: TButton
      Left = 232
      Top = 8
      Width = 75
      Height = 25
      Caption = 'GO'
      Default = True
      TabOrder = 1
      OnClick = Button1Click
    end
  end
  object Panel2: TPanel
    Left = 288
    Top = 41
    Width = 574
    Height = 572
    Align = alClient
    BevelInner = bvRaised
    BevelOuter = bvLowered
    Caption = 'Panel2'
    TabOrder = 2
    object WebBrowser1: TWebBrowser
      Left = 2
      Top = 2
      Width = 570
      Height = 568
      Align = alClient
      TabOrder = 0
      ControlData = {
        4C000000E93A0000B43A00000000000000000000000000000000000000000000
        000000004C000000000000000000000001000000E0D057007335CF11AE690800
        2B2E126208000000000000004C0000000114020000000000C000000000000046
        8000000000000000000000000000000000000000000000000000000000000000
        00000000000000000100000000000000000000000000000000000000}
    end
  end
  object IdHTTP1: TIdHTTP
    MaxLineAction = maException
    AllowCookies = True
    ProxyParams.BasicAuthentication = False
    ProxyParams.ProxyPort = 0
    Request.ContentLength = -1
    Request.ContentRangeEnd = 0
    Request.ContentRangeStart = 0
    Request.Accept = 'text/html, */*'
    Request.BasicAuthentication = False
    Request.UserAgent =
      'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.' +
      '4322)'
    HTTPOptions = [hoForceEncodeParams]
    Left = 776
    Top = 8
  end
  object extIEParser1: TextIEParser
    DownloadOnly = False
    DownloadOptions = [DLCTL_DOWNLOADONLY, DLCTL_NO_FRAMEDOWNLOAD, DLCTL_RESYNCHRONIZE, DLCTL_PRAGMA_NO_CACHE, DLCTL_NO_BEHAVIORS, DLCTL_OFFLINE]
    parseNoFrames = False
    OnAnchor = extIEParser1Anchor
    Left = 816
    Top = 8
  end
end

PAS:
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, IdBaseComponent, IdComponent, IdTCPConnection,
  IdTCPClient, IdHTTP, OleCtrls, SHDocVw, extIEParser, ExtCtrls,Clipbrd ;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    extIEParser1: TextIEParser;
    ListBox1: TListBox;
    Panel1: TPanel;
    Edit1: TEdit;
    Button1: TButton;
    Panel2: TPanel;
    WebBrowser1: TWebBrowser;
    procedure Button1Click(Sender: TObject);
    procedure extIEParser1Anchor(Sender: TObject; href, target, rel, rev,
      urn, Methods, name, host, hostname, pathname, port, protocol, search,
      hash, accesskey, protocolLong, mimeType, nameProp: String;
      Element: TElementInfo);
    procedure FormDestroy(Sender: TObject);
    procedure FormCreate(Sender: TObject);
    procedure ListBox1Click(Sender: TObject);
    procedure ListBox1MouseMove(Sender: TObject; Shift: TShiftState; X,
      Y: Integer);
  private
    { Private declarations }
  public
    { Public declarations }
    slAnchorHrefs: TStringList;
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  SearchUrl: String;
begin
  // Clear both lists together so ListBox1 indexes keep matching slAnchorHrefs
  ListBox1.Items.Clear;
  slAnchorHrefs.Clear;
  SearchUrl := 'http://www.imdb.com/Tsearch?title=' + Edit1.Text +
    '&restrict=Movies+only';
  // Show the results page and collect its links (via OnAnchor) into the list box
  WebBrowser1.Navigate(SearchUrl);
  extIEParser1.Url := SearchUrl;
  extIEParser1.Execute;
end;

procedure TForm1.extIEParser1Anchor(Sender: TObject; href, target, rel,
  rev, urn, Methods, name, host, hostname, pathname, port, protocol,
  search, hash, accesskey, protocolLong, mimeType, nameProp: String;
  Element: TElementInfo);
var
  LinkText: String;
begin
  LinkText := Element.outerText;
  // Skip anchors without visible text (image links, named anchors, etc.)
  if Trim(LinkText) = '' then
    Exit;
  // Give the plot summary link a more descriptive label than '(more)'
  if (LinkText = '(more)') and (Pos('plotsummary', href) > 0) then
    LinkText := 'Plot Summary';
  // Keep the href and its label at the same index in both lists
  slAnchorHrefs.Add(href);
  ListBox1.Items.Add(LinkText);
end;

procedure TForm1.FormDestroy(Sender: TObject);
begin
  slAnchorHrefs.Free;
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  // Parallel list: slAnchorHrefs[i] is the URL behind ListBox1.Items[i]
  slAnchorHrefs := TStringList.Create;
  // ListBox1MouseMove shows each item's URL through the normal hint mechanism
  ListBox1.ShowHint := True;
end;

procedure TForm1.ListBox1Click(Sender: TObject);
var
  Url: String;
begin
  // Clicking below the last item leaves ItemIndex at -1: nothing to open
  if (ListBox1.ItemIndex < 0) or (ListBox1.ItemIndex >= slAnchorHrefs.Count) then
    Exit;
  Url := slAnchorHrefs[ListBox1.ItemIndex];
  WebBrowser1.Navigate(Url);
  // Re-parse the selected page so its links replace the current list
  extIEParser1.Url := Url;
  slAnchorHrefs.Clear;
  ListBox1.Items.Clear;
  extIEParser1.Execute;
end;

procedure TForm1.ListBox1MouseMove(Sender: TObject; Shift: TShiftState; X,
  Y: Integer);
var
  APoint: TPoint;
  Index: Integer;
  NewHint: String;
begin
  APoint.X := X;
  APoint.Y := Y;
  // Show the href of the item under the mouse as the list box's tooltip.
  // The standard Hint/ShowHint mechanism avoids creating a hint window and
  // blocking the UI with Sleep() on every mouse move.
  Index := ListBox1.ItemAtPos(APoint, True);
  if (Index >= 0) and (Index < slAnchorHrefs.Count) then
    NewHint := slAnchorHrefs[Index]
  else
    NewHint := '';
  if NewHint <> ListBox1.Hint then
  begin
    ListBox1.Hint := NewHint;
    Application.CancelHint; // let the tooltip reappear with the new text
  end;
end;

end.
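Button1Click pastes Edit1.Text straight into the Tsearch query string, so a title containing `&` or `#` would break the URL. Below is a minimal sketch of query-value encoding. `EncodeQueryValue` and `IsUnreserved` are hypothetical helper names. The sketch treats the string as single-byte (as in Delphi 6, where `string` is AnsiString) and writes spaces as `+` to match the `restrict=Movies+only` parameter already in the URL. Non-ASCII titles would first need converting to whatever charset the server expects, which this does not attempt.

```pascal
program EncodeQuerySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

{ True for characters that may appear unescaped in a query value
  (RFC 3986 "unreserved": letters, digits, '-', '.', '_', '~'). }
function IsUnreserved(C: Char): Boolean;
begin
  Result := ((C >= 'A') and (C <= 'Z')) or ((C >= 'a') and (C <= 'z')) or
    ((C >= '0') and (C <= '9')) or (C = '-') or (C = '.') or (C = '_') or
    (C = '~');
end;

{ Percent-encodes one query-string value: spaces become '+', every other
  reserved character becomes %XX. Assumes single-byte (Ansi) strings. }
function EncodeQueryValue(const Value: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Value) do
    if IsUnreserved(Value[I]) then
      Result := Result + Value[I]
    else if Value[I] = ' ' then
      Result := Result + '+'
    else
      Result := Result + '%' + IntToHex(Ord(Value[I]), 2);
end;

begin
  // How Button1Click could build its search URL from an encoded title;
  // prints http://www.imdb.com/Tsearch?title=Tom+%26+Jerry&restrict=Movies+only
  WriteLn('http://www.imdb.com/Tsearch?title=' + EncodeQueryValue('Tom & Jerry') +
    '&restrict=Movies+only');
end.
```

In the form, the search URL would then be built from `EncodeQueryValue(Edit1.Text)` instead of `Edit1.Text`. Comparing characters against ranges, rather than using `Value[I] in [...]`, keeps the test correct if the code is later compiled where `Char` is a WideChar.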

Eddie Shipman (All-around developer) commented:
BTW, MAKE SURE you set Request.UserAgent for IdHTTP as shown above, because otherwise
IMDb will give you a 403 Forbidden error. It seems too many people have been accessing it using Indy
components... HA! HA!

andre_carnevale (Author) commented:
Where is extIEParser1: TextIEParser located in Delphi? If I don't have it, where can I download it? Thanks... BTW, I will accept your answer.

Eddie Shipman (All-around developer) commented:
Go to http://groups.yahoo.com/groups/delphi-webbrowser
Like I said, I'm not sure if you have to be a member to download it
but if you aren't let me know via email and I'll send it to you.


andre_carnevale (Author) commented:
Yeah, I have to be a member to get it ;/
Please send it to my email:
fuketa@terra.com.br
Thank you very much.

Eddie Shipman (All-around developer) commented:
On its way...

andre_carnevale (Author) commented:
Is there no way to compile it in Delphi 6?

Eddie Shipman (All-around developer) commented:
That is what I used, D6. What problem are you having?

andre_carnevale (Author) commented:
It says that I don't have these files:
[Fatal Error] extIEParser.pas(17): File not found: 'mshtml_tlb.dcu'

Eddie Shipman (All-around developer) commented:
Change mshtml_tlb in the uses clause to mshtml. It should then compile.

andre_carnevale (Author) commented:
One problem solved, another appeared:
[Fatal Error] extIEParserReg.pas(18): File not found: 'dsgnintf.dcu'

Eddie Shipman (All-around developer) commented:
I will send you the extIEParserReg file I have;
it installed fine on D6.

I added some events, OnTable, OnTR and OnTD, but did not finish the implementation.
OnTable works; the other two are not finished yet.
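For anyone hitting the same 'dsgnintf.dcu' error: Delphi 6 no longer ships DsgnIntf. Its contents were split into DesignIntf (design-time interfaces and registration routines) and DesignEditors (the property and component editor base classes). A package that uses them must also list designide in its requires clause, and these units can only go into design-time packages. The fragment below is a sketch that assumes extIEParserReg needs those APIs. The unit's other uses entries and the package's other requires entries are not shown and stay as they are.

```pascal
{ extIEParserReg.pas: uses clause for Delphi 6 and later (fragment) }
uses
  DesignIntf,     // was DsgnIntf: registration routines such as RegisterPropertyEditor
  DesignEditors;  // TPropertyEditor / TComponentEditor, only if the unit uses them

{ design-time package (.dpk) containing extIEParserReg (fragment) }
requires
  designide;      // the package that provides DesignIntf / DesignEditors
```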

andre_carnevale (Author) commented:
I tried to compile it and my Delphi crashed again... I had to reinstall it (for the second time)... I tried to compile again, it crashed, and now Delphi doesn't open anymore ;( Do you know what could be causing this?

Eddie Shipman (All-around developer) commented:
Don't know; I had no problem installing it on D6 Pro.
Go ahead and import the MSHTML type library, change the uses clause back to mshtml_tlb,
and see if that helps.

Eddie Shipman (All-around developer) commented:
I may just have to install it into a BPL of its own and send it to you.

Eddie Shipman (All-around developer) commented:
OK, I have the package ready for you. Email me at
<< email removed by kretzschmar, Page Editor >>
and I'll send it to you.