Solved

Problem with getting HTML with Indy

Posted on 2006-07-10
Last Modified: 2010-04-05
I have a redirect problem: google.com redirects me to google.ro because I am in Romania.
How can I get the content of google.com with idHTTP? If I set HandleRedirects to True it works, but I get the content of google.ro. I have another component that gets me the right content from google.com.

Please tell me if it's possible.
I need it ASAP.

Thanks,
Question by:crystyan

Expert Comment

by:2266180
ID: 17071592
You should have a 'Google.com in English' link on your google.ro page (localized pages have this).
So instead of getting google.com, get http://www.google.com/ncr
AND keep the cookies.

Author Comment

by:crystyan
ID: 17071631
But how do I decide whether to set HandleRedirects to False or True?
If I set it to True and go to microsoft.com, I get the wrong result. If I set it to False, I get good results there, but then I no longer get good results from google.com.

Thanks, ciuly!

Expert Comment

by:2266180
ID: 17071683
hm.. can you post a small test-code?

Author Comment

by:crystyan
ID: 17071711
  idHTTP := TidHTTP.Create(nil);
  idCookieManager := TIdCookieManager.Create(idHTTP);
  idAntiFreeze := TIdAntiFreeze.Create(idHTTP);

  idHTTP.CookieManager := idCookieManager;
  idHTTP.AllowCookies := True;
  idHTTP.HandleRedirects := False;
  Cookies := TStringList.Create;
  HTML := idHTTP.Get(url);
  showmessage(html);
  GetCookies;
  ShowMessage(cookies.Text);

This is one function (well, it's a class, but I cut the code and put it together), called like this:

  site.GetHTML('http://www.microsoft.com/');


Basically, I want to make a class to get or post HTML, handle the redirects and maybe the cookies.

Expert Comment

by:2266180
ID: 17071802
Looks OK except for the redirect part. You should enable HandleRedirects while you get the cookies (I didn't test whether it works).

Here is a small test code I just wrote:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, IdCookieManager, IdBaseComponent, IdComponent,
  IdTCPConnection, IdTCPClient, IdHTTP;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var s:string;
    i:integer;
    cookies:tstringlist;

 procedure setcookies;
 var j:integer;
 begin
   for j:=1 to IdCookieManager1.CookieCollection.count do
     IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
 end;

begin
  cookies := TStringList.Create;
  try
    try
      // first GET (for cookies); /ncr stands for "no country redirect"
      s := IdHTTP1.Get('http://www.google.com/ncr');

      // save the cookies collected by the cookie manager
      for i := 0 to IdCookieManager1.CookieCollection.Count - 1 do
        cookies.Add(IdCookieManager1.CookieCollection.Items[i].CookieText);

      // from now on, work with google.com as usual
      s := IdHTTP1.Get('http://www.google.com');
      ShowMessage(s);
    except
      on e: EIdHTTPProtocolException do
        ShowMessage(IdHTTP1.Response.ResponseText);
    end;
  finally
    // free the list even when a request fails
    cookies.Free;
  end;
end;

end.

If you want to make a generic class, you will need to write a mini web browser and follow the HTTP protocol (and parse some HTML, since there can be software redirects from scripts).

I usually prefer to keep my site-specific coding site-specific :) I am not saying that a generic class cannot be done, just that it is too hard and, for me, not worth it.

Usually HandleRedirects = True should work OK for most sites. Google is a special case, since it is you who wants to work with google.com and thus override the redirect ;) (no browser does that :) )
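
For deciding redirects per request instead of flipping HandleRedirects globally, TIdHTTP also exposes an OnRedirect event. The following is only a sketch: the unit name, the TRedirectPolicy class, the BlockedHost field and the ShouldFollowRedirect helper are hypothetical names made up for illustration, and the event signature is the one declared in the Indy 9/10 sources I know, so check it against your installed version.

```pascal
unit RedirectPolicy;

interface

uses
  SysUtils, IdHTTP;

type
  // Hypothetical helper class; not part of Indy.
  TRedirectPolicy = class
  public
    BlockedHost: string; // redirects whose URL contains this text are refused
    // Assumed to match TIdHTTP.OnRedirect; verify against your Indy version.
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean;
      var VMethod: TIdHTTPMethod);
  end;

// Returns True when a redirect to Dest should be followed.
function ShouldFollowRedirect(const Dest, BlockedHost: string): Boolean;

implementation

function ShouldFollowRedirect(const Dest, BlockedHost: string): Boolean;
begin
  // Case-insensitive substring check; an empty BlockedHost blocks nothing
  Result := Pos(LowerCase(BlockedHost), LowerCase(Dest)) = 0;
end;

procedure TRedirectPolicy.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean;
  var VMethod: TIdHTTPMethod);
begin
  // Handled = True tells TIdHTTP to follow the redirect
  Handled := ShouldFollowRedirect(dest, BlockedHost);
end;

end.
```

It would be wired up with something like `IdHTTP1.OnRedirect := Policy.HandleRedirect;`. A refused redirect typically surfaces as an EIdHTTPProtocolException carrying the 302 response, so this does not by itself return the google.com page; the /ncr request plus cookies is still what does that.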

Author Comment

by:crystyan
ID: 17072158
Do you know why I get "IO HANDLER VALUE IS INVALID" when I'm trying to get the HTML from 'https://login.yahoo.com/config/login/'?

thanks

Accepted Solution

by:
2266180 earned 250 total points
ID: 17072612
Well... if you look at the protocol, it's https, so it requires SSL. You will need to add SSL support to your application if you want to access that page.
I've done a login example with SSL for eBay here: http://www.ciuly.com/delphi/indy/delphiIndySSL_ebay/index.html
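
In code, "adding SSL support" mostly means giving the HTTP component an SSL-capable IOHandler before requesting an https URL. A minimal sketch, assuming Indy 9 class names (Indy 10 renames the handler to TIdSSLIOHandlerSocketOpenSSL); the unit name and the CreateHttpsClient function are hypothetical:

```pascal
unit HttpsClient;

interface

uses
  IdHTTP, IdSSLOpenSSL;

// Returns a TIdHTTP that has an SSL IOHandler attached. The caller frees it.
function CreateHttpsClient: TIdHTTP;

implementation

function CreateHttpsClient: TIdHTTP;
begin
  Result := TIdHTTP.Create(nil);
  // The handler is owned by the HTTP component, so it is freed together with it
  Result.IOHandler := TIdSSLIOHandlerSocket.Create(Result);
  Result.HandleRedirects := True;
end;

end.
```

The OpenSSL libraries (libeay32.dll and ssleay32.dll on Windows) also have to be available to the application at run time, otherwise the handler cannot be loaded.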

Author Comment

by:crystyan
ID: 17077570
hi ciuly,

I'm still having problems with the login at del.icio.us! :(( I've spent all day looking at the eBay project (you did that for me too). I was hoping you would have time to see what's happening there.
I'm trying to do this:
  HTML := idHTTP.Get('http://del.icio.us/');
  GetCookies;
  SetCookies;
  HTML := idHTTP.Get('https://secure.del.icio.us/login');
and here I get "IOHandler value is invalid".

thanks!

Expert Comment

by:2266180
ID: 17077601
I'll check it in about 10-12 hours. By the way, I don't see you getting any cookies from https://secure.del.icio.us/login. I would first make sure that it doesn't set any. Have you checked that?
If it is still not working, I'll give it a try.

Author Comment

by:crystyan
ID: 17077630
Nope. I just can't get the content of https://secure.del.icio.us/login. I've tried all the possibilities... except the good one, lol.

Expert Comment

by:2266180
ID: 17077655
well, in this case I'll get back to you in about 10-12 hours. probably with the good solution :)

Author Comment

by:crystyan
ID: 17077663
thanks a lot!

Expert Comment

by:2266180
ID: 17079687
hm.. this one was short. you probably didn't notice the software redirect?

here is the code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, IdBaseComponent, IdComponent, IdTCPConnection, IdTCPClient,
  IdHTTP, IdCookieManager, StdCtrls, IdServerIOHandler, IdSSLOpenSSL,
  IdIOHandler, IdIOHandlerSocket;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    IdSSLIOHandlerSocket1: TIdSSLIOHandlerSocket;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var
  Params: TStringList;
  HTML, loginurl, signinurl, userid: String;
  count,i:integer;
  cookies:tstringlist;

   procedure setcookies;
   var j:integer;
   begin
       count:=IdCookieManager1.CookieCollection.count;
       for j:=1 to count do
           IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
   end;

begin
  signinurl := 'http://del.icio.us/';
  // the above is used to get the login page (this is the URL behind the "sign in" link).
  // you have to emulate a browser, so you need to do all the steps. this is a good idea
  // since any of the redirects might set cookies that you will probably need

  loginurl := 'https://secure.del.icio.us/login';
  // the above is the login URL, taken from the action attribute of the form; this is where
  // the login request will be sent

  Params := TStringList.Create;
  cookies := TStringList.Create;
  try
    try
      html := IdHTTP1.Get(signinurl); // first get; get first cookie(s)
      // this sets 1 cookie

      count := IdCookieManager1.CookieCollection.Count; // get them
      for i := 0 to count - 1 do
        cookies.Add(IdCookieManager1.CookieCollection.Items[i].CookieText);

      // you might want to parse the hidden inputs' names and values
      // because hard-coding them might not work in the future or in case there are
      // values that are generated

      // no hidden inputs at this time

      userid := '<your user id here>';
      Params.Values['user_name'] := userid;
      Params.Values['password'] := '<your password here>';

      setCookies;
      html := IdHTTP1.Post(loginurl, Params); // now do the log in

      // the login response should contain a software redirect such as:
      // <meta http-equiv="refresh" content="0; URL=http://del.icio.us/<userid>"
      setCookies;
      html := IdHTTP1.Get('http://del.icio.us/' + userid); // software redirect

      if Pos('<title>del.icio.us/' + userid + '</title>', html) > 0 then
        ShowMessage('logged in') // we are logged in
      else
        ShowMessage('login failed');

      // show the page only on success, so error details below are not overwritten
      Memo1.Lines.Text := html;
    except
      on e: EIdHTTPProtocolException do
      begin
        Memo1.Lines.Add(IdHTTP1.Response.ResponseText);
        Memo1.Lines.Add(e.ErrorMessage);
      end;
    end;
  finally
    // free both lists even when a request fails
    cookies.Free;
    Params.Free;
  end;
end;

end.

works like a charm (I modified the ebay demo)

just in case you didn't know this, you should read this: http://www.indyproject.org/Sockets/SSL.en.aspx (I also updated my ebay demo page to point this out)
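
The "software redirect" in the code above is a meta refresh tag in the returned HTML, which TIdHTTP does not follow on its own. Instead of hard-coding the target, the URL could be pulled out of the page. This is only a rough sketch: the unit name and ExtractMetaRefreshUrl are hypothetical, and it relies on plain string search rather than real HTML parsing.

```pascal
unit MetaRefresh;

interface

uses
  SysUtils;

// Returns the URL of a <meta http-equiv="refresh" content="0; URL=..."> tag,
// or '' when no such tag is found.
function ExtractMetaRefreshUrl(const Html: string): string;

implementation

function ExtractMetaRefreshUrl(const Html: string): string;
const
  Terminators = '"''> '; // characters that end the URL value
var
  Lower: string;
  TagPos, UrlPos, I: Integer;
begin
  Result := '';
  Lower := LowerCase(Html); // same length as Html, so indexes stay aligned
  TagPos := Pos('http-equiv="refresh"', Lower);
  if TagPos = 0 then
    Exit;
  // Search for "url=" only after the refresh attribute
  UrlPos := Pos('url=', Copy(Lower, TagPos, MaxInt));
  if UrlPos = 0 then
    Exit;
  // Index in the original string of the first character after "url="
  I := TagPos + UrlPos - 1 + Length('url=');
  while (I <= Length(Html)) and (Pos(Html[I], Terminators) = 0) do
  begin
    Result := Result + Html[I];
    Inc(I);
  end;
end;

end.
```

The result could then be passed to `IdHTTP1.Get` in place of the hand-built `'http://del.icio.us/' + userid`. Redirects done from JavaScript are not covered by this.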

cheers

Author Comment

by:crystyan
ID: 17081983
lol... I didn't associate the SSL handler with IdHTTP. Me dumb again!

Author Comment

by:crystyan
ID: 17084038
something is still weird here :(((((((((((

I'm doing this:
    HTML := idHTTP.Get('http://del.icio.us/');
    for i:=1 to IdCookieManager.CookieCollection.count do
     cookies.Add(IdCookieManager.CookieCollection.Items[i-1].CookieText);
    ShowMessage(cookies.Text);

and I can't get all the cookies! Though it says I'm connected, I don't have all the cookies, and when I try to do something it redirects me to the login page :(((
I've looked with a sniffer and saw there are more cookies than I get.

Do you have any idea?

Expert Comment

by:2266180
ID: 17085543
Yes. Some sites hide the cookies in resources to make sure bots don't get them. Since robots/crawlers will mostly never load resources (images, sounds, etc.), those cookies will not be set. Check with the sniffer exactly which resource is setting the cookies and load it yourself.

Author Comment

by:crystyan
ID: 17085704
How do I know what sets a cookie?

Expert Comment

by:2266180
ID: 17086130
I just told you: "Check with the sniffer exactly which resource is setting the cookies and load it yourself."
Each resource is loaded with a separate HTTP GET request, so it should be easy to spot.
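
If the sniffer output is hard to read, the same check can be approximated from code: request each resource URL in turn and see whether the cookie collection grew. A sketch under assumptions: the unit name and FindCookieSetters are hypothetical, the list of resource URLs has to come from you (for example from the sniffer log), and Cookies is expected to be the cookie manager already assigned to Http.

```pascal
unit CookieProbe;

interface

uses
  Classes, SysUtils, IdHTTP, IdCookieManager;

// Requests every URL in ResourceUrls and adds to Setters those
// after which the cookie collection contains more cookies than before.
procedure FindCookieSetters(Http: TIdHTTP; Cookies: TIdCookieManager;
  ResourceUrls, Setters: TStrings);

implementation

procedure FindCookieSetters(Http: TIdHTTP; Cookies: TIdCookieManager;
  ResourceUrls, Setters: TStrings);
var
  I, Before: Integer;
  Body: TMemoryStream;
begin
  Body := TMemoryStream.Create;
  try
    for I := 0 to ResourceUrls.Count - 1 do
    begin
      Before := Cookies.CookieCollection.Count;
      Body.Clear;
      try
        // Resources may be binary (images), so read them into a stream
        Http.Get(ResourceUrls[I], Body);
      except
        on E: EIdHTTPProtocolException do
          ; // an HTTP error response can still carry cookies, so keep going
      end;
      if Cookies.CookieCollection.Count > Before then
        Setters.Add(ResourceUrls[I]);
    end;
  finally
    Body.Free;
  end;
end;

end.
```

A resource that only replaces an existing cookie does not change the count, so this will miss it; comparing the cookie text before and after would be needed for that case.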

Author Comment

by:crystyan
ID: 17095104
Could you try looking at my other question? Plssssssssssss
I know that I'm being a pain here :|

Thanks