Solved

Problem with getting HTML with Indy

Posted on 2006-07-10
Last Modified: 2010-04-05
I have a redirect problem. google.com redirects me to google.ro because I am in Romania.
How can I get the content of google.com with idHTTP? If I set HandleRedirects to True it works OK, but I get the content of google.ro. I have another component that gets me the right content from google.com.

Please tell me if it's possible.
I need it ASAP.

Thanks,
Question by:crystyan
 
LVL 28

Expert Comment

by:ciuly
ID: 17071592
You should have a 'Google.com in English' link on your google.ro page (localized pages have this).
So instead of getting google.com, get http://www.google.com/ncr
and keep the cookies.
 

Author Comment

by:crystyan
ID: 17071631
But how should I set that redirect option, to False or to True?
I mean, if I set it to True and go to microsoft.com I get the wrong result; if I set it to False I get good results. But with it set to False I don't get good results from google.com anymore.

Thanks ciuly!
 
LVL 28

Expert Comment

by:ciuly
ID: 17071683
Hm, can you post a small piece of test code?
 

Author Comment

by:crystyan
ID: 17071711
  idHTTP := TIdHTTP.Create(nil);
  idCookieManager := TIdCookieManager.Create(idHTTP);
  idAntiFreeze := TIdAntiFreeze.Create(idHTTP);

  idHTTP.CookieManager := idCookieManager;
  idHTTP.AllowCookies := True;
  idHTTP.HandleRedirects := False;
  Cookies := TStringList.Create;
  HTML := idHTTP.Get(url);
  showmessage(html);
  GetCookies;
  ShowMessage(cookies.Text);

This is a function (well, it's a class, but I cut the code and put it together).

  site.GetHTML('http://www.microsoft.com/');


Basically, I want to make a class to get or post HTML, handle the redirects and maybe the cookies.
 
LVL 28

Expert Comment

by:ciuly
ID: 17071802
Looks OK except the redirect part. You should enable it while you get the cookies (I didn't test whether it works OK).

here is a small test-code I just wrote:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, IdCookieManager, IdBaseComponent, IdComponent,
  IdTCPConnection, IdTCPClient, IdHTTP;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var s:string;
    i:integer;
    cookies:tstringlist;

 procedure setcookies;
 var j:integer;
 begin
   for j:=1 to IdCookieManager1.CookieCollection.count do
     IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
 end;

begin
  cookies:=tstringlist.Create;
  try
    try
      s:=IdHTTP1.Get('http://www.google.com/ncr');// first get (for cookies)

      for i:=1 to IdCookieManager1.CookieCollection.count do// save cookies
        cookies.Add(IdCookieManager1.CookieCollection.Items[i-1].CookieText);

      s:=IdHTTP1.Get('http://www.google.com');// normally work with google.com from now on
      showmessage(s);
    except on e: EIdHTTPProtocolException do
      begin
        showmessage(idHTTP1.response.ResponseText);
      end;
    end;
  finally
    cookies.free;// release the list even if a request fails
  end;
end;

end.

If you want to make a generic class, you will need to write a mini web browser and follow the HTTP protocol (and some HTML, since there can be software redirects from scripts).

I usually prefer to do my site-specific coding site-specifically :) I am not saying that a generic class cannot be done, just that it is too hard, and for me it isn't worth it.

Usually HandleRedirects = True should work OK for most sites, but Google is a special case, since it is you who wants to work with google.com and thus override the redirect ;) (no browser does that :) )
 

Author Comment

by:crystyan
ID: 17072158
Do you know why I get "IO HANDLER VALUE IS INVALID" when I'm trying to get the HTML from 'https://login.yahoo.com/config/login/'?

thanks
 
LVL 28

Accepted Solution

by:
ciuly earned 250 total points
ID: 17072612
Well, if you look at the protocol, it's https, so it requires SSL. You will need to add SSL support to your application if you want to access that page.
I've done a login example with SSL for eBay here: http://www.ciuly.com/delphi/indy/delphiIndySSL_ebay/index.html
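
For reference, a minimal sketch of what "adding SSL support" can look like when the components are created in code. It assumes Indy 9, where the handler class is TIdSSLIOHandlerSocket from the IdSSLOpenSSL unit (Indy 10 names it TIdSSLIOHandlerSocketOpenSSL), and it assumes the OpenSSL DLLs that Indy loads at run time are available next to the executable. GetHttps is a hypothetical helper name, not part of Indy.

```pascal
program HttpsGet;
{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: fetches a page over https by attaching an SSL
// IOHandler to the TIdHTTP instance before the request is made.
function GetHttps(const AUrl: string): string;
var
  Http: TIdHTTP;
  Ssl: TIdSSLIOHandlerSocket; // Indy 9 class name (assumption, see above)
begin
  Http := TIdHTTP.Create(nil);
  try
    Ssl := TIdSSLIOHandlerSocket.Create(Http); // owned by Http, freed with it
    Http.IOHandler := Ssl; // without this, https requests fail in Indy
    Result := Http.Get(AUrl);
  finally
    Http.Free;
  end;
end;

begin
  try
    WriteLn(GetHttps('https://login.yahoo.com/config/login/'));
  except
    on E: Exception do
      WriteLn('Request failed: ', E.Message);
  end;
end.
```

If the components are dropped on a form instead, the same association is made by setting the IOHandler property of the TIdHTTP component in the Object Inspector.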
 

Author Comment

by:crystyan
ID: 17077570
hi ciuly,

I'm still having problems with the login at del.icio.us! :(( I've spent all day looking at the eBay project (you did that for me too). I was hoping you would have time to see what's happening there.
I'm trying to do this:
  HTML := idHTTP.Get('http://del.icio.us/');
  GetCookies;
  SetCookies;
  HTML := idHTTP.Get('https://secure.del.icio.us/login');
and here I get the "IOHandler value is invalid" error.

thanks!
 
LVL 28

Expert Comment

by:ciuly
ID: 17077601
I'll check it in about 10-12 hours. By the way, I don't see you getting any cookies from https://secure.del.icio.us/login. I would first make sure that it doesn't set any. Have you checked that?
If it's still not working, I'll give it a try.

 

Author Comment

by:crystyan
ID: 17077630
Nope. I just can't get the content of https://secure.del.icio.us/login. I've tried all the possibilities... except the good one, lol.
 
LVL 28

Expert Comment

by:ciuly
ID: 17077655
well, in this case I'll get back to you in about 10-12 hours. probably with the good solution :)
 

Author Comment

by:crystyan
ID: 17077663
thanks a lot!
 
LVL 28

Expert Comment

by:ciuly
ID: 17079687
hm.. this one was short. you probably didn't notice the software redirect?

here is the code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, IdBaseComponent, IdComponent, IdTCPConnection, IdTCPClient,
  IdHTTP, IdCookieManager, StdCtrls, IdServerIOHandler, IdSSLOpenSSL,
  IdIOHandler, IdIOHandlerSocket;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    IdSSLIOHandlerSocket1: TIdSSLIOHandlerSocket;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var
  Params: TStringList;
  HTML, loginurl, signinurl, userid: String;
  count,i:integer;
  cookies:tstringlist;

   procedure setcookies;
   var j:integer;
   begin
       count:=IdCookieManager1.CookieCollection.count;
       for j:=1 to count do
           IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
   end;

begin
  signinurl:='http://del.icio.us/';
  // the above is used to get the login page (this is the link from the "sign in" link).
  // you have to emulate a browser, so you need to do all steps. this is a good idea to do
  // since all redirects might set cookies that you will probably need

  loginurl:='https://secure.del.icio.us/login';
  // the above is the login url. this is the url from the action property of the form; this is where
  // the login request will be sent

  // https needs the SSL handler attached to the http component
  // (this can also be set at design time in the Object Inspector)
  IdHTTP1.IOHandler:=IdSSLIOHandlerSocket1;

  Params := TStringList.Create;
  cookies:=tstringlist.Create;
  try
    try
      html:=idhttp1.Get(signinurl);// first get; get first cookie(s)
      // this sets 1 cookie

      count:=IdCookieManager1.CookieCollection.count;// get them
      for i:=1 to count do
        cookies.Add(IdCookieManager1.CookieCollection.Items[i-1].CookieText);

      // you might want to parse the hidden inputs name and value
      // because hard-coding them might not work in the future or in case there are
      // values that are generated

      // no hidden inputs at this time

      userid:='<your user id here>';
      Params.Values['user_name'] := userid;
      Params.Values['password'] := '<your password here>';

      setCookies;
      HTML := IdHTTP1.Post(loginurl, Params);// now do the log in

      // the login response is expected to contain a software redirect such as
      // <meta http-equiv="refresh" content="0; URL=http://del.icio.us/<userid>">
      // so follow it by hand
      setCookies;
      html:=idhttp1.Get('http://del.icio.us/'+userid);// software redirect

      if pos('<title>del.icio.us/'+userid+'</title>',html)>0 then
        showmessage('logged in')  // we are logged in
      else
        showmessage('login failed');

    except
      on e: EIdHTTPProtocolException do
      begin
        memo1.lines.add(idHTTP1.response.ResponseText);
        memo1.lines.add(e.ErrorMessage);
      end;
    end;
  finally
    cookies.Free;// release both lists even if a request fails
    Params.Free;
  end;
  memo1.Lines.Text:=html;
end;

end.

works like a charm (I modified the ebay demo)
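
The software redirect mentioned above is a meta refresh tag in the returned HTML, which Indy does not follow on its own. Below is a rough sketch of how the target URL could be pulled out of the page instead of being hard-coded. ExtractMetaRefreshUrl is a hypothetical helper, not an Indy function; it does a plain text search rather than real HTML parsing, and it assumes the attributes are written with double quotes. PosEx comes from the StrUtils unit (available from Delphi 7 on).

```pascal
program MetaRefreshDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the URL of the first
// <meta http-equiv="refresh" content="N; URL=..."> tag, or '' if none is found.
function ExtractMetaRefreshUrl(const Html: string): string;
var
  Lower: string;
  TagPos, UrlStart, UrlEnd: Integer;
begin
  Result := '';
  Lower := LowerCase(Html); // same length as Html, so positions match
  TagPos := Pos('http-equiv="refresh"', Lower);
  if TagPos = 0 then
    Exit;
  UrlStart := PosEx('url=', Lower, TagPos);
  if UrlStart = 0 then
    Exit;
  Inc(UrlStart, Length('url='));
  UrlEnd := PosEx('"', Lower, UrlStart); // closing quote of the content attribute
  if UrlEnd = 0 then
    Exit;
  // copy from the original text so the URL keeps its case
  Result := Trim(Copy(Html, UrlStart, UrlEnd - UrlStart));
end;

begin
  WriteLn(ExtractMetaRefreshUrl(
    '<meta http-equiv="refresh" content="0; URL=http://del.icio.us/someuser">'));
end.
```

The returned URL can then be passed to the same TIdHTTP instance with Get, so the cookies collected so far go along with the request.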

just in case you didn't know this, you should read this: http://www.indyproject.org/Sockets/SSL.en.aspx (I also updated my ebay demo page to point this out)

cheers
 

Author Comment

by:crystyan
ID: 17081983
lol... I didn't associate the SSL handler with IdHTTP. Me dumb again!
 

Author Comment

by:crystyan
ID: 17084038
Something is still weird here :(((

I'm doing this:
    HTML := idHTTP.Get('http://del.icio.us/');
    for i:=1 to IdCookieManager.CookieCollection.count do
     cookies.Add(IdCookieManager.CookieCollection.Items[i-1].CookieText);
     ShowMessage(cookies.Text);

and I can't get all the cookies! Although it said I'm connected, I don't have all the cookies, and when I try to do something it redirects me to the login page :(((
I've looked with a sniffer and saw there are more cookies than I get.

Do you have any idea?
 
LVL 28

Expert Comment

by:ciuly
ID: 17085543
Yes. Some sites hide the cookies in resources to make sure bots don't get them. Since robots/crawlers will mostly never load resources (images, sounds, etc.), those cookies will not be set. Check with the sniffer exactly which resource is setting the cookies and load it yourself.
 

Author Comment

by:crystyan
ID: 17085704
How do I know what sets a cookie?
 
LVL 28

Expert Comment

by:ciuly
ID: 17086130
I just told you: "check with the sniffer exactly which resource is setting the cookies and load it yourself".
Each resource is loaded with a separate HTTP GET request, so it should be easy to spot.
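
If the sniffer output is hard to read, the same check can be approximated from code. The sketch below requests a list of resources with one TIdHTTP instance and reports how many cookies the cookie manager gained after each request. The two resource URLs under /static/ are made up for illustration; replace them with the requests the sniffer actually shows. A cookie that only replaces an existing one will not change the count, so treat the numbers as a hint.

```pascal
program CookieProbe;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdCookieManager;

const
  // Hypothetical resource URLs; take the real ones from the sniffer log.
  Urls: array[0..2] of string = (
    'http://del.icio.us/',
    'http://del.icio.us/static/css/example.css',
    'http://del.icio.us/static/img/example.gif');

var
  Http: TIdHTTP;
  Mgr: TIdCookieManager;
  Body: TMemoryStream;
  i, Before: Integer;
begin
  Http := TIdHTTP.Create(nil);
  Body := TMemoryStream.Create;
  try
    Mgr := TIdCookieManager.Create(Http); // owned by Http, freed with it
    Http.CookieManager := Mgr;
    Http.AllowCookies := True;
    Http.HandleRedirects := True;
    for i := Low(Urls) to High(Urls) do
    begin
      Before := Mgr.CookieCollection.Count;
      Body.Clear; // the content itself is not needed, only the cookies
      try
        Http.Get(Urls[i], Body);
        WriteLn(Urls[i], ': ', Mgr.CookieCollection.Count - Before,
          ' new cookie(s)');
      except
        on E: Exception do
          WriteLn(Urls[i], ': failed (', E.Message, ')');
      end;
    end;
  finally
    Body.Free;
    Http.Free;
  end;
end.
```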
 

Author Comment

by:crystyan
ID: 17095104
Could you try looking at my other question? Please!
I know that I'm being a pain here :|

Thanks