Solved

Problem with getting HTML with Indy

Posted on 2006-07-10
Last Modified: 2010-04-05
I have a redirect problem. google.com redirects me to google.ro because I am in Romania.
How can I get the content of google.com with idHTTP? If I set HandleRedirects to True it works, but I get the content of google.ro. I have another component that gets me the right content from google.com.

Please tell me if it's possible.
I need it ASAP.

THANKS,
Question by:crystyan
19 Comments
 
LVL 28

Expert Comment

by:2266180
ID: 17071592
you should have a 'Google.com in English' link on your google.ro page (localized pages have this).
so instead of getting google.com, get http://www.google.com/ncr
AND keep the cookies.

Author Comment

by:crystyan
ID: 17071631
but how do I decide whether to set HandleRedirects to False or True?
I mean, if I set it to True and go to microsoft.com I get the wrong result. If I set it to False I get good results there, but then I don't get good results from google.com anymore.

thanks ciuly!
LVL 28

Expert Comment

by:2266180
ID: 17071683
hm.. can you post a small test-code?

Author Comment

by:crystyan
ID: 17071711
 idHTTP := TidHTTP.Create(nil);
  idCookieManager := TIdCookieManager.Create(idHTTP);
  idAntiFreeze := TIdAntiFreeze.Create(idHTTP);

  idHTTP.CookieManager := idCookieManager;
  idHTTP.AllowCookies := True;
  idHTTP.HandleRedirects := False;
  Cookies := TStringList.Create;
  HTML := idHTTP.Get(url);
  showmessage(html);
  GetCookies;
  ShowMessage(cookies.Text);

this is a function (well, it's a class, but I cut the code and put it together). it is called like this:

  site.GetHTML('http://www.microsoft.com/');


basically I want to make a class to get or post HTML, handle the redirects and maybe the cookies.
LVL 28

Expert Comment

by:2266180
ID: 17071802
looks ok except the redirect part. you should enable it while you get the cookies (didn't test if it works ok)

here is a small test-code I just wrote:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, IdCookieManager, IdBaseComponent, IdComponent,
  IdTCPConnection, IdTCPClient, IdHTTP;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var s:string;
    i:integer;
    cookies:tstringlist;

 procedure setcookies;
 var j:integer;
 begin
   for j:=1 to IdCookieManager1.CookieCollection.count do
     IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
 end;

begin
  cookies:=tstringlist.Create;
  try
    try
      s:=IdHTTP1.Get('http://www.google.com/ncr');// first get (for cookies)

      for i:=1 to IdCookieManager1.CookieCollection.count do// save cookies
        cookies.Add(IdCookieManager1.CookieCollection.Items[i-1].CookieText);

      s:=IdHTTP1.Get('http://www.google.com');// work normally with google.com from now on
      showmessage(s);
    except
      on e: EIdHTTPProtocolException do
        showmessage(idHTTP1.response.ResponseText);
    end;
  finally
    cookies.free;// free the list even if a request fails
  end;
end;

end.

if you want to make a generic class, you will need to write a mini web browser and follow the HTTP protocol (and some HTML, since there can be software redirects from scripts).

I usually prefer to keep my site-specific coding site-specific :) I am not saying that a generic class cannot be done, just that it is too hard, and for me it isn't worth it.

but usually HandleRedirects = True should work OK for most sites. google is a special case, since it is you who wants to work with google.com and thus override the redirect ;) (no browser does that :) )
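
A minimal sketch of spotting one kind of software redirect, the meta refresh tag, since HandleRedirects only follows HTTP-level redirects. ExtractMetaRefreshUrl is a made-up helper name, the sample user name is invented, and the code assumes the URL sits inside a double-quoted content attribute and that StrUtils.PosEx is available (Delphi 7 or later). Script-based redirects are not covered.

```pascal
program MetaRefreshDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the target of a
// <meta http-equiv="refresh" content="0; URL=..."> tag, or '' if none is found.
function ExtractMetaRefreshUrl(const Html: string): string;
var
  Lower: string;
  P, Start, Stop: Integer;
begin
  Result := '';
  Lower := LowerCase(Html);               // same length as Html, so offsets match
  P := Pos('http-equiv="refresh"', Lower);
  if P = 0 then
    Exit;
  Start := PosEx('url=', Lower, P);       // the target follows "url=" in content
  if Start = 0 then
    Exit;
  Inc(Start, Length('url='));
  Stop := PosEx('"', Lower, Start);       // closing quote of the content attribute
  if Stop = 0 then
    Exit;
  Result := Trim(Copy(Html, Start, Stop - Start));
end;

begin
  // Prints the URL taken from the tag; pass that URL to the next Get call.
  Writeln(ExtractMetaRefreshUrl(
    '<meta http-equiv="refresh" content="0; URL=http://del.icio.us/someuser">'));
end.
```

If the function returns a non-empty string, issue another Get on it with the same TIdHTTP so the cookies collected so far are still in play.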

Author Comment

by:crystyan
ID: 17072158
do you know why I get "IO HANDLER VALUE IS INVALID" when I'm trying to get the HTML from 'https://login.yahoo.com/config/login/'?

thanks
LVL 28

Accepted Solution

by:
2266180 earned 250 total points
ID: 17072612
well... if you look at the protocol, it's https, so it requires SSL. you will need to add SSL support to your application if you want to access that page.
I've done a login example with SSL for ebay here: http://www.ciuly.com/delphi/indy/delphiIndySSL_ebay/index.html
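
A rough sketch of what "adding SSL support" can look like when the components are created in code rather than dropped on a form. It assumes Indy 9, where the handler class is TIdSSLIOHandlerSocket (Indy 10 names it TIdSSLIOHandlerSocketOpenSSL), and it assumes the OpenSSL DLLs are installed where the application can find them. CreateHttpsClient is a made-up helper name, not part of Indy.

```pascal
program HttpsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: builds a TIdHTTP that can fetch https:// URLs.
// The caller frees the result; the SSL handler is owned by the TIdHTTP
// and is freed together with it.
function CreateHttpsClient: TIdHTTP;
var
  Ssl: TIdSSLIOHandlerSocket;             // Indy 9 class name (assumption)
begin
  Result := TIdHTTP.Create(nil);
  try
    Ssl := TIdSSLIOHandlerSocket.Create(Result);
    Ssl.SSLOptions.Method := sslvSSLv23;  // let client and server negotiate the version
    Result.IOHandler := Ssl;              // the step that https requests depend on
  except
    Result.Free;
    raise;
  end;
end;

var
  Http: TIdHTTP;
begin
  Http := CreateHttpsClient;
  try
    // Prints the size of the page instead of dumping the whole HTML.
    Writeln(Length(Http.Get('https://secure.del.icio.us/login')));
  finally
    Http.Free;
  end;
end.
```

With design-time components the equivalent is setting the IOHandler property of the TIdHTTP to the SSL handler in the Object Inspector.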

Author Comment

by:crystyan
ID: 17077570
hi ciuly,

I'm still having problems with the login at del.icio.us! :(( I've spent all day looking at the ebay project (you did that for me too). I was hoping you would have time to see what's happening there.
I`m trying to do this:
  HTML := idHTTP.Get('http://del.icio.us/');
  GetCookies;
  SetCookies;
  HTML := idHTTP.Get('https://secure.del.icio.us/login');
and here I get the "IOHandler value is Invalid" error.

thanks!
LVL 28

Expert Comment

by:2266180
ID: 17077601
I'll check it in about 10-12 hours. btw, I don't see you getting any cookies from https://secure.del.icio.us/login. I would first make sure that it doesn't set any. have you checked that?
if it's still not working, I'll give it a try.

Author Comment

by:crystyan
ID: 17077630
nope. I just can't get the content of https://secure.del.icio.us/login. I've tried all the possibilities... except the good one lol.
LVL 28

Expert Comment

by:2266180
ID: 17077655
well, in this case I'll get back to you in about 10-12 hours. probably with the good solution :)

Author Comment

by:crystyan
ID: 17077663
thanks a lot!
LVL 28

Expert Comment

by:2266180
ID: 17079687
hm.. this one was short. you probably didn't notice the software redirect?

here is the code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, IdBaseComponent, IdComponent, IdTCPConnection, IdTCPClient,
  IdHTTP, IdCookieManager, StdCtrls, IdServerIOHandler, IdSSLOpenSSL,
  IdIOHandler, IdIOHandlerSocket;

type
  TForm1 = class(TForm)
    IdHTTP1: TIdHTTP;
    IdCookieManager1: TIdCookieManager;
    Memo1: TMemo;
    IdSSLIOHandlerSocket1: TIdSSLIOHandlerSocket;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
var
  Params: TStringList;
  HTML, loginurl, signinurl, userid: String;
  count,i:integer;
  cookies:tstringlist;

   procedure setcookies;
   var j:integer;
   begin
       count:=IdCookieManager1.CookieCollection.count;
       for j:=1 to count do
           IdHTTP1.Request.RawHeaders.Add('Cookie'+IdHTTP1.Request.RawHeaders.NameValueSeparator+IdCookieManager1.CookieCollection.Items[j-1].CookieText);
   end;

begin
  signinurl:='http://del.icio.us/';
  // the above is used to get the login page (this is the URL behind the "sign in" link).
  // you have to emulate a browser, so you need to do all the steps. this is a good idea
  // since any of the redirects might set cookies that you will probably need

  loginurl:='https://secure.del.icio.us/login';
  // the above is the login url. this is the url from the action property of the form; this is where
  // the login request will be sent

  Params := TStringList.Create;
  try
    cookies:=tstringlist.Create;

    html:=idhttp1.Get(signinurl);// first get; get first cookie(s)
    // this sets 1 cookie

    count:=IdCookieManager1.CookieCollection.count;// get them
    for i:=1 to count do
     cookies.Add(IdCookieManager1.CookieCollection.Items[i-1].CookieText);

    // you might want to parse the hidden inputs name and value
    // because hard-coding them might not work in the future or in case there are
    // values that are generated

    // no hidden inputs at this time

    userid:='<your user id here>';
    Params.Values['user_name'] := userid;
    Params.Values['password'] := '<your password here>';

    setCookies;
    HTML := IdHTTP1.Post(loginurl, Params);// now do the log in

    // the login response contains <meta http-equiv="refresh" content="0; URL=http://del.icio.us/<userid>",
    // a software redirect that Indy does not follow, so request that URL by hand
    setCookies;
    html:=idhttp1.Get('http://del.icio.us/'+userid);// software redirect

    if pos('<title>del.icio.us/'+userid+'</title>',html)>0 then
      showmessage('logged in')  // we are logged in
    else
      showmessage('login failed');

  except
    on e: EIdHTTPProtocolException do
    begin
      memo1.lines.add(idHTTP1.response.ResponseText);
      memo1.lines.add(e.ErrorMessage);
    end;
  end;
  cookies.Free;
  Params.Free;
  memo1.Lines.Text:=html;
end;

end.

works like a charm (I modified the ebay demo)

just in case you didn't know this, you should read this: http://www.indyproject.org/Sockets/SSL.en.aspx (I also updated my ebay demo page to point this out)

cheers

Author Comment

by:crystyan
ID: 17081983
lol... I didn't associate the SSL handler with IdHTTP. me dumb again!

Author Comment

by:crystyan
ID: 17084038
something is still weird here :(((((((((((

I'm doing this:
    HTML := idHTTP.Get('http://del.icio.us/');
    for i:=1 to IdCookieManager.CookieCollection.count do
      cookies.Add(IdCookieManager.CookieCollection.Items[i-1].CookieText);
    ShowMessage(cookies.Text);

and I can't get all the cookies! though it said I'm connected, I don't have all the cookies, and when I'm trying to do something it redirects me to the login page :(((
I've looked with a sniffer and saw there are more cookies than I get.

do you have any idea?
LVL 28

Expert Comment

by:2266180
ID: 17085543
yes. some sites hide the cookies in resources to make sure bots don't get them. since robots/crawlers will almost never load resources (images, sounds, etc.), those cookies will not be set. check with the sniffer exactly which resource is setting the cookies and load it yourself

Author Comment

by:crystyan
ID: 17085704
how do I know what sets a cookie?
LVL 28

Expert Comment

by:2266180
ID: 17086130
I just told you: "check with the sniffer exactly which resource is setting the cookies and load it yourself"
each resource is loaded with a separate HTTP GET request, so it should be easy to spot the response that carries the Set-Cookie header
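
A rough sketch of "load it yourself", assuming the sniffer has already shown which resource sets the missing cookie. LoadResourcesForCookies is a made-up helper name and the image URL is invented for illustration; replace it with whatever the sniffer reports.

```pascal
program ResourceCookiesDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdCookieManager;

// Hypothetical helper: requests each resource with the same TIdHTTP so that
// any cookies set by those responses end up in its cookie manager.
// The downloaded bytes themselves are discarded.
procedure LoadResourcesForCookies(Http: TIdHTTP; ResourceUrls: TStrings);
var
  i: Integer;
  Sink: TMemoryStream;
begin
  Sink := TMemoryStream.Create;
  try
    for i := 0 to ResourceUrls.Count - 1 do
    begin
      Sink.Clear;
      Http.Get(ResourceUrls[i], Sink);
    end;
  finally
    Sink.Free;
  end;
end;

var
  Http: TIdHTTP;
  Cookies: TIdCookieManager;
  Urls: TStringList;
begin
  Http := TIdHTTP.Create(nil);
  Urls := TStringList.Create;
  try
    Cookies := TIdCookieManager.Create(Http);  // owned and freed by Http
    Http.CookieManager := Cookies;
    Http.AllowCookies := True;

    Http.Get('http://del.icio.us/');           // the page itself
    Urls.Add('http://del.icio.us/static/example.gif'); // invented URL, see above
    LoadResourcesForCookies(Http, Urls);

    Writeln(Cookies.CookieCollection.Count);   // compare with the sniffer's count
  finally
    Urls.Free;
    Http.Free;
  end;
end.
```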

Author Comment

by:crystyan
ID: 17095104
could you try looking at my other question? please
I know that I'm being a pain here :|

Thanks