Querying a search engine by forming a POST request in Perl

I'm trying to query the media search engine of www.amazon.de. The search form (HTML) is pretty simple and looks like this:

<form method="post" action="http://www.amazon.de/exec/obidos/search-handle-form/028-4460036-3577007">

<select name="index" >
  <option value="books">Bücher
  <option value="us">US-Bücher
  <option value="music">Pop Musik
  <option value="classical">Klassik
</select>
<br>
<nobr>
<input type="text" name="field-keywords" size="13">
<input type="hidden" name="rank" value="+amzrank">
<input type="image" border=0 value="Go" name="Go" height="18" width="25" src="/g/portal/top-nav/portal-los.gif" align=absmiddle>
</form>

The form runs perfectly from any location. GET is not supported.

Doing exactly the same from a Perl script to grab the results fails: the remote CGI (amazon) reports no matches, even though the keyword is transmitted correctly.

The send() request in my Perl script looks as follows:

send(GET,"POST $url HTTP/1.0\nAccept: application/vnd.ms-excel, application/msword, image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\nReferer: http://www.varengold.de\nAccept-Language: de\nContent-Type: application/x-www-form-urlencoded\nAccept-Encoding: gzip, deflate\nUser-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)\nHost: db2.ibs-gmbh.net\nContent-Length: $length\nPragma: No-Cache\n\n$content",0);

Why can't the amazon.de search engine find any matches when it is queried by this script?

oschleede Asked:

maneshr Commented:
Use this and it will work:


#!/usr/local/bin/perl -I/home/webuser/manesh

require LWP;
require URI::URL;

use strict;
use CGI;

my($hdr,$server_response);
my($statement_URL)="http://www.amazon.de/exec/obidos/search-handle-form";

my($query)=new CGI;

## Uncomment the following four lines when you are calling this program
## via YOUR HTML form.
#foreach($query->param()){
#  $hdr.=$_."=".$query->param($_)."&"; ## Join the CGI params using & delimiter
#}
#$hdr=~ s/(.*)&$/$1/;   ## Remove the trailing &

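## Right now the values are hard-coded. The literal '+' of the rank value
## is written as %2B, because a bare '+' in a form body decodes to a space.
$hdr="index=books&field-keywords=kampf&rank=%2Bamzrank&Go.x=6&Go.y=6";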

$server_response=&browse($statement_URL,$hdr,3);   ## Fire the URL

print "Content-type: text/html\n\n";

print "$server_response";

sub browse {
   my($statement_URL,$hdr,$page_no)=@_;
   my($content_type,$method);

   $content_type="application/x-www-form-urlencoded";
   $method="POST";

   my($headers)= new HTTP::Headers
     'Content-Type'   =>  $content_type,
     'MIME-Version'   =>  '1.0',
     'Date'           =>  HTTP::Date::time2str(time),
     'Accept'         =>  'text/html';

   my($ua)= new LWP::UserAgent;

   $ua->agent("Mozilla/4.7 [en] (WinNT; U)"); # Define env variable - HTTP_USER_AGENT

   my($url)= new URI::URL($statement_URL);
   my($request)= new HTTP::Request($method, $url, $headers,$hdr);

   my($response)= $ua->request($request);

   my($reply);

   if ($response->is_success){
      $reply=$response->content;
   }else{
      $reply=$response->error_as_HTML();
   }
   return $reply;
}
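
In a form-urlencoded body a literal `+` has to travel as `%2B`, because a bare `+` decodes to a space. A minimal sketch, assuming HTTP::Request::Common (installed along with LWP) is available, of letting the library do the encoding and set Content-Length. The field names and values are the ones from the form in the question; `build_search_request` is a hypothetical helper name, and the request is only built, not sent.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical helper: builds (but does not send) the search POST.
sub build_search_request {
    my ($url, $index, $keywords) = @_;
    # An array ref keeps the field order; POST() URL-encodes each value
    # and sets Content-Type and Content-Length itself.
    return POST $url, [
        'index'          => $index,
        'field-keywords' => $keywords,
        'rank'           => '+amzrank',
        'Go.x'           => 6,
        'Go.y'           => 6,
    ];
}

my $request = build_search_request(
    'http://www.amazon.de/exec/obidos/search-handle-form', 'books', 'kampf');

# Show the encoded body that would go over the wire.
print $request->content, "\n";
```

The resulting object can be handed to `$ua->request($request)` in place of the hand-built HTTP::Request.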

oschleede (Author) Commented:
Hmm, I haven't got the LWP module on our server. I'll try to form the header the way you did; maybe I'll succeed. By the way: why do you think my script must fail? Is it the header?
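
Two details of the hand-written request in the question differ from what a browser sends and may be part of the reason it fails: HTTP header lines end in CRLF (`\r\n`) rather than a bare `\n`, and the `Host` header names `db2.ibs-gmbh.net` instead of the server being queried. A minimal sketch without LWP, assuming core Perl only and byte-string field values; `encode_form` and `build_post` are hypothetical helper names, and the thread does not confirm that these changes alone satisfy amazon.de.

```perl
use strict;
use warnings;

# URL-encode one form name or value (assumed to be a byte string):
# unreserved characters stay, a space becomes '+', and everything else,
# including a literal '+', becomes %XX.
sub encode_form {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord $1)/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Build a complete HTTP/1.0 POST request as one string.
sub build_post {
    my ($host, $path, @fields) = @_;    # @fields is a flat name/value list
    my @pairs;
    while (my ($name, $value) = splice @fields, 0, 2) {
        push @pairs, encode_form($name) . '=' . encode_form($value);
    }
    my $body = join '&', @pairs;
    return join "\r\n",
        "POST $path HTTP/1.0",
        "Host: $host",
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " . length($body),
        "",                             # blank line ends the header block
        $body;
}

my $raw = build_post('www.amazon.de', '/exec/obidos/search-handle-form',
    'index' => 'books', 'field-keywords' => 'kampf', 'rank' => '+amzrank');
print $raw, "\n";
```

The string in `$raw` could be written to the open socket in place of the argument passed to send() in the question.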

ercis Commented:
oschleede, use a port listener or a similar program to see what request your browser sends for the real form, and then simply repeat all of it, changing only the search string.

oschleede (Author) Commented:
Can I install such a port listener on my own machine and monitor my browser on the same machine, or must I install it on the server to listen to the incoming traffic on port 80?
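
Such a listener can run on the same machine as the browser. A minimal sketch, assuming core Perl with IO::Socket::INET and a free local port; `read_http_request` and `listen_once` are hypothetical helper names, and the port number is whatever is passed on the command line. Pointing a local copy of the form's action at that port and submitting it shows exactly what the browser sends.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Read one HTTP request (header block plus body) from any filehandle.
sub read_http_request {
    my ($fh) = @_;
    my ($head, $body) = ('', '');
    while (defined(my $line = <$fh>)) {
        $head .= $line;
        last if $line =~ /^\r?\n$/;     # blank line ends the headers
    }
    # Read exactly as many body bytes as Content-Length announces.
    if ($head =~ /^Content-Length:\s*(\d+)/mi) {
        read($fh, $body, $1);
    }
    return ($head, $body);
}

# Accept a single connection on localhost and print what the client sent.
sub listen_once {
    my ($port) = @_;
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1', LocalPort => $port,
        Listen    => 1,           ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";
    my $client = $server->accept or die "accept failed: $!";
    my ($head, $body) = read_http_request($client);
    print $head, $body, "\n";
    print $client "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nlogged\r\n";
    close $client;
}

# Runs only when a port is given on the command line,
# e.g. "perl listen.pl 8080" (file name and port are arbitrary).
listen_once($ARGV[0]) if @ARGV;
```

With the listener started on port 8080, a form whose action is http://localhost:8080/ prints the browser's request line, headers, and encoded body to the terminal.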

maneshr Commented:
From the code snippet that you have provided, all that I can make out is that you need to pass the proper search engine parameters to the "amazon" script.

Also, you HAVE to use the POST method to submit your request.

I would suggest that you download and install the LWP module from www.cpan.org. You will find LWP and the other related modules (e.g. WWW) very useful for the kind of work that you are doing.

In case you cannot install these modules in your system lib dir, you can install them in your own home dir and make the Perl script read the modules from there.

You can get these modules from:

http://www.cpan.org/modules/01modules.index.html

oschleede (Author) Commented:
OK, since I found no way to form a proper header, I decided to install libwww. It was a somewhat hazardous enterprise, because I didn't and don't know whether the settings I've made are OK for the scripts already there. However, after installing your script it kept returning 302 errors. The errors were presented by your routine. The browser claimed there's no data.

Have you tested your script on the search engine?
Please let me know whether the 302 is some server-side protection against people like me or whether there's something I can do about it in the script.

Thank you.

maneshr Commented:
That's right, I have tested my script and I do get the result page too.

Now, after you downloaded the zipped/.gz file, did you run the install process?

If yes, let's say you installed the module in /home/oschleede; then you need to change

#!/usr/local/bin/perl -I/home/webuser/manesh

to

#!/usr/local/bin/perl -I/home/oschleede

so that when you use the module, Perl knows where to look for the .pm file.

In fact, that's the reason I have -I/home/webuser/manesh: the WWW module is in my home directory.

Hope that answers your question.

Tschüs

oschleede (Author) Commented:
:)

Thanks for your advice. I've taken the liberty of installing all that required mess already; that's how I came across the 302 error in the query. Without the LWP stuff Perl would hardly have proceeded that far :)

OK, now let's say the server/Perl is configured all right. Then there are only two possible sources of errors: the script itself and the security at the other site.

Maybe we should turn the cookie support off explicitly. I think the search engine might have a problem with me passing none of the plentiful cookie info it uses to spam me with.

Do you happen to know such a switch?
Or do you think the cookies are not the point?

I'm keen on your answer.
Greetings from Hamburg

maneshr Commented:
When I tested using the code that I gave you earlier, I did not set any cookies.

I think the problem has to be due to the way your network is configured. Status 302 indicates that a redirection is occurring.

I am not sure if this redirection is due to some proxy/firewall settings in your network or something else.

In fact, I can prove to you that this script is working by giving you access to a test machine from where I ran the script.

In case you wish to telnet to that machine and run the script, send me an email at maneshr@hotmail.com

oschleede (Author) Commented:
It would be a great help for me to see the script working. Could you place it somewhere in a cgi-bin so that I can check it myself?

Concerning the network connection: I'm directly connected, no proxy.

Concerning the cookies, you got me wrong: amazon.de sets them, session-id and stuff. Maybe they require the session ID somehow?

Do you happen to know whether one can set something like cookie-support: no?
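
With LWP no such switch is needed: an LWP::UserAgent object keeps no cookies unless a cookie jar is attached to it. A minimal sketch, assuming libwww-perl is installed, that shows both states without touching the network. Whether amazon.de actually requires its session cookie is not established in this thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Default: no cookie jar, so received cookies are not stored and no
# Cookie header is sent on later requests.
my $plain = LWP::UserAgent->new;
print "plain agent has a cookie jar: ",
      (defined $plain->cookie_jar ? "yes" : "no"), "\n";

# Opt in: an empty hash ref asks LWP for an in-memory HTTP::Cookies jar,
# which then stores and replays cookies for this agent only.
my $with_cookies = LWP::UserAgent->new;
$with_cookies->cookie_jar({});
print "second agent has a cookie jar: ",
      (defined $with_cookies->cookie_jar ? "yes" : "no"), "\n";
```

Attaching a jar would be the way to test the session-cookie theory; leaving it out corresponds to "cookie support off".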

maneshr Commented:
Unfortunately, there is no cgi-bin dir on that system. I have the file in my home dir. That is the reason I would have to give you telnet access.

As far as the session ID goes: yes, amazon.de does use a cookie and a session ID.

In fact, in the original URL that you gave me, the 028-4460036-3577007 is a session ID. In the script that I gave you, I inserted the following line so that I could get the exact cookie name etc. from amazon:

 return $response->as_string;

I inserted the above between

my($response)= $ua->request($request);

and

my($reply);

Once I had the exact headers that amazon sent me, the rest was easy.

Do let me know if you need telnet access to the test system.

Haben Sie ein schönes Wochenende (have a nice weekend).

Hope the German is fine :)

oschleede (Author) Commented:
Yes please! Grant me access to that test system. I'll check it then.

Email me: olli@o.cx

ercis Commented:
oschleede, to get all the HTTP headers you can use this URL:
 http://phpwizard.net/header/
instead of the port listener:
 http://www.ballcom.com/~timr/files/plisten.zip

oschleede (Author) Commented:
Hmm... maneshr,

at last I think I'm stuck with it. The problem is that when you enter an ISBN into the search field of amazon, it automatically detects that the search term is such a number. Try it at amazon.de. It works perfectly.

The script you've written does produce the same 302 errors from time to time as mine. There must ....

Interesting development! I've been so stupid! In fact, I had forgotten to add the default 'Alles' with the value 'blended' to the dropdown list! But :( there again: now it seems to lose the ISBN somewhere. amazon.de now claims 'Keine Suchergebnisse für' (no results found) and nothing else.

And amazon.de copes perfectly with the ISBN whatever parameter is selected: books, music or whatever.

maneshr Commented:
Please give me the exact URL that you are trying to use for the ISBN search.

I think this is a different form than the one you had originally asked about.

oschleede (Author) Commented:
www.amazon.de

FORM ACTION:
http://www.amazon.de/exec/obidos/search-handle-form/

GET Query:
index=blended&field-keywords=3897211270&rank=+amzrank&Go.x=1&Go.y=1

maneshr Commented:
OK, here's the modified code.

You will have to combine this code with the one I gave you earlier to make both "Bücher" and "Alles" work together.

The difference between searching by book name and by ISBN is that the ISBN search returns another URL, which holds the actual page content. In the case of a book search, the final page is returned right away.


==amazon_isbn.pl

#!/usr/local/bin/perl -I/home/webuser/manesh

require LWP;
require URI::URL;

use strict;
use CGI;

my($hdr,$server_response);
my($statement_URL)="http://www.amazon.de/exec/obidos/search-handle-form";

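## Right now the values are hard-coded. The literal '+' of the rank value
## is written as %2B, because a bare '+' in a form body decodes to a space.
$hdr="index=blended&field-keywords=3897211270&rank=%2Bamzrank&Go.x=6&Go.y=6";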

$server_response=&browse($statement_URL,$hdr);  ## Fire the URL

my(@html_file)=split(/\n/,$server_response);

foreach(@html_file){
   if (/^Location:\s*(\S+)/){
      $statement_URL=$1; ## Get the actual location of the result page (without surrounding whitespace)
      last;
   }
}

print "Content-type: text/html\n\n";

use LWP::Simple;
my($content) = get($statement_URL); ## get that page .....
print $content; ## ... and show the same.

sub browse {
   my($statement_URL,$hdr)=@_;
   my($content_type,$method);

   $content_type="application/x-www-form-urlencoded";
   $method="POST";

   my($headers)= new HTTP::Headers
     'Content-Type'   =>  $content_type,
     'MIME-Version'   =>  '1.0',
     'Date'           =>  HTTP::Date::time2str(time),
     'Accept'         =>  'text/html';

   my($ua)= new LWP::UserAgent;

   $ua->agent("Mozilla/4.7 [en] (WinNT; U)"); # Define env variable - HTTP_USER_AGENT

   my($url)= new URI::URL($statement_URL);
   my($request)= new HTTP::Request($method, $url,$headers,$hdr);

   my($response)= $ua->request($request);
   return $response->as_string;  ## Return the RAW response, headers included
}
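
A 302 is a redirect rather than a failure, and LWP::UserAgent does not follow redirects for POST requests by default, which is why the script above fetches the Location itself. A minimal sketch, assuming libwww-perl and URI are installed, of reading the target from the response object instead of splitting the raw text. `redirect_target` is a hypothetical helper, and the path in the demo response is made up for illustration; it is not a real amazon.de URL.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Response;
use URI;

# Return the absolute redirect target of a response, or undef if the
# response is not a redirect or carries no Location header.
sub redirect_target {
    my ($response, $request_url) = @_;
    return unless $response->is_redirect;
    my $location = $response->header('Location');
    return unless defined $location;
    # Location may be relative; resolve it against the URL that was posted to.
    return URI->new_abs($location, $request_url)->as_string;
}

# Demo with a hand-made response (no network); the path is hypothetical.
my $fake = HTTP::Response->new(302, 'Found',
    HTTP::Headers->new(Location => '/hypothetical/result/page'));
my $target = redirect_target($fake,
    'http://www.amazon.de/exec/obidos/search-handle-form');
print "$target\n";
```

In the script, `redirect_target($response, $statement_URL)` could stand in for the loop over `@html_file`, provided `browse` returns the response object instead of its string form; the result can be passed to `get()` as before.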

oschleede (Author) Commented:
Adjusted points to 500

oschleede (Author) Commented:
You must be some sort of a guru :)
Your script really does something magic!

I will turn over every stone of your script to find out what I was so obviously doing wrong!

Thanks!

oschleede (Author) Commented:
That's the magic key to my next big project!

maneshr Commented:
Actually, you were not doing anything wrong. It's just that the amazon site works in a different way when you search for Books and when you search for Alles. All that I did was find out and exploit that difference.

I am glad that the program worked.

oschleede (Author) Commented:
A different way when you search for Books and when you search for Alles...

- Go to http://www.amazon.de
- Select Music.
- Try to search for any keyword.
- Enter a valid ISBN (the select box still shows Music): the system will provide you with the correct item, obviously ignoring that the setting is still on Music.

That contradicts your explanation.

maneshr Commented:
When I said different, I meant the way it works internally. From outside, i.e. from amazon's site, the whole process is transparent: it appears as if the results are returned in one single request.

That was the reason the older script was failing. But when I started digging into the returned headers, I found the difference between the response sent when you search for a book and the one sent when you search for Alles.
