milkmon123


Screen Scrape - Passing Post Variables - Not Working

Hi

I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the part of the system that enters the number of passengers and the departure and destination airports, and it works and responds well. But when I get to the step where we actually select the flight from a list in order to confirm airport taxes etc., it will not let me do it.

If you have a look at the web site (www.mytravellight.com) and enquire about a flight from Birmingham to Alicante (for example), clicking 'Search' takes you to Step 2 (select your outgoing and incoming flights), which works fine. It is moving from Step 2 to Step 3 that I cannot get to work.

Now then: if at Step 2 I save the web page to my desktop and re-launch it from there, I can click on the outbound and inbound flights of my choice, click SUBMIT, and it goes to Step 3 with no problem. However, if I try to do this from my PHP routine, it comes up with an error, even though I am duplicating each variable in the POST like for like. I have even written a routine that extracts all POST variables and displays them, pointed the 'ACTION' of the saved form at it, then duplicated those fields and retried, but it still does not work.

Here is a link to the saved .html file, which submits to the live web site and works:

http://www.midweb.net/test/index.html

Can anyone duplicate this from a PHP routine, but capture the output in a string? If so, on top of the 500 points, I will gladly pay you in beers or whatever else you want. I am desperate to get this working now...

Many thanks

Matt Wilkes
ASKER CERTIFIED SOLUTION
Diablo84

aratani

If I understand your question correctly, you need to get from step 2 to step 3, which means posting data to a form. You could post the step 2 form to a PHP script of your own; that script then posts the form to the travellite.com website with all the variables set as they should be, gets the data back, and displays step 3.

One way to post data to a website is shown here:

http://www.zend.com/zend/spotlight/mimocsumissions.php

Use that method to post data to the form on the travellite website (http://www.travellite.com/... form path).

You get back the results.
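A minimal sketch of that approach, using PHP's built-in HTTP stream wrapper rather than the socket code from the article. The function names `post_form_options()` and `post_form()` are hypothetical, and the URL passed in would be the form's real action URL. The response body comes back as a string, which is what the original question asks for.

```php
<?php
// Hypothetical helper: build stream-context options for a form POST.
function post_form_options(array $fields)
{
    // http_build_query() URL-encodes keys and values (spaces become "+").
    $body = http_build_query($fields);

    return array(
        'http' => array(
            'method'        => 'POST',
            'header'        => "Content-Type: application/x-www-form-urlencoded\r\n"
                             . "Content-Length: " . strlen($body) . "\r\n",
            'content'       => $body,
            // Return the response body even for 4xx/5xx status codes.
            'ignore_errors' => true,
        ),
    );
}

// Hypothetical helper: POST the fields to $url and return the page as a
// string, or false if the request could not be made.
function post_form($url, array $fields)
{
    $context = stream_context_create(post_form_options($fields));
    return file_get_contents($url, false, $context);
}
```

Usage would be along the lines of `$page = post_form($actionUrl, $data);`, where `$data` holds the captured form fields. This assumes `allow_url_fopen` is enabled on the server.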

I hope that works.

AJ
milkmon123

ASKER

Thanks, will look at this now.

Okay, just message me if you have any questions about it. The article uses HTTP 1.0 to send the POST variables. Most web servers now also accept HTTP 1.1; if you have any questions about getting it working with HTTP 1.1, I can answer them for you. Using the script there with HTTP 1.0 will also work with web servers that speak HTTP 1.1 (they are backwards compatible).

AJ
Hi there - thanks for the advice...

I have tried, and it still doesn't seem to work for some reason.

Have you tried an actual script that gets the page into a variable?

All the code I have been working with works with the majority of other web sites, but not this jump from step 2 to step 3.

To recap: go to www.midweb.net/test. You will see the screen come up for 'My Travel Lite'. You can select an outbound and inbound flight, click Continue, and get the result page with the price of the flight.

So far so good. Now, to see my problem for yourself, go back to www.midweb.net/test, copy the HTML file to your own server or desktop, and change the 'action' on the form to point to your own PHP script, which should be able to fetch the step 3 page as a string. It doesn't work: I just keep getting an error page.

Any ideas, guys?

What kind of error page is it? What does the error page say? That might help solve things.

AJ
OK, thanks for hanging in there. This is where we are:

1. This works fine and displays the price of the flight on screen, but only by jumping straight to the vendor web site: www.midweb.net/test

2. I wrote a program to capture all the variables sent by the form (www.midweb.net/myfirsttest.php), as follows:

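<?php
// List every POSTed field except the submit button.
print "<table>";

foreach ($_POST as $key => $val) {
      if ($key != "Submit") {
            // Escape output so field values cannot inject HTML into the page.
            print "<tr><td>" . htmlspecialchars($key) . "</td><td>&nbsp;=&nbsp;</td><td>" . htmlspecialchars($val) . "</td></tr>";
      }
}
print "</table>";
?>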

which gives the following output when I point the step 2 page at it:

PHPSESSID  =  c55d69114d074b68e451d96c51445197
change_nom  =  1
CHILD  =  00
language  =  EN
page  =  CONFIRM
INFANT  =  00
module  =  SB
mode  =  0
ADULT  =  02
px  =  ADULT ADTI02
m2F  =  
m2  =  20050103VZ 7012AGPBHX GBGBARA18999  
m1T  =  
mkt1_selected  =  
m2T  =  
m1F  =  
mkt2_selected  =  
m1  =  20041209VZ 7011BHXAGP GBGBIRI2499  
nom  =  2
fare_cat  =  
tc  =  2
pM  =  0
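In an HTML table, a value that ends in spaces looks exactly like one that does not. A small sketch that prints each field in brackets with its length makes trailing whitespace visible; `describe_fields()` is a hypothetical name, and it assumes every field is a plain string (no `name[]` array fields).

```php
<?php
// Hypothetical helper: one line per field, value in brackets plus its length,
// so leading or trailing spaces can be seen.
function describe_fields(array $fields)
{
    $lines = array();
    foreach ($fields as $key => $val) {
        $lines[] = sprintf('%s = [%s] (%d chars)', $key, $val, strlen($val));
    }
    return implode("\n", $lines);
}

// In the capture script, escape the result and keep whitespace intact:
// echo '<pre>' . htmlspecialchars(describe_fields($_POST)) . '</pre>';
```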

3. I then created a script (www.midweb.net/test/mynexttest.php) as follows:

<?php

      // Form fields copied from the step 2 output above.
      $fields = array(
            "change_nom" => "1",
            "CHILD"      => "00",
            "language"   => "EN",
            "page"       => "CONFIRM",
            "INFANT"     => "00",
            "module"     => "SB",
            "mode"       => "0",
            "ADULT"      => "02",
            "px"         => "ADULT ADTI02",
            "m2"         => "20050103VZ 7012AGPBHX GBGBARA18999",
            "m1"         => "20041209VZ 7011BHXAGP GBGBIRI2499",
            "nom"        => "2",
            "tc"         => "2",
      );

      // URL-encode the fields into a request body (spaces become "+").
      $s = http_build_query($fields);

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);                  // return the page as a string
      curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookieFileName");  // send cookies read from this file
      curl_setopt($ch, CURLOPT_POST, 1);
      curl_setopt($ch, CURLOPT_POSTFIELDS, $s);

      $buf2 = curl_exec($ch);

      curl_close($ch);

      echo $buf2;

      ?>

... but this just gives the following:

Sorry.
There has been a problem with the database.
There may be too many users on the server
right now. Please try again later.
or phone our call centre on 08701 564 564

Case Number: MYT472411208

Click here to return to our homepage.


...

I have tried everything, including replacing the spaces in the URL with %20 etc.

But nothing seems to work.

There should be no difference in the output from the two files, but for some reason, the script at http://mytravellite.com/skylights/cgi-bin/skylights.cgi is not accepting the POST or GET data I generate from my script.
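One way to test the claim that there "should be no difference" is to compare the two request bodies field by field. In the capture script, `file_get_contents('php://input')` returns the raw body the browser sent. The sketch below (hypothetical name `diff_form_bodies()`) decodes that body and the script's own body with `parse_str()` and lists the fields whose decoded values differ, including fields present on only one side, such as `PHPSESSID` or the empty `m1F`/`m2T` fields.

```php
<?php
// Hypothetical helper: return the names of fields whose decoded values differ
// between two application/x-www-form-urlencoded bodies.
// parse_str() decodes both "+" and "%20" to a space, so encoding style alone
// is not reported, but trailing spaces and missing fields are.
function diff_form_bodies($browserBody, $scriptBody)
{
    parse_str($browserBody, $a);
    parse_str($scriptBody, $b);

    $diffs = array();
    foreach (array_unique(array_merge(array_keys($a), array_keys($b))) as $key) {
        $left  = array_key_exists($key, $a) ? $a[$key] : null;
        $right = array_key_exists($key, $b) ? $b[$key] : null;
        if ($left !== $right) {
            $diffs[] = $key;
        }
    }
    return $diffs;
}
```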

HELP ME PLEASE !!!!!!!!!!!!

I can't see where I'm going wrong! Any ideas? (My wife will divorce me if I don't go to bed soon, but if I don't solve this problem I won't get paid, and then she'll divorce me anyway. And she is really nice looking and a great cook. HELP HELP HELP!!!)

Matt
I think one reason you might be getting this error is that you are using cURL.

There might be better ways to contact the server and get the page. For example, here is the POST function from zend.com by John Coggeshall.

<?php

function post_it($datastream, $url) {

  $url = preg_replace("@^http://@i", "", $url);
  $host = substr($url, 0, strpos($url, "/"));
  $uri = strstr($url, "/");

  $reqbody = "";
  foreach($datastream as $key=>$val) {
      if (!is_empty($reqbody)) $reqbody.= "&";
      $reqbody.= $key."=".urlencode($val);
  }

  $contentlength = strlen($reqbody);
  $reqheader =  "POST $uri HTTP/1.0\r\n".
     "Host: $host\n". "User-Agent: PostIt\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

  $socket = fsockopen($host, 80, $errno, $errstr);

  if (!$socket) {
    $result["errno"] = $errno;
    $result["errstr"] = $errstr;
    return $result;
  }

  fputs($socket, $reqheader);

  while (!feof($socket)) {
    $result[] = fgets($socket, 4096);
  }

  fclose($socket);

  return $result;
}

?>

You can use this function to send the data as follows. It contacts the server and gets the resulting page, which you can display on your screen if you like.

<?php

  // Set the form fields here, using the values you captured above.
  $data["change_nom"] = "1";
  $data["CHILD"] = "00";
  //And all the other  data items

  //Post it to the address
  $result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");

  if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
  } else {

    for($i=0;$i< count($result); $i++) echo $result[$i];

  }

?>

Try this. It should work.

AJ
Hi AJ

Thanks for your hard work.

When I tried it, I got this error:

Fatal error: Call to undefined function: is_empty() in /usr/local/home/httpd/vhtdocs/sunshine-direct.com/onlinebooking/enquire3.php on line 11

So I swapped !is_empty for !empty, and this came up:

HTTP/1.1 400 Bad Request
Content-Type: text/html
Date: Mon, 04 Oct 2004 15:08:13 GMT
Connection: close
Content-Length: 20

Bad Request

... which is a start, but still not working.
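The `post_it()` function returns the raw response, status line and headers included, which is why the headers are printed above the page. A sketch (hypothetical name `split_http_response()`) that separates the status line and headers from the HTML body at the first blank line, so the status can be checked before anything is shown:

```php
<?php
// Hypothetical helper: split a raw HTTP response, given as an array of lines
// the way post_it() returns it, into status line, header block and body.
function split_http_response(array $lines)
{
    $raw = implode('', $lines);

    // The header block ends at the first empty line (CRLF CRLF).
    $pos  = strpos($raw, "\r\n\r\n");
    $head = ($pos === false) ? $raw : substr($raw, 0, $pos);
    $body = ($pos === false) ? '' : substr($raw, $pos + 4);

    // The first line of the header block is the status line.
    $headLines = explode("\r\n", $head);

    return array(
        'status'  => $headLines[0],
        'headers' => $head,
        'body'    => $body,
    );
}
```

For example, echo `$parts['body']` only when `$parts['status']` contains ` 200 `, and otherwise print the status line for debugging.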

Here is the code I am using...

<?php

function post_it($datastream, $url) {

  $url = preg_replace("@^http://@i", "", $url);
  $host = substr($url, 0, strpos($url, "/"));
  $uri = strstr($url, "/");

  $reqbody = "";
  foreach($datastream as $key=>$val) {
      if (!empty($reqbody)) $reqbody.= "&";
      $reqbody.= $key."=".urlencode($val);
  }

  $contentlength = strlen($reqbody);
  $reqheader =  "POST $uri HTTP/1.0\r\n".
     "Host: $host\n". "User-Agent: PostIt\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

  $socket = fsockopen($host, 80, $errno, $errstr);

  if (!$socket) {
    $result["errno"] = $errno;
    $result["errstr"] = $errstr;
    return $result;
  }

  fputs($socket, $reqheader);

  while (!feof($socket)) {
    $result[] = fgets($socket, 4096);
  }

  fclose($socket);

  return $result;
}



// You can use this function to send the data as follows.
// It contacts the server and gets the resulting page, which you can display on your screen if you like.



  // Set the form fields here, using the values captured from the step 2 form.
  $data["change_nom"] = "1";
  $data["CHILD"] = "00";
  $data["language"] = "EN";
  $data["page"] = "CONFIRM";
  $data["INFANT"] = "00";
  $data["module"] = "SB";
  $data["mode"] = "0";
  $data["ADULT"] = "02";
  $data["px"] = "ADULT ADTI02";
  $data["m2F"] = "";
  $data["m2"] = "20050103VZ 7012AGPBHX GBGBARA18999";
  $data["m1T"] = "";
  $data["mkt1_selected"] = "";
  $data["m2T"] = "";
  $data["m1F"] = "";
  $data["mkt2_selected"] = "";
  $data["m1"] = "20041209VZ 7011BHXAGP GBGBIRI2499";
  $data["nom"] = "2";
  $data["fare_cat"] = "";
  $data["tc"] = "2";
  $data["pM"] = "0";


  //Post it to the address
  $result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");

  if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
  } else {

    for($i=0;$i< count($result); $i++) echo $result[$i];

  }

?>

It's all there in case you want to cut and paste it and try it on your server.

Many thanks for all your help

Best regards,

Matt Wilkes

Try this, Matt.

Use the following header instead:

 $reqheader =  "POST $uri HTTP/1.1\r\n".
     "Host: $host\n".
     "User-Agent: PostIt\r\n".
     "Connection: close\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

I am using a HTTP/1.1 header format. Probably the server only accepts HTTP 1.1 requests. Please tell me how that came out.
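A detail in this header that may be worth ruling out, although the thread never confirms it as the cause of the 400 responses: the `Host` line ends in a bare `\n` while every other line ends in `\r\n`, and the trailing `\r\n` after the body sends two bytes more than `Content-Length` declares. HTTP specifies CRLF line endings; some servers tolerate a bare LF and others reject it. A sketch of a request builder (hypothetical name `build_post_request()`) that uses CRLF throughout and sends exactly the declared body:

```php
<?php
// Hypothetical helper: build a raw HTTP/1.1 POST request. Every header line
// ends in CRLF, a blank line follows the headers, and exactly
// Content-Length bytes of body follow it, with nothing appended.
function build_post_request($host, $uri, array $fields)
{
    $body = http_build_query($fields);
    $headers = array(
        "POST $uri HTTP/1.1",
        "Host: $host",
        "User-Agent: PostIt",
        "Connection: close",
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " . strlen($body),
    );
    return implode("\r\n", $headers) . "\r\n\r\n" . $body;
}
```

Inside `post_it()`, this would replace the hand-built `$reqheader`, e.g. `fputs($socket, build_post_request($host, $uri, $datastream));`.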

AJ
Hi there

Sorry, still getting Bad Request:

HTTP/1.1 400 Bad Request
Content-Type: text/html
Date: Mon, 04 Oct 2004 15:36:50 GMT
Connection: close
Content-Length: 20

Bad Request

Thanks for trying...

Let me try this on my machine and I'll get back to you.

AJ
SOLUTION
Hi

Sorry for not responding. The forced accept may have been correct in itself, but it was not the solution.

In fact, the posted values were padded at the end with trailing spaces, which were invisible when we looked at the output.

When this was duplicated, it worked fine - so it was a silly mistake (as is usually the case).
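A sketch of the fix described above: keep the step 2 values byte for byte, trailing spaces included. The exact number of trailing spaces shown here is an assumption for illustration; copying the values from the captured form data rather than retyping them avoids losing the whitespace.

```php
<?php
// Step 2 values kept exactly as the form sent them. The two trailing
// spaces on m1 are assumed here for illustration.
$fields = array(
    'page' => 'CONFIRM',
    'm1'   => '20041209VZ 7011BHXAGP GBGBIRI2499  ',
);

// http_build_query() encodes every space, trailing ones included, as "+".
$body = http_build_query($fields);

// Trimming the values (as retyping them by hand effectively does) changes the body.
$trimmed = http_build_query(array_map('rtrim', $fields));
```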

Best

Milkmon123