Solved

Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-10-03
Last Modified: 2010-05-18
Hi

I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellite.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the system to enter number of passengers and departure / destination airports fine, and the system works and responds well, but when I get to the bit where we actually select the flight from a list in order to confirm airport taxes etc, it will not allow me to do it.

If you have a look at the web site (www.mytravellite.com) and enquire about a flight from Birmingham to Alicante (for example), when you click 'Search' you get to Step 2 (select your outgoing and incoming flights) - this works fine. It is moving from Step 2 to Step 3 that I cannot get to work.

Now, if at Step 2 I save the web page to my desktop and then re-launch it from there, I can click on the outbound and inbound flight of choice and click SUBMIT, and it goes to Step 3 no problem. However, if I try to do this from my PHP routine, it comes up with an error, even though I am duplicating each variable in the post like for like. I have even written a routine which extracts all post variables and displays them, pointed the 'ACTION' of the saved form to it, duplicated these fields and re-tried, but it does not work.

Here is a link to the .html file that posts to the web site and works:

http://www.midweb.net/test/index.html

Can anyone duplicate this from a PHP routine, but capture the output in a string? If so, on top of the 500 points, I will gladly pay you in beers or whatever else you want - I am desperate to get this working now...

Many thanks

Matt Wilkes
Question by:milkmon123
 
LVL 27

Accepted Solution

by:
Diablo84 earned 250 total points
ID: 12212499
If I have followed you correctly, what you are trying to do is pass post data between multiple pages, in which case the sensible approach is to use sessions.

Here's a guideline as to what you need to do:

1) on each of your pages add:

session_start();

at the top, BEFORE any output, including PHP echo/print, HTML, or new lines outside of the <?php tags

2) each time you need to add a value to a session variable do so like this:

$_SESSION['name'] = "value";

where 'name' is the name of the session variable, this can be anything you want following normal variable naming schemes (an underscore or letter followed by any number of letters, underscores or numbers).

You will need to assign post data to session variables, so it would be like this:

$_SESSION['name'] = $_POST['name_of_input'];

where name_of_input is the name of the form element that contains the value you want to assign to the session variable.

Each session variable should have a unique name (unless you build the data into an array) that represents the data it contains, and each can later be accessed on any page using, for example,

echo $_SESSION['name'];

as long as session_start(); is at the top of the page. This means at each stage you can assign the post data from the previous step to a session variable so that data is accessible later.
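
The steps above can be sketched roughly as follows. This is only an illustration: the helper name remember_post_fields and the field names in the usage comment are made up for the example, and the arrays are passed in explicitly so the function can be tried without a live session.

```php
<?php
// Hypothetical helper (not from the thread): copy selected form fields
// into a session-like array so later pages can read them back.
function remember_post_fields(array $post, array &$session, array $fields): void
{
    foreach ($fields as $field) {
        // Only copy fields that were actually submitted.
        if (isset($post[$field])) {
            $session[$field] = $post[$field];
        }
    }
}

// Typical use at the top of each page, before any output:
//   session_start();
//   remember_post_fields($_POST, $_SESSION, ['ADULT', 'CHILD']);
//   echo $_SESSION['ADULT'];
```

Listing the field names explicitly, rather than copying all of $_POST, keeps unexpected form fields out of the session.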

Hope that's what you're looking for.
 
LVL 4

Expert Comment

by:aratani
ID: 12213155
If I understand your question right, you need to go from step 2 to step 3, which means you need to post data to a form. What you could do is post the form in step 2 to a PHP script of your own, which posts the form on to the travellite.com website with all the variables set as they should be, gets the data back, and displays step 3.

A way to post data to a website is shown here,

http://www.zend.com/zend/spotlight/mimocsumissions.php

Use that method to post data to the form on the travellite website (http://www.travellite.com/... form path).

You get back the results.

I hope that works then,

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12213585
Thanks - will look at this now.
 
LVL 4

Expert Comment

by:aratani
ID: 12213628
OK, just message me if you have any questions about it. The article uses HTTP 1.0 to send the post variables. Most web servers now also accept HTTP 1.1; if you have any questions about getting it working with HTTP 1.1, I can answer them for you. The script there, using HTTP 1.0, will also work with web servers that support HTTP 1.1, since they are backwards compatible.

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12213646
Hi there - thanks for the advice...

I have tried, and it still doesn't seem to work for some reason.

Have you had a look at an actual script that gets the screen into a variable ?

All the code I have been working with works with the majority of other web sites, but not this jump from step 2 to step 3.

To re-cap - go to www.midweb.net/test. You will see the screen come up for 'My Travel Lite'.  You can select an outbound and inbound flight and click continue - and get the result page with the amount for the flight.

OK so far - now, to see for yourself my problem - go back to www.midweb.net/test - and copy the html file to your own server / desktop whatever - and just change the 'action' on the form to point to your own php script which should be able to get this step 3 screen as a string - but it doesn't work - I just keep getting an error page.

Any ideas guys ?
 
LVL 4

Expert Comment

by:aratani
ID: 12213708
What kind of error page is it? What does it say on the error page? Maybe that would help solve things.

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12213764
Ok... Thanks for hanging in there. This is where we are at...

1. This works fine by getting the price of the flight displayed on the screen, but that is just jumping to the vendor web site: www.midweb.net/test

2. I wrote a program to capture all variables sent by the form (www.midweb.net/myfirsttest.php).. as follows...

<?php
// Dump every posted field except the submit button as a table row.
print "<table>";

foreach ($_POST as $key => $val) {
      if ($key != "Submit"){
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
      }
}
print "</table>";
?>

which gives the following as output when I point the 'step 2' page to it...

PHPSESSID  =  c55d69114d074b68e451d96c51445197
change_nom  =  1
CHILD  =  00
language  =  EN
page  =  CONFIRM
INFANT  =  00
module  =  SB
mode  =  0
ADULT  =  02
px  =  ADULT ADTI02
m2F  =  
m2  =  20050103VZ 7012AGPBHX GBGBARA18999  
m1T  =  
mkt1_selected  =  
m2T  =  
m1F  =  
mkt2_selected  =  
m1  =  20041209VZ 7011BHXAGP GBGBIRI2499  
nom  =  2
fare_cat  =  
tc  =  2
pM  =  0

... so 3) I now create a script (www.midweb.net/test/mynexttest.php) as follows...

<?php

      $s="change_nom=1&CHILD=00&language=EN&page=CONFIRM&INFANT=00&module=SB&mode=0&ADULT=02&px=ADULT ADTI02&m2=20050103VZ 7012AGPBHX GBGBARA18999&m1=20041209VZ 7011BHXAGP GBGBIRI2499&nom=2&tc=2";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
      curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookieFileName");
      curl_setopt($ch, CURLOPT_POST, 1);
      curl_setopt($ch, CURLOPT_POSTFIELDS, $s);
      curl_setopt($ch, CURLOPT_URL,"http://mytravellite.com/skylights/cgi-bin/skylights.cgi");

      $buf2 = curl_exec ($ch);

      curl_close ($ch);

      echo ($buf2);

      ?>

... But this just gives the following....

  Sorry.
There has been a problem with the database.
There may be too many users on the server
right now. Please try again later.
or phone our call centre on 08701 564 564

Case Number: MYT472411208
      
      Click here to return to our homepage.


...

I have tried everything - including substituting the spaces in the URL line for %20 etc...

But nothing seems to work.

There should be no difference in the output from the two files, but for some reason, the script at http://mytravellite.com/skylights/cgi-bin/skylights.cgi is not accepting the POST or GET data I generate from my script.

HELP ME PLEASE !!!!!!!!!!!!

I can't see where I'm going wrong ! -- Any ideas (my wife will divorce me if I don't go to bed soon - but if I don't solve this problem, I won't get paid, then she'll divorce me anyway - and she is really nice looking and a great cook - HELP HELP HELP !!!)

Matt
 
LVL 4

Expert Comment

by:aratani
ID: 12217596
I think one of the reasons you might be getting this error is that you are using cURL.

There might be better ways to contact the server and get the page. For example, this is the POST function on zend.com by John Coggeshall.

<?php

function post_it($datastream, $url) {

  $url = preg_replace("@^http://@i", "", $url);
  $host = substr($url, 0, strpos($url, "/"));
  $uri = strstr($url, "/");

  $reqbody = "";
  foreach($datastream as $key=>$val) {
      if (!is_empty($reqbody)) $reqbody.= "&";
      $reqbody.= $key."=".urlencode($val);
  }

  $contentlength = strlen($reqbody);
  $reqheader =  "POST $uri HTTP/1.0\r\n".
     "Host: $host\n". "User-Agent: PostIt\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

  $socket = fsockopen($host, 80, $errno, $errstr);

  if (!$socket) {
    $result["errno"] = $errno;
    $result["errstr"] = $errstr;
    return $result;
  }

  fputs($socket, $reqheader);

  while (!feof($socket)) {
    $result[] = fgets($socket, 4096);
  }

  fclose($socket);

  return $result;
}

?>

You can use this function and then send the data using the following. It contacts the server and gets the resulting page for you, which you can display on your screen if you would like to.

<?php

  //Set the data over here. As you would like to above
  $data["change_nom"] = "1";
  $data["CHILD"] = "00";
  //And all the other  data items

  //Post it to the address
  $result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");

  if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
  } else {

    for($i=0;$i< count($result); $i++) echo $result[$i];

  }

?>
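
post_it() returns the raw response as an array of lines, status line and headers included. If the goal is to end up with the page itself in a string, something along these lines could separate the two. The helper name split_http_response is made up for this sketch, and it assumes the server ends its header lines with CRLF.

```php
<?php
// Hypothetical helper: $lines is the array returned by a post_it()-style
// function, i.e. the raw HTTP response read line by line from the socket.
function split_http_response(array $lines): array
{
    $raw = implode('', $lines);

    // The headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);

    return [
        'status'  => $headerLines[0],               // e.g. "HTTP/1.1 200 OK"
        'headers' => array_slice($headerLines, 1),  // remaining header lines
        'body'    => $parts[1] ?? '',               // the page as one string
    ];
}
```

With this, $response = split_http_response($result); leaves the HTML in $response['body'] and the status line in $response['status'] for checking.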

Try this. It should work.

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12218004
Hi AJ

Thanks for your hard work.

When I tried, I got this error...

Fatal error: Call to undefined function: is_empty() in /usr/local/home/httpd/vhtdocs/sunshine-direct.com/onlinebooking/enquire3.php on line 11

So I swapped !is_empty for !empty... and this came up...

HTTP/1.1 400 Bad Request Content-Type: text/html Date: Mon, 04 Oct 2004 15:08:13 GMT Connection: close Content-Length: 20
Bad Request

... which is a start, but still not working.

Here is the code I am using...

<?php

function post_it($datastream, $url) {

  $url = preg_replace("@^http://@i", "", $url);
  $host = substr($url, 0, strpos($url, "/"));
  $uri = strstr($url, "/");

  $reqbody = "";
  foreach($datastream as $key=>$val) {
      if (!empty($reqbody)) $reqbody.= "&";
      $reqbody.= $key."=".urlencode($val);
  }

  $contentlength = strlen($reqbody);
  $reqheader =  "POST $uri HTTP/1.0\r\n".
     "Host: $host\n". "User-Agent: PostIt\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

  $socket = fsockopen($host, 80, $errno, $errstr);

  if (!$socket) {
    $result["errno"] = $errno;
    $result["errstr"] = $errstr;
    return $result;
  }

  fputs($socket, $reqheader);

  while (!feof($socket)) {
    $result[] = fgets($socket, 4096);
  }

  fclose($socket);

  return $result;
}



// You can use this function and then send the data using the following.
// It contacts the server and gets the resulting page for you which you can display on your screen if you would like too.



  //Set the data over here. As you would like to above
  $data["change_nom"] = "1";
  $data["CHILD"] = "00";
  $data["language"] = "EN";
  $data["page"] = "CONFIRM";
  $data["INFANT"] = "00";
  $data["module"] = "SB";
  $data["mode"] = "0";
  $data["ADULT"] = "02";
  $data["px"] = "ADULT ADTI02";
  $data["m2F"] = "";
  $data["m2"] = "20050103VZ 7012AGPBHX GBGBARA18999";
  $data["m1T"] = "";
  $data["mkt1_selected"] = "";
  $data["m2T"] = "";
  $data["m1F"] = "";
  $data["mkt2_selected"] = "";
  $data["m1"] = "20041209VZ 7011BHXAGP GBGBIRI2499";
  $data["nom"] = "2";
  $data["fare_cat"] = "";
  $data["tc"] = "2";
  $data["pM"] = "0";


  //Post it to the address
  $result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");

  if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
  } else {

    for($i=0;$i< count($result); $i++) echo $result[$i];

  }

?>

If you want to cut and paste and try it on your server.

Many thanks for all your help

Best regards,

Matt Wilkes

 
LVL 4

Expert Comment

by:aratani
ID: 12218143
Try this Matt,

Use the following,

 $reqheader =  "POST $uri HTTP/1.1\r\n".
     "Host: $host\n".
     "User-Agent: PostIt\r\n".
     "Connection: close\r\n".
     "Content-Type: application/x-www-form-urlencoded\r\n".
     "Content-Length: $contentlength\r\n\r\n".
     "$reqbody\r\n";

I am using an HTTP/1.1 header format. The server probably only accepts HTTP 1.1 requests. Please tell me how that comes out.
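
One thing worth checking in a hand-built request like this is the line endings. HTTP expects every header line to end in CRLF; in the request strings in this thread the Host line ends in a bare \n, and an extra \r\n is appended after the body without being counted in Content-Length. Whether that is what the server objected to is not known, so treat the following only as a sketch of a builder that keeps both consistent. The function name build_post_request is invented for the example.

```php
<?php
// Hypothetical helper: builds the raw text of an HTTP/1.1 form POST.
function build_post_request(string $host, string $path, array $fields): string
{
    // http_build_query() URL-encodes keys and values and joins them with "&".
    $body = http_build_query($fields, '', '&');

    $headers = [
        "POST $path HTTP/1.1",
        "Host: $host",
        "User-Agent: PostIt",
        "Connection: close",
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " . strlen($body),
    ];

    // Every header line ends in CRLF; a blank line separates headers and body.
    // Nothing is appended after the body, so Content-Length matches exactly.
    return implode("\r\n", $headers) . "\r\n\r\n" . $body;
}
```

The resulting string can be written to the socket with fputs() in place of $reqheader.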

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12218303
Hi there

Sorry - still giving bad request

HTTP/1.1 400 Bad Request Content-Type: text/html Date: Mon, 04 Oct 2004 15:36:50 GMT Connection: close Content-Length: 20
Bad Request

Thanks for trying...
 
LVL 4

Expert Comment

by:aratani
ID: 12218361
Let me try this on my machine and I'll get back to you.

AJ
 
LVL 4

Assisted Solution

by:aratani
aratani earned 250 total points
ID: 12218613
I tried a lot of different combinations, and I even sniffed the header packets that were being sent from IE and tried making the headers we send from our script the same. It must be something on their side which is causing this to fail. I would contact their support and tell them that you are posting these variables and getting a database error. Maybe the variables aren't defined right, i.e. the values of the variables aren't what travellite is expecting; that might be the problem.

You are connecting to their server and sending the data, but most likely something is wrong in the data you are sending, so check that all the keys and values are exactly right, and contact them and ask.

To sniff the packets and see what is being sent, use Ethereal.

Take care

AJ
 
LVL 1

Author Comment

by:milkmon123
ID: 12517080
Hi

Sorry for not responding - the forced accept may have been correct in itself, but was not the solution.

In actual fact, the posted variables were padded out at the end with spaces, which were invisible when we looked at the output.
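
Padding like this is easy to miss because a browser collapses runs of spaces when it renders HTML. A dump that quotes each value and prints its length makes it visible. The sketch below assumes plain string fields (no array-style inputs), and the helper name describe_value is made up for the example.

```php
<?php
// Hypothetical helper: describe one posted value so that padding is visible.
function describe_value(string $key, string $value): string
{
    // var_export() wraps the value in quotes, so trailing spaces show up,
    // and strlen() gives the exact byte count for comparison.
    return sprintf('%s = %s (%d bytes)', $key, var_export($value, true), strlen($value));
}

// In a form handler, inside <pre> so the browser does not collapse spaces:
//   echo "<pre>";
//   foreach ($_POST as $key => $value) {
//       echo htmlspecialchars(describe_value($key, $value)), "\n";
//   }
//   echo "</pre>";
```

Viewing the page source instead of the rendered page would show the same thing, since the spaces are still present in the raw HTML.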

When this was duplicated, it worked fine - so it was a silly mistake (as is usually the case).
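
For anyone repeating this, one way to make sure the padding survives is to keep the values in an array exactly as captured and let PHP encode them, rather than typing the query string by hand. This is a sketch under assumptions: encode_form_body is an invented name, the field values are placeholders rather than the real ones, and $url stands for the form's action address.

```php
<?php
// Hypothetical helper: URL-encode form fields exactly as given, padding included.
function encode_form_body(array $fields): string
{
    // Spaces, including trailing ones, are encoded as "+".
    return http_build_query($fields, '', '&');
}

// Placeholder values; the first one carries two trailing spaces.
$body = encode_form_body(['m1' => 'EXAMPLE VALUE  ', 'tc' => '2']);
// $body is now "m1=EXAMPLE+VALUE++&tc=2"

// Sending it with cURL (performs a real request, so left commented out):
//   $ch = curl_init();
//   curl_setopt($ch, CURLOPT_URL, $url);
//   curl_setopt($ch, CURLOPT_POST, true);
//   curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   $html = curl_exec($ch);   // page as a string, or false on failure
//   curl_close($ch);
```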

Best

Milkmon123