milkmon123
asked on
Screen Scrape - Passing Post Variables - Not Working
Hi
I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.
The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).
I have written the system to enter number of passengers and departure / destination airports fine, and the system works and responds well, but when I get to the bit where we actually select the flight from a list in order to confirm airport taxes etc, it will not allow me to do it.
If you have a look at the web site (www.mytravellight.com) - and enquire about a flight from Birmingham to Alicante (for example), when you click 'Search' you get to Step 2 (select your outgoing and incoming flights) - this works fine. It is the step 3 that I cannot get to work - moving from step 2 to step 3.
Now then, if at Step 2 I save the web page to my desktop and then re-launch it from there, I can click on the outbound and inbound flight of choice and click SUBMIT, and it goes to Step 3 no problem... However, if I try to do this from my PHP routine, it comes up with an error even though I am duplicating each variable in the post like for like. I have even written a routine which extracts all post variables and displays them, pointed the 'ACTION' of the saved form to it, then duplicated these fields and re-tried, but it does not work.
Here is a link to the .html file which links in to the web site which works....
http://www.midweb.net/test/index.html
Can anyone duplicate this from a PHP routine, but capture the output in a string? If so, on top of the 500 points, I will gladly pay you in beers or whatever else you want - I am desperate to get this working now...
Many thanks
Matt Wilkes
ASKER CERTIFIED SOLUTION
ASKER
Thanks - will look at this now.
Okay, just message me if you have any questions about it. The article uses HTTP 1.0 to post the variables. Most web servers now also accept HTTP 1.1, so if you have any questions about getting it working with HTTP 1.1, I can answer them for you. The script there, which uses HTTP 1.0, will also work with web servers that support HTTP 1.1 (they are backwards compatible).
AJ
ASKER
Hi there - thanks for the advice...
I have tried, and it still doesn't seem to work for some reason.
Have you had a look at an actual script that gets the screen into a variable ?
All the code I have been working with works with the majority of other web sites, but not this jump from step 2 to step 3.
To re-cap - go to www.midweb.net/test. You will see the screen come up for 'My Travel Lite'. You can select an outbound and inbound flight and click continue - and get the result page with the amount for the flight.
OK so far - now, to see for yourself my problem - go back to www.midweb.net/test - and copy the html file to your own server / desktop whatever - and just change the 'action' on the form to point to your own php script which should be able to get this step 3 screen as a string - but it doesn't work - I just keep getting an error page.
Any ideas guys ?
What kind of error page is it? What does it say on the error page? Maybe that would help solve things.
AJ
ASKER
Ok... Thanks for hanging in there. This is where we are at...
1. This works fine by getting the price of the flight displayed on the screen, but that is just jumping to the vendor web site: www.midweb.net/test
2. I wrote a program to capture all variables sent by the form (www.midweb.net/myfirsttest.php).. as follows...
<?php
print "<table>";
while (list ($key, $val) = each ($HTTP_POST_VARS)) {
    if ($key != "Submit"){
        print "<tr><td>$key</td><td>&nbsp;= </td><td>$val</td></tr>";
    }
}
print "</table>";
?>
which gives the following as output when I point the 'step 2' page to it...
PHPSESSID = c55d69114d074b68e451d96c51445197
change_nom = 1
CHILD = 00
language = EN
page = CONFIRM
INFANT = 00
module = SB
mode = 0
ADULT = 02
px = ADULT ADTI02
m2F =
m2 = 20050103VZ 7012AGPBHX GBGBARA18999
m1T =
mkt1_selected =
m2T =
m1F =
mkt2_selected =
m1 = 20041209VZ 7011BHXAGP GBGBIRI2499
nom = 2
fare_cat =
tc = 2
pM = 0
... so 3) I now create a script (www.midweb.net/test/mynexttest.php) as follows...
<?php
$s="change_nom=1&CHILD=00&language=EN&page=CONFIRM&INFANT=00&module=SB&mode=0&ADULT=02&px=ADULT ADTI02&m2=20050103VZ 7012AGPBHX GBGBARA18999&m1=20041209VZ 7011BHXAGP GBGBIRI2499&nom=2&tc=2";
$adr = "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookieFileName");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $s);
curl_setopt($ch, CURLOPT_URL,"http://mytravellite.com/skylights/cgi-bin/skylights.cgi");
$buf2 = curl_exec ($ch);
curl_close ($ch);
echo ($buf2);
?>
... But this just gives the following....
Sorry.
There has been a problem with the database.
There may be too many users on the server
right now. Please try again later.
or phone our call centre on 08701 564 564
Case Number: MYT472411208
Click here to return to our homepage.
...
I have tried everything - including substituting the spaces in the URL line for %20 etc...
But nothing seems to work.
There should be no difference in the output from the two files, but for some reason, the script at http://mytravellite.com/skylights/cgi-bin/skylights.cgi is not accepting the POST or GET data I generate from my script.
HELP ME PLEASE !!!!!!!!!!!!
I can't see where I'm going wrong ! -- Any ideas (my wife will divorce me if I don't go to bed soon - but if I don't solve this problem, I won't get paid, then she'll divorce me anyway - and she is really nice looking and a great cook - HELP HELP HELP !!!)
Matt
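One thing worth ruling out with a cURL attempt like the one above is how the POST body is encoded. The following is only a minimal sketch: `build_form_body` and `post_form` are illustrative names that do not appear in the original script, the cookie jar path passed in is a placeholder, and the field values are a subset of the captured output. `http_build_query` percent-encodes every value (a space becomes `+`), so spaces inside or at the end of a value are sent rather than silently lost.

```php
<?php
// Hypothetical helper: encode fields as application/x-www-form-urlencoded.
function build_form_body(array $fields): string
{
    // PHP_QUERY_RFC1738 encodes a space as "+", as a browser form post does.
    return http_build_query($fields, '', '&', PHP_QUERY_RFC1738);
}

// Hypothetical helper: POST the fields and return the response as a string
// (curl_exec() returns false on failure). Not called in this sketch.
function post_form(string $url, array $fields, string $cookieJar)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_form_body($fields));
    // Read cookies from, and write them back to, the same file so that a
    // session started in an earlier step is reused.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
    return curl_exec($ch);
}

// A few of the captured fields, for illustration only.
$fields = ['page' => 'CONFIRM', 'px' => 'ADULT ADTI02', 'nom' => '2'];
echo build_form_body($fields), "\n";
```

Passing an array through one encoder, instead of hand-building the string, avoids having to replace spaces with %20 by hand.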
I think one reason you might be getting this error is that you are using cURL.
There might be better ways to contact the server and get the page. For example this is the POST function on zend.com by John Coggeshall.
<?php
function post_it($datastream, $url) {
    $url = preg_replace("@^http://@i", "", $url);
    $host = substr($url, 0, strpos($url, "/"));
    $uri = strstr($url, "/");
    $reqbody = "";
    foreach($datastream as $key=>$val) {
        if (!is_empty($reqbody)) $reqbody.= "&";
        $reqbody.= $key."=".urlencode($val);
    }
    $contentlength = strlen($reqbody);
    $reqheader = "POST $uri HTTP/1.0\r\n".
        "Host: $host\n". "User-Agent: PostIt\r\n".
        "Content-Type: application/x-www-form-urlencoded\r\n".
        "Content-Length: $contentlength\r\n\r\n".
        "$reqbody\r\n";
    $socket = fsockopen($host, 80, $errno, $errstr);
    if (!$socket) {
        $result["errno"] = $errno;
        $result["errstr"] = $errstr;
        return $result;
    }
    fputs($socket, $reqheader);
    while (!feof($socket)) {
        $result[] = fgets($socket, 4096);
    }
    fclose($socket);
    return $result;
}
?>
You can use this function and then send the data using the following. It contacts the server and gets the resulting page for you, which you can display on your screen if you would like to.
<?php
//Set the data over here, as in your script above
$data["change_nom"] = "1";
$data["CHILD"] = "00";
//And all the other data items
//Post it to the address
$result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");
if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
} else {
    for($i=0;$i< count($result); $i++) echo $result[$i];
}
?>
Try this. It should work.
AJ
ASKER
Hi AJ
Thanks for your hard work.
When I tried, I got this error...
Fatal error: Call to undefined function: is_empty() in /usr/local/home/httpd/vhtdocs/sunshine-direct.com/onlinebooking/enquire3.php on line 11
So I swapped !is_empty for !empty... and this came up...
HTTP/1.1 400 Bad Request Content-Type: text/html Date: Mon, 04 Oct 2004 15:08:13 GMT Connection: close Content-Length: 20
Bad Request
... which is a start, but still not working.
Here is the code I am using...
<?php
function post_it($datastream, $url) {
    $url = preg_replace("@^http://@i", "", $url);
    $host = substr($url, 0, strpos($url, "/"));
    $uri = strstr($url, "/");
    $reqbody = "";
    foreach($datastream as $key=>$val) {
        if (!empty($reqbody)) $reqbody.= "&";
        $reqbody.= $key."=".urlencode($val);
    }
    $contentlength = strlen($reqbody);
    $reqheader = "POST $uri HTTP/1.0\r\n".
        "Host: $host\n". "User-Agent: PostIt\r\n".
        "Content-Type: application/x-www-form-urlencoded\r\n".
        "Content-Length: $contentlength\r\n\r\n".
        "$reqbody\r\n";
    $socket = fsockopen($host, 80, $errno, $errstr);
    if (!$socket) {
        $result["errno"] = $errno;
        $result["errstr"] = $errstr;
        return $result;
    }
    fputs($socket, $reqheader);
    while (!feof($socket)) {
        $result[] = fgets($socket, 4096);
    }
    fclose($socket);
    return $result;
}
// You can use this function and then send the data using the following.
// It contacts the server and gets the resulting page for you, which you can display on your screen if you would like to.
//Set the data over here. As you would like to above
$data["change_nom"] = "1";
$data["CHILD"] = "00";
$data["language"] = "EN";
$data["page"] = "CONFIRM";
$data["INFANT"] = "00";
$data["module"] = "0";
$data["mode"] = "0";
$data["ADULT"] = "02";
$data["px"] = "ADULT ADTI02";
$data["m2F"] = "";
$data["m2"] = "20050103VZ 7012AGPBHX GBGBARA18999";
$data["m1T"] = "";
$data["mkt1_selected"] = "";
$data["m2T"] = "";
$data["m1F"] = "";
$data["mkt2_selected"] = "";
$data["m1"] = "20041209VZ 7011BHXAGP GBGBIRI2499";
$data["nom"] = "2";
$data["fare_cat"] = "";
$data["tc"] = "2";
$data["pM"] = "0";
//Post it to the address
$result = post_it($data, "http://mytravellite.com/skylights/cgi-bin/skylights.cgi");
if (isset($result["errno"])) {
    $errno = $result["errno"];
    $errstr = $result["errstr"];
    echo "<B>Error $errno</B> $errstr";
    exit;
} else {
    for($i=0;$i< count($result); $i++) echo $result[$i];
}
?>
If you want to cut and paste and try it on your server.
Many thanks for all your help
Best regards,
Matt Wilkes
Try this Matt,
Use the following,
$reqheader = "POST $uri HTTP/1.1\r\n".
    "Host: $host\n".
    "User-Agent: PostIt\r\n".
    "Connection: close\r\n".
    "Content-Type: application/x-www-form-urlencoded\r\n".
    "Content-Length: $contentlength\r\n\r\n".
    "$reqbody\r\n";
I am using an HTTP/1.1 header format. The server probably only accepts HTTP 1.1 requests. Please tell me how that comes out.
AJ
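For anyone following along with the raw socket approach, here is a hedged sketch of assembling the request text. `build_post_request` is an illustrative name, and `example.com` and the path are placeholders rather than the real endpoint. It differs from the snippets in this thread in two small ways: every header line, including `Host`, ends in `\r\n`, and nothing is appended after the body, so `Content-Length` matches what is sent. Whether either detail has anything to do with the 400 response is not established in this thread.

```php
<?php
// Hypothetical helper: build a raw HTTP/1.1 POST request as a string.
function build_post_request(string $host, string $uri, array $fields): string
{
    // URL-encode the fields into a form body (spaces become "+").
    $body = http_build_query($fields);

    $headers = [
        "POST $uri HTTP/1.1",
        "Host: $host",
        "User-Agent: PostIt",
        "Connection: close",
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " . strlen($body),
    ];

    // Header lines are joined with CRLF; a blank line separates them from the body.
    return implode("\r\n", $headers) . "\r\n\r\n" . $body;
}

// Placeholder host, path and fields, for illustration only.
$request = build_post_request('example.com', '/cgi-bin/form.cgi', ['a' => '1', 'b' => 'x y']);
echo $request, "\n";
```

The returned string could be written to a socket opened with `fsockopen`, in the same way `post_it` writes `$reqheader`.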
ASKER
Hi there
Sorry - still giving bad request
HTTP/1.1 400 Bad Request Content-Type: text/html Date: Mon, 04 Oct 2004 15:36:50 GMT Connection: close Content-Length: 20
Bad Request
Thanks for trying...
Let me try this on my machine and I'll get back to you.
AJ
SOLUTION
ASKER
Hi
Sorry for not responding - the forced accept may have been correct in itself, but was not the solution.
In actual fact, the posted variables were padded out at the end with spaces, which were invisible when we looked at the output.
When this was duplicated, it worked fine - so it was a silly mistake (as is usually the case).
Best
Milkmon123
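Since the cause turned out to be trailing spaces that were invisible in the printed output, a capture script can be made to show them. This is a minimal sketch, not the code actually used in the thread: `describe_fields` is an illustrative name and the sample values are made up. Each value is quoted and its length printed, so padding becomes obvious.

```php
<?php
// Hypothetical helper: render each field so that padding is visible.
function describe_fields(array $fields): array
{
    $lines = [];
    foreach ($fields as $key => $value) {
        $value = (string) $value;
        // var_export() wraps the string in quotes, so trailing spaces
        // sit inside the quotes; strlen() gives the exact byte count.
        $lines[] = sprintf('%s = %s (length %d)', $key, var_export($value, true), strlen($value));
    }
    return $lines;
}

// Made-up values for illustration; note the three trailing spaces on "px".
$example = ['ADULT' => '02', 'px' => 'ADULT ADTI02   '];
foreach (describe_fields($example) as $line) {
    echo $line, "\n";
}
```

In a capture script the same function could be given the posted array. If the result is shown in a browser, printing it inside `<pre>` matters, because normal HTML rendering collapses runs of spaces, which is one way padding like this can go unnoticed.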
A way to post data to a website is shown here,
http://www.zend.com/zend/spotlight/mimocsumissions.php
Use that method to post data to the form on the travellite website. (http://www.travellite.com/... form path).
You get back the results.
I hope that works then,
AJ