Solved

Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-09-19
Last Modified: 2012-06-27
Hi

I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the system to enter number of passengers and departure / destination airports fine, and the system works and responds well, but when I get to the bit where we actually select the flight from a list in order to confirm airport taxes etc, it will not allow me to do it.

If you have a look at the web site (www.mytravellight.com) - and enquire about a flight from Birmingham to Alicante (for example), when you click 'Search' you get to Step 2 (select your outgoing and incoming flights) - this works fine. It is the step 3 that I cannot get to work - moving from step 2 to step 3.

Here is the code I am using...

// Get outbound string... (at this point just gets first, not cheapest)...

      $start=strpos($string, "<img src=../images/icon_plane_rgt_ovr.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_outboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_outboundstring, "set_selected('1','")+18;

      $temp_outboundstring = substr($temp_outboundstring, $st2);

      $outboundstring = substr($temp_outboundstring, 0, 88);


// Now get inbound string...

      $start=strpos($string, "<img src=../images/icon_plane_lft_over.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_inboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_inboundstring, "set_selected('2','")+18;

      $temp_inboundstring = substr($temp_inboundstring, $st2);

      $inboundstring = substr($temp_inboundstring, 0, 88);


// Now get the final screen...
      $s="change_nom=1&CHILD=$child&language=EN&page=CONFIRM&";
      $s.="INFANT=$infant&module=SB&mode=0&ADULT=$adult&";
      $s.="px=&m2F=&m2=".$year_out.$month_out.$day_out."ALC".$airport."&";
      $s.="m1T=&m2T=&m1F=&m1=".$year_in.$month_in.$day_in.$airport."ALC"."&";
      $s.="nom=2&fare_cat=&tc=2&pM=0&";
      $s.="mkt1_selected=$outboundstring&";
      $s.="mkt2_selected=$inboundstring";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $fp  = @fopen($adr, r);
      
      $start = time();
      
      $string="";
      while($str = fgets($fp, 1024)) {
            $string.=$str;
            if (time() > $start+50){
                  die("Timed out");
            }
      }
      fclose($fp);


This is probably irrelevant to the question mind you. However, it works for step 1 -> step 2, but step 2 -> step 3 it falls over, and the mytravellight web site gives an error.

I have checked the variables being sent, and replicated them, but it is not interested at all.

To check if it is possible, I have done a 'File -> Save As' when at step 2 on the web site, and saved the html to my desktop. I have then, in a separate session, double clicked it, selected the relevant flights, and clicked 'Submit' and this works fine and goes to Step 3.

I then set the 'action' of the form to point to a test .php file which loops through the post variables listing each one (like so...

<?php
print "<table>";
reset ($HTTP_POST_VARS);
while (list ($key, $val) = each ($HTTP_POST_VARS)) {
      if (($key != "Submit") && ($key != "recipient") && ($key != "subject")){
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
          $content=$content. "$key => $val\r\n";
          $html_content=$html_content. "$key => $val<br>";
      }
}
print "</table>";
?>

... I then replicated each of these both on a command line and as hidden fields in a form, and tried to get to Step 3, but BOTH methods failed.

So, in summary...

There is no problem getting from step 2 to step 3 from an external site (the saved copy of the form works), but I cannot seem to find a way to do it using fopen/fgets...

Can anyone help ?

Many thanks

Matt
0
Comment
Question by:milkmon123
14 Comments
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096418
You might want to take a look at PHP cURL:

http://us2.php.net/curl

0
 
LVL 1

Author Comment

by:milkmon123
ID: 12096455
Hi

Looks excellent, but my virtual server is with DSVR (Designer Servers), and this extension cannot be installed.

Any other ideas guys ?

Thanks so far

Matt
0
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096528
Well, their script is a bit confusing. All of it is in one CGI so you're contacting that CGI twice if I get it straight?
0
 
LVL 1

Author Comment

by:milkmon123
ID: 12096553
Yes indeed. The one CGI handles everything - from help pages, to data entry forms. What is output depends on what you pass to the CGI.

0
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096652
This should be able to work since yours worked for step1 to step2. I'm betting the CGI is using param() to get the form fields so it shouldn't matter if it was POST or GET. Just as a question, you're attaching all POST parameters to the URL in fopen right?

(Of course, the simplest way would just be to install cURL. Can you ask Designer Servers to install cURL?)
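
For reference, here is a minimal sketch of attaching the parameters to the URL with http_build_query(), which URL-encodes every key and value instead of concatenating them raw. The field values are made up for illustration, the real form has more fields, and the endpoint is simply copied from the question's code.

```php
<?php
// Hypothetical field values for illustration only.
$fields = array(
    'page'          => 'CONFIRM',
    'ADULT'         => 2,
    'mkt1_selected' => 'EXAMPLE VALUE  ',  // note the two trailing spaces
);

// http_build_query() URL-encodes each key and value and joins them with "&".
$query = http_build_query($fields);

// Endpoint as it appears in the question's code (assumed to be correct).
$adr = 'http://mytravellite.com/skylights/cgi-bin/skylights.cgi?' . $query;

echo $adr, "\n";
```

Encoding the values this way means spaces and other special characters survive the trip in the query string rather than producing a malformed URL.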
0
 
LVL 4

Expert Comment

by:Skonen
ID: 12096662
Does the website handle get variables? Or is it just post variables?

I didn't take an in-depth look at your PHP, but I did notice one thing. You're using $fp  = @fopen($adr, r); with r unquoted, so PHP treats it as an undefined constant rather than a mode string. This is safer:

$fp  = @fopen($adr, "rb");
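
A slightly fuller sketch of the same idea, assuming you also want to detect a failed open before reading. read_stream() is a hypothetical helper name, not something from the original script.

```php
<?php
// Read everything from a URL or file path. Returns the contents as a
// string, or false if the stream could not be opened.
function read_stream($adr)
{
    // The mode must be a quoted string; "rb" reads in binary-safe mode.
    $fp = @fopen($adr, 'rb');
    if ($fp === false) {
        return false;   // "@" only hides the warning, so check explicitly
    }

    $string = '';
    while (!feof($fp)) {
        $chunk = fgets($fp, 1024);
        if ($chunk === false) {
            break;      // nothing more to read
        }
        $string .= $chunk;
    }
    fclose($fp);
    return $string;
}
```

Checking the return value of fopen() matters here because the "@" prefix suppresses the warning, so a failed request would otherwise only surface later when fgets() is handed an invalid handle.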
0
 
LVL 1

Author Comment

by:milkmon123
ID: 12100461
Thanks for the input so far. I will check suggestions by Zyloch and Skonen - thanks - will update you asap.

Matt
0
 
LVL 1

Author Comment

by:milkmon123
ID: 12100541
Zyloch - cURL is now active on my server !!!

So, if I can use cURL, can you tell me what commands to utilise ?

Many thanks.

Thanks to Skonen for pointing out the "r" thing in fopen.
0
 
LVL 36

Expert Comment

by:Zyloch
ID: 12100565
Aiee... I have a bus to catch in 10 min or so, what bad timing. But check out this link, has some great examples:

http://curl.haxx.se/libcurl/php/examples/
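
As a rough starting point, a form POST with the PHP cURL extension might look like the sketch below. build_post_options() and post_form() are hypothetical helper names, and the 50 second timeout is only borrowed from the loop in the question.

```php
<?php
// Build the cURL options for a form POST. Passing the body as a string
// (rather than an array) sends it as application/x-www-form-urlencoded.
function build_post_options(array $fields, $timeout = 50)
{
    return array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,      // return the body, don't print it
        CURLOPT_TIMEOUT        => $timeout,  // seconds
    );
}

// Send the POST and return the response body, or false on failure.
function post_form($url, array $fields, $timeout = 50)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_post_options($fields, $timeout));
    return curl_exec($ch);
}
```

Usage would be along the lines of $string = post_form($adr, $fields); where $fields holds the same names and values the step 2 form submits.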
0
 
LVL 1

Author Comment

by:milkmon123
ID: 12517084
Hi

How do I close it without an answer accepted ?

I found the solution myself in the end: the post variables were padded out with spaces in the actual application, which obviously I couldn't see on the screen when analysing the variables.

When duplicated with spaces included etc, it worked fine.

Many thanks

Milkmon123
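
For anyone hitting the same problem, one way to make this kind of padding visible is to print each value between brackets along with its length, since a browser collapses runs of spaces in normal HTML output. This is only a sketch: show_field() is a hypothetical helper and the value is made up.

```php
<?php
// Wrap the value in brackets and report its length so that leading or
// trailing spaces can be seen.
function show_field($key, $val)
{
    return sprintf('%s = [%s] (%d bytes)', $key, $val, strlen($val));
}

// Example value with two trailing spaces (made up for illustration).
echo show_field('mkt1_selected', 'ABC123  '), "\n";

// urlencode() keeps the padding, turning each space into "+".
echo urlencode('ABC123  '), "\n";
```

When viewing the dump in a browser rather than on the command line, wrapping the output in a pre element or viewing the page source keeps the spaces from being collapsed.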
0
 
LVL 21

Expert Comment

by:pinaldave
ID: 12556843
hello milkmon123,
I have asked for PAQ and refund so the question will be closed without accepting any answer as you have answered it yourself.
This Q will be stored in the database for future reference.
You do not have to do anything now. Thank you for coming back and explaining.
Regards,
---Pinal
0
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12582013
PAQed with points refunded (500)

modulo
Community Support Moderator
0
