
Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-09-19
Last Modified: 2012-06-27
Hi

I am doing the back-end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the part of the system that enters the number of passengers and the departure and destination airports, and it works and responds well. However, when I get to the point where we actually select a flight from the list (in order to confirm airport taxes etc.), the site will not let me do it.

If you have a look at the web site (www.mytravellight.com) and enquire about a flight from, for example, Birmingham to Alicante, clicking 'Search' takes you to Step 2 (select your outgoing and incoming flights), and this works fine. It is moving from Step 2 to Step 3 that I cannot get to work.

Here is the code I am using...

// Get outbound string... (at this point just gets the first flight, not the cheapest)...
// Find the outbound plane icon, back up to the <tr> that starts its table row,
// and cut the row off at the next "<td colspan=5 >".
      $start=strpos($string, "<img src=../images/icon_plane_rgt_ovr.gif");
      $pos=revsearch($string, "<tr>", $start);   // revsearch(): reverse-search helper, defined elsewhere (not shown)
      $endofline = strpos($string, "<td colspan=5 >", $pos);
      $linelength = $endofline - $pos;
      $temp_outboundstring = substr($string, $pos, $linelength);

      // Skip past "set_selected('1','" (18 characters) to the start of the value...
      $st2=strpos($temp_outboundstring, "set_selected('1','")+18;
      $temp_outboundstring = substr($temp_outboundstring, $st2);
      // ...and take a fixed 88 characters as the outbound flight value.
      $outboundstring = substr($temp_outboundstring, 0, 88);


// Now get inbound string (same steps, using the left-pointing plane icon and leg '2')...
      $start=strpos($string, "<img src=../images/icon_plane_lft_over.gif");
      $pos=revsearch($string, "<tr>", $start);
      $endofline = strpos($string, "<td colspan=5 >", $pos);
      $linelength = $endofline - $pos;
      $temp_inboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_inboundstring, "set_selected('2','")+18;
      $temp_inboundstring = substr($temp_inboundstring, $st2);
      $inboundstring = substr($temp_inboundstring, 0, 88);


// Now get the final screen: build the step 3 (page=CONFIRM) query string by hand.
// Note that none of the values are URL-encoded.
      $s="change_nom=1&CHILD=$child&language=EN&page=CONFIRM&";
      $s.="INFANT=$infant&module=SB&mode=0&ADULT=$adult&";
      $s.="px=&m2F=&m2=".$year_out.$month_out.$day_out."ALC".$airport."&";
      $s.="m1T=&m2T=&m1F=&m1=".$year_in.$month_in.$day_in.$airport."ALC"."&";
      $s.="nom=2&fare_cat=&tc=2&pM=0&";
      $s.="mkt1_selected=$outboundstring&";
      $s.="mkt2_selected=$inboundstring";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $fp  = @fopen($adr, r);

      // Read the response (up to 1023 bytes per fgets() call),
      // giving up if reading takes more than 50 seconds.
      $start = time();
      $string="";
      while($str = fgets($fp, 1024)) {
            $string.=$str;
            if (time() > $start+50){
                  die("Timed out");
            }
      }
      fclose($fp);


This code is probably not directly relevant to the question, mind you. The point is that it works for step 1 -> step 2, but for step 2 -> step 3 it falls over and the mytravellight web site gives an error.

I have checked the variables being sent and replicated them, but the site does not accept them at all.
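A minimal sketch, not from the thread: rather than assuming each flight value is exactly 88 characters long, the value passed to set_selected() can be read up to its closing quote, which copies it exactly, padding spaces included. The regular expression assumes the markup looks like set_selected('1','...') as in the code above; extract_selected() is a hypothetical helper name and the sample markup is made up.

```php
<?php
// Hypothetical helper: return the second argument of set_selected('<leg>','...')
// found in $html, read up to its closing quote so any padding spaces are kept.
// Assumes the value itself contains no single quotes.
function extract_selected(string $html, int $leg): ?string
{
    $pattern = "/set_selected\\('" . $leg . "','([^']*)'/";
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;   // marker not found: better to stop than to send a wrong value
}

$row = "<a href=\"#\" onclick=\"set_selected('1','XX123  ')\">";   // made-up markup
var_dump(extract_selected($row, 1));   // string(7) "XX123  "
```

Returning null on a miss also avoids the strpos() + 18 trap in the original, where a missing marker silently yields offset 18.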

To check whether it is possible, I did a 'File -> Save As' at step 2 on the web site and saved the HTML to my desktop. Then, in a separate session, I double-clicked it, selected the relevant flights and clicked 'Submit', and this works fine and goes to Step 3.

I then changed the form's 'action' to point to a test .php file that loops through the POST variables and lists each one, like so:

<?php
// List every POSTed field except the form's own Submit/recipient/subject fields.
// ($_POST and foreach replace $HTTP_POST_VARS and each(), which newer PHP versions removed.)
$content = "";
$html_content = "";
print "<table>";
foreach ($_POST as $key => $val) {
      if (($key != "Submit") && ($key != "recipient") && ($key != "subject")){
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
            $content .= "$key => $val\r\n";
            $html_content .= "$key => $val<br>";
      }
}
print "</table>";
?>

I then replicated each of these, both on a command line AND as hidden fields in a form, to try to get to Step 3, but BOTH methods failed.
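One caveat with listing POST values in an HTML table: browsers collapse runs of whitespace, so leading or trailing spaces in a value are invisible. A minimal sketch of an alternative debug script, assuming every field value is a plain string (no name[] array fields); describe_fields() is a hypothetical helper name. It prints each value in brackets with its length, as plain text.

```php
<?php
// Hypothetical helper: one line per field, value wrapped in [ ] plus its length,
// so padding spaces show up instead of being swallowed by HTML rendering.
function describe_fields(array $fields): string
{
    $out = "";
    foreach ($fields as $key => $val) {
        $out .= sprintf("%s = [%s] (%d chars)\n", $key, $val, strlen($val));
    }
    return $out;
}

header("Content-Type: text/plain");   // plain text: the browser keeps every space
echo describe_fields($_POST);
```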

So, in summary: there is no problem getting from step 2 to step 3 from an external page (my saved copy of the form), but I cannot find a way to do it using fopen/fgets...

Can anyone help?

Many thanks

Matt
Question by:milkmon123

Expert Comment

by:Zyloch
ID: 12096418
You might want to take a look at PHP cURL:

http://us2.php.net/curl


Author Comment

by:milkmon123
ID: 12096455
Hi

Looks excellent, but my virtual server is with DSVR (Designer Servers), and this extension cannot be installed.

Any other ideas, guys?

Thanks so far

Matt

Expert Comment

by:Zyloch
ID: 12096528
Well, their script is a bit confusing. Everything goes through one CGI, so if I understand correctly, you're contacting that CGI twice?

Author Comment

by:milkmon123
ID: 12096553
Yes indeed. The one CGI handles everything, from help pages to data entry forms. What is output depends on what you pass to the CGI.


Expert Comment

by:Zyloch
ID: 12096652
This should work, since yours worked for step 1 to step 2. I'm betting the CGI uses param() to read the form fields, so it shouldn't matter whether they arrive via POST or GET. Just to check: you're attaching all the POST parameters to the URL in fopen, right?
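If the CGI does turn out to care about POST vs GET, a real POST can be sent without cURL through PHP's HTTP stream wrapper. A minimal sketch, not from the thread: post_context() is a hypothetical helper, and the field names and URL are placeholders rather than the site's confirmed interface. http_build_query() percent-encodes the values, with spaces sent as "+".

```php
<?php
// Hypothetical helper: build a stream context that makes fopen()/file_get_contents()
// send $fields as an application/x-www-form-urlencoded POST body.
function post_context(array $fields, int $timeout = 50)
{
    $body = http_build_query($fields);   // encodes keys and values; space -> "+"
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . "Content-Length: " . strlen($body) . "\r\n",
            'content' => $body,
            'timeout' => $timeout,       // read timeout in seconds
        ),
    ));
}

// Usage (network call, not run here):
// $html = file_get_contents($url, false, post_context($fields));
```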

(Of course, the simplest way would just be to install cURL. Can you ask Designer Servers to install cURL?)

Expert Comment

by:Skonen
ID: 12096662
Does the website accept GET variables, or only POST variables?

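I didn't really take an in-depth look at your PHP, but I did notice one thing. You're using $fp  = @fopen($adr, r); as though r were a constant. PHP only gets away with it by treating the undefined constant r as the string "r" (newer versions of PHP refuse it outright). This is safer: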

$fp  = @fopen($adr, "rb");
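Going a step further than the one-line fix, a minimal sketch (not from the thread) that opens with the quoted "rb" mode and checks the result instead of hiding failures with @, so a refused connection does not turn into a silent fgets() on false. fetch_page() is a hypothetical helper; http:// URLs additionally need allow_url_fopen enabled.

```php
<?php
// Hypothetical helper: read a whole URL or local file, failing loudly.
function fetch_page(string $url): string
{
    $fp = fopen($url, "rb");             // quoted, binary-safe read mode
    if ($fp === false) {
        throw new RuntimeException("Could not open $url");
    }
    $body = stream_get_contents($fp);    // read until EOF
    fclose($fp);
    if ($body === false) {
        throw new RuntimeException("Could not read $url");
    }
    return $body;
}
```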

Author Comment

by:milkmon123
ID: 12100461
Thanks for the input so far. I will check the suggestions from Zyloch and Skonen and update you ASAP.

Matt

Author Comment

by:milkmon123
ID: 12100541
Zyloch, cURL is now active on my server!!!

So, now that I can use cURL, can you tell me which functions to use?

Many thanks.

Thanks to Skonen for pointing out the "r" thing in fopen.

Expert Comment

by:Zyloch
ID: 12100565
Aiee... I have a bus to catch in 10 minutes or so, bad timing. But check out this link; it has some great examples:

http://curl.haxx.se/libcurl/php/examples/
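For later readers, a minimal sketch of the step 3 request with PHP's cURL extension, not taken from the linked page: curl_post() is a hypothetical helper, and the URL and fields are placeholders. It sends the fields as a real POST body encoded by http_build_query(), and returns the response instead of printing it.

```php
<?php
// Hypothetical helper: POST $fields to $url with cURL and return the response body.
// Requires the cURL extension to be loaded.
function curl_post(string $url, array $fields, int $timeout = 50): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException("curl_init failed");
    }
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),  // keeps padding as "+"
        CURLOPT_RETURNTRANSFER => true,                       // return, don't echo
        CURLOPT_TIMEOUT        => $timeout,                   // whole request, seconds
    ));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("cURL request failed: " . curl_error($ch));
    }
    return $body;   // the handle is released when $ch goes out of scope
}
```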

Author Comment

by:milkmon123
ID: 12517084
Hi

How do I close this question without accepting an answer?

I found the solution myself in the end: the POST variables were padded out with spaces in the actual application, which obviously I couldn't see on the screen when analysing the variables.

When I duplicated them with the spaces included, it worked fine.
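When the padded values go into a query string, as in the original fopen() code, they also need URL-encoding, since unencoded spaces are not valid in a URL. A minimal sketch, assuming the values are sent in the query string: the field names come from the question, but the padded value is made up. http_build_query() encodes each space as "+", and the padding comes back intact when decoded.

```php
<?php
// Field names from the question; 'FLIGHT1   ' is a hypothetical padded value.
$fields = array(
    'page'          => 'CONFIRM',
    'mkt1_selected' => 'FLIGHT1   ',   // three trailing spaces that must survive
);

// URL-encode every key and value so the padding is preserved in the request.
$query = http_build_query($fields);
$adr   = "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?" . $query;

echo $query, "\n";   // page=CONFIRM&mkt1_selected=FLIGHT1+++
```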

Many thanks

Milkmon123

Expert Comment

by:pinaldave
ID: 12556843
hello milkmon123,
I have requested a PAQ and refund, so the question will be closed without an accepted answer, since you answered it yourself.
This question will be stored in the database for future reference.
You do not have to do anything now. Thank you for coming back and explaining.
Regards,
---Pinal

Accepted Solution

by: modulo
ID: 12582013
PAQed with points refunded (500)

modulo
Community Support Moderator