
Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-09-19
Last Modified: 2012-06-27
Hi

I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the system to enter number of passengers and departure / destination airports fine, and the system works and responds well, but when I get to the bit where we actually select the flight from a list in order to confirm airport taxes etc, it will not allow me to do it.

If you have a look at the web site (www.mytravellight.com) and enquire about a flight from Birmingham to Alicante (for example), clicking 'Search' takes you to Step 2 (select your outgoing and incoming flights), and this works fine. It is the move from Step 2 to Step 3 that I cannot get to work.

Here is the code I am using...

// Get outbound string... (at this point just gets first, not cheapest)...

      $start=strpos($string, "<img src=../images/icon_plane_rgt_ovr.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_outboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_outboundstring, "set_selected('1','")+18;

      $temp_outboundstring = substr($temp_outboundstring, $st2);

      $outboundstring = substr($temp_outboundstring, 0, 88);


// Now get inbound string...

      $start=strpos($string, "<img src=../images/icon_plane_lft_over.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_inboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_inboundstring, "set_selected('2','")+18;

      $temp_inboundstring = substr($temp_inboundstring, $st2);

      $inboundstring = substr($temp_inboundstring, 0, 88);


// Now get the final screen...
      $s="change_nom=1&CHILD=$child&language=EN&page=CONFIRM&";
      $s.="INFANT=$infant&module=SB&mode=0&ADULT=$adult&";
      $s.="px=&m2F=&m2=".$year_out.$month_out.$day_out."ALC".$airport."&";
      $s.="m1T=&m2T=&m1F=&m1=".$year_in.$month_in.$day_in.$airport."ALC"."&";
      $s.="nom=2&fare_cat=&tc=2&pM=0&";
      $s.="mkt1_selected=$outboundstring&";
      $s.="mkt2_selected=$inboundstring";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $fp  = @fopen($adr, r);
      
      $start = time();
      
      $string="";
      while($str = fgets($fp, 1024)) {
            $string.=$str;
            if (time() > $start+50){
                  die("Timed out");
            }
      }
      fclose($fp);


This is probably irrelevant to the question mind you. However, it works for step 1 -> step 2, but step 2 -> step 3 it falls over, and the mytravellight web site gives an error.

I have checked the variables being sent, and replicated them, but it is not interested at all.

To check if it is possible, I have done a 'File -> Save As' when at step 2 on the web site, and saved the html to my desktop. I have then, in a separate session, double clicked it, selected the relevant flights, and clicked 'Submit' and this works fine and goes to Step 3.

I then set the 'action' of the form to point to a test .php file which loops through the post variables, listing each one, like so:

<?php
print "<table>";
reset ($HTTP_POST_VARS);
while (list ($key, $val) = each ($HTTP_POST_VARS)) {
      if (($key != "Submit") && ($key != "recipient") && ($key != "subject")){
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
            $content=$content. "$key => $val\r\n";
            $html_content=$html_content. "$key => $val<br>";
      }
}
print "</table>";
?>

I then replicated each of these both on a command line and as hidden fields in a form to try to get to Step 3, but both methods failed.

So, in summary...

There is no problem getting from Step 2 to Step 3 from an external site, but I cannot seem to find a way to do it using fopen / fgets.

Can anyone help ?

Many thanks

Matt
Question by:milkmon123
12 Comments
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096418
You might want to take a look at PHP cURL:

http://us2.php.net/curl

 
LVL 1

Author Comment

by:milkmon123
ID: 12096455
Hi

Looks excellent, but my virtual server is with DSVR (Designer Servers), and this extension cannot be installed.

Any other ideas guys ?

Thanks so far

Matt
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096528
Well, their script is a bit confusing. All of it is in one CGI so you're contacting that CGI twice if I get it straight?
 
LVL 1

Author Comment

by:milkmon123
ID: 12096553
Yes indeed. The one CGI handles everything - from help pages, to data entry forms. What is output depends on what you pass to the CGI.

 
LVL 36

Expert Comment

by:Zyloch
ID: 12096652
This should be able to work since yours worked for step1 to step2. I'm betting the CGI is using param() to get the form fields so it shouldn't matter if it was POST or GET. Just as a question, you're attaching all POST parameters to the URL in fopen right?

(Of course, the simplest way would just be to install cURL. Can you ask Designer Servers to install cURL?)
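
A minimal sketch of attaching the form fields to the URL with every value URL-encoded, so that spaces and other special characters survive the request. The helper name build_request_url and the sample flight value are made up for illustration; only the CGI address and the field names come from the question.

```php
<?php
// Hypothetical helper: builds a GET URL from an array of form fields.
function build_request_url($base, array $fields)
{
    $pairs = array();
    foreach ($fields as $name => $value) {
        // urlencode() turns each space into "+" and escapes "&", "=" and so on,
        // so leading or trailing padding in a value is sent intact.
        $pairs[] = urlencode($name) . '=' . urlencode($value);
    }
    return $base . '?' . implode('&', $pairs);
}

// Example call (the value 'AB 12  ' is an invented placeholder):
// $adr = build_request_url(
//     'http://mytravellite.com/skylights/cgi-bin/skylights.cgi',
//     array('page' => 'CONFIRM', 'mkt1_selected' => 'AB 12  ')
// );
// $fp = @fopen($adr, 'rb');
```

Interpolating raw values straight into the query string, as in the question, leaves any spaces unencoded, which a server may truncate or reject.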
 
LVL 4

Expert Comment

by:Skonen
ID: 12096662
Does the website handle get variables? Or is it just post variables?

I didn't really take an in depth look at your php, but I did notice one thing. You're using $fp  = @fopen($adr, r); as though r was a constant. This is safer:

$fp  = @fopen($adr, "rb");
 
LVL 1

Author Comment

by:milkmon123
ID: 12100461
Thanks for the input so far. I will check suggestions by Zyloch and Skonen - thanks - will update you asap.

Matt
 
LVL 1

Author Comment

by:milkmon123
ID: 12100541
Zyloch - cURL is now active on my server !!!

So, if I can use cURL, can you tell me what commands to utilise ?

Many thanks.

Thanks to Skonen for pointing out the "r" thing in fopen.
 
LVL 36

Expert Comment

by:Zyloch
ID: 12100565
Aiee... I have a bus to catch in 10 min or so, what bad timing. But check out this link, has some great examples:

http://curl.haxx.se/libcurl/php/examples/
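
For later readers, a rough sketch of what a form POST with PHP's cURL extension could look like. It assumes the cURL extension is loaded and that the CGI accepts POST; the function name post_form is hypothetical, and the 50 second timeout simply mirrors the limit used in the question's fgets loop.

```php
<?php
// Hypothetical helper: POSTs $fields to $url and returns the response body
// as a string, or false if the request fails.
function post_form($url, array $fields, $timeout = 50)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    // http_build_query() URL-encodes every value, so spaces inside or
    // around a value are transmitted rather than lost.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    // Return the page as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Give up after $timeout seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    return curl_exec($ch);
}

// Example call (field values are placeholders):
// $string = post_form(
//     'http://mytravellite.com/skylights/cgi-bin/skylights.cgi',
//     array('page' => 'CONFIRM', 'module' => 'SB', 'language' => 'EN')
// );
```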
 
LVL 1

Author Comment

by:milkmon123
ID: 12517084
Hi

How do I close it without an answer accepted ?

I found the solution myself in the end: the post variables were padded out with spaces in the actual application, which obviously I couldn't see on the screen when analysing the variables.

When duplicated with spaces included etc, it worked fine.
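
In case it helps anyone hitting the same thing, here is a small sketch of a dump that makes such padding visible by quoting each value and printing its length. The function name describe_fields and the sample value are invented for illustration, and it assumes every posted value is a plain string.

```php
<?php
// Hypothetical helper: returns one descriptive line per form field, with the
// value quoted and its length shown so that padding spaces stand out.
function describe_fields(array $fields)
{
    $lines = array();
    foreach ($fields as $key => $val) {
        $val = (string) $val; // assumption: values are scalars, not arrays
        $lines[] = sprintf('%s => %s (%d chars)', $key, var_export($val, true), strlen($val));
    }
    return $lines;
}

// Example: print inside <pre> so the browser does not collapse the spaces.
// echo '<pre>' . htmlspecialchars(implode("\n", describe_fields($_POST))) . '</pre>';
```

Browsers collapse runs of spaces in ordinary HTML, which is why a plain table dump of the variables hides the padding.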

Many thanks

Milkmon123
 
LVL 21

Expert Comment

by:pinaldave
ID: 12556843
hello milkmon123,
I have asked for PAQ and refund so the question will be closed without accepting any answer as you have answered it yourself.
This Q will be stored in the database for future reference.
You do not have to do anything now. Thank you for coming back and explaining.
Regards,
---Pinal
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12582013
PAQed with points refunded (500)

modulo
Community Support Moderator