Solved

Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-09-19
Last Modified: 2012-06-27
Hi

I am doing the back-end programming for a web site that checks airline prices on an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

I have written the part of the system that enters the number of passengers and the departure / destination airports, and it works and responds well. But when I get to the step where we actually select the flight from a list in order to confirm airport taxes etc., it will not let me do it.

If you have a look at the web site (www.mytravellight.com) and enquire about a flight from Birmingham to Alicante (for example), when you click 'Search' you get to Step 2 (select your outgoing and incoming flights). This works fine. It is step 3 that I cannot get to work: moving from step 2 to step 3.

Here is the code I am using...

// Get outbound string... (at this point just gets first, not cheapest)...

      $start=strpos($string, "<img src=../images/icon_plane_rgt_ovr.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_outboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_outboundstring, "set_selected('1','")+18;

      $temp_outboundstring = substr($temp_outboundstring, $st2);

      $outboundstring = substr($temp_outboundstring, 0, 88);


// Now get inbound string...

      $start=strpos($string, "<img src=../images/icon_plane_lft_over.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_inboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_inboundstring, "set_selected('2','")+18;

      $temp_inboundstring = substr($temp_inboundstring, $st2);

      $inboundstring = substr($temp_inboundstring, 0, 88);


// Now get the final screen...
      $s="change_nom=1&CHILD=$child&language=EN&page=CONFIRM&";
      $s.="INFANT=$infant&module=SB&mode=0&ADULT=$adult&";
      $s.="px=&m2F=&m2=".$year_out.$month_out.$day_out."ALC".$airport."&";
      $s.="m1T=&m2T=&m1F=&m1=".$year_in.$month_in.$day_in.$airport."ALC"."&";
      $s.="nom=2&fare_cat=&tc=2&pM=0&";
      $s.="mkt1_selected=$outboundstring&";
      $s.="mkt2_selected=$inboundstring";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $fp  = @fopen($adr, r);
      
      $start = time();
      
      $string="";
      while($str = fgets($fp, 1024)) {
            $string.=$str;
            if (time() > $start+50){
                  die("Timed out");
            }
      }
      fclose($fp);
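
The fixed 88-character substr() in the listing above depends on the exact width of the value, including any padding inside it. A minimal sketch of an alternative, assuming the value is the second single-quoted argument of set_selected(...) as the strpos() searches above suggest. The helper extract_selected_value() and the sample row are hypothetical and not taken from the real page:

```php
<?php
// Hypothetical helper: returns the second argument of
// set_selected('<market>','<value>') exactly as it appears in the HTML,
// including any leading, embedded, or trailing spaces.
function extract_selected_value(string $html, string $market): ?string
{
    // Capture everything up to the closing single quote, untrimmed.
    $pattern = "/set_selected\\('" . preg_quote($market, '/') . "','([^']*)'/";
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null; // pattern not found
}

// Made-up sample row; the real markup and value format may differ.
$row = "<tr><td onclick=\"set_selected('1','BHX ALC  20040925 ')\">...</td></tr>";
$value = extract_selected_value($row, '1');

// var_dump() prints the string length, which makes padding visible.
var_dump($value);
```

Reading up to the closing quote rather than a fixed width keeps whatever padding the site puts in the value, without having to know its length in advance.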


This is probably irrelevant to the question, mind you. However, it works for step 1 -> step 2, but at step 2 -> step 3 it falls over, and the mytravellight web site gives an error.

I have checked the variables being sent, and replicated them, but it is not interested at all.

To check if it is possible, I have done a 'File -> Save As' when at step 2 on the web site, and saved the html to my desktop. I have then, in a separate session, double clicked it, selected the relevant flights, and clicked 'Submit' and this works fine and goes to Step 3.

I then set the 'action' of the form to point to a test .php file which loops through the post variables, listing each one, like so...

<?php
print "<table>";
reset ($HTTP_POST_VARS);
while (list ($key, $val) = each ($HTTP_POST_VARS)) {
      if (($key != "Submit") && ($key != "recipient") && ($key != "subject")){
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
          $content=$content. "$key => $val\r\n";
          $html_content=$html_content. "$key => $val<br>";
      }
}
print "</table>";
?>

... I then replicated each of these both on a command line AND as hidden fields in a form, and tried to get to Step 3, but BOTH methods failed.

So, in summary...

There is no problem getting from step 2 to step 3 from an external site, but I cannot seem to find a way to do it using fopen / fgets...

Can anyone help ?

Many thanks

Matt
Question by:milkmon123
14 Comments
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096418
You might want to take a look at PHP cURL:

http://us2.php.net/curl

 
LVL 1

Author Comment

by:milkmon123
ID: 12096455
Hi

Looks excellent, but my virtual server is with DSVR (Designer Servers), and this extension cannot be installed.

Any other ideas guys ?

Thanks so far

Matt
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096528
Well, their script is a bit confusing. All of it is in one CGI so you're contacting that CGI twice if I get it straight?
 
LVL 1

Author Comment

by:milkmon123
ID: 12096553
Yes indeed. The one CGI handles everything - from help pages, to data entry forms. What is output depends on what you pass to the CGI.

 
LVL 36

Expert Comment

by:Zyloch
ID: 12096652
This should be able to work, since yours worked for step 1 to step 2. I'm betting the CGI is using param() to get the form fields, so it shouldn't matter whether it was POST or GET. Just as a question: you're attaching all POST parameters to the URL in fopen, right?

(Of course, the simplest way would just be to install cURL. Can you ask Designer Servers to install cURL?)
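
If the parameters do go on the URL, each value needs URL-encoding, otherwise a space or '&' inside a value can break the request. A minimal sketch using http_build_query(); build_step_url(), the example.invalid host, and the field values are hypothetical placeholders, not the real site's data:

```php
<?php
// Hypothetical helper: appends URL-encoded form fields to a base URL.
function build_step_url(string $base, array $fields): string
{
    // http_build_query() encodes each key and value; with the default
    // encoding type a space becomes '+'.
    return $base . '?' . http_build_query($fields, '', '&');
}

// Placeholder host and made-up values, for illustration only.
$url = build_step_url('http://example.invalid/skylights.cgi', [
    'page'          => 'CONFIRM',
    'ADULT'         => '2',
    'mkt1_selected' => 'AB 123  ', // trailing spaces survive as '++'
]);

echo $url, "\n";
```

Building the string from an array also avoids hand-concatenating '&' and '=' as in the original $s.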
 
LVL 4

Expert Comment

by:Skonen
ID: 12096662
Does the website handle get variables? Or is it just post variables?

I didn't really take an in-depth look at your PHP, but I did notice one thing. You're using $fp  = @fopen($adr, r); as though r was a constant. This is safer:

$fp  = @fopen($adr, "rb");
 
LVL 1

Author Comment

by:milkmon123
ID: 12100461
Thanks for the input so far. I will check suggestions by Zyloch and Skonen - thanks - will update you asap.

Matt
 
LVL 1

Author Comment

by:milkmon123
ID: 12100541
Zyloch - cURL is now active on my server !!!

So, if I can use cURL, can you tell me what commands to utilise ?

Many thanks.

Thanks to Skonen for pointing out the "r" thing in fopen.
 
LVL 36

Expert Comment

by:Zyloch
ID: 12100565
Aiee... I have a bus to catch in 10 min or so, what bad timing. But check out this link, has some great examples:

http://curl.haxx.se/libcurl/php/examples/
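
For reference, a minimal sketch of a form POST with the PHP cURL extension, along the lines of those examples. It assumes the CGI accepts an ordinary application/x-www-form-urlencoded POST; post_form() is a hypothetical helper name, and the 50-second timeout simply mirrors the loop in the question:

```php
<?php
// Hypothetical helper: POSTs form fields and returns the response body,
// or null if the request fails.
function post_form(string $url, array $fields, int $timeoutSeconds = 50): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    // Passing a string (not an array) sends a urlencoded body rather
    // than multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields, '', '&'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

It would be called as post_form($adr_without_query, $fields), where $fields holds the same names and values the browser form submits, with any padding in the values left intact.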
 
LVL 1

Author Comment

by:milkmon123
ID: 12517084
Hi

How do I close it without an answer accepted ?

I found the solution myself in the end: the post variables were padded out with spaces in the actual application, which obviously I couldn't see on the screen when analysing the variables.

When duplicated with spaces included etc, it worked fine.
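
A browser collapses runs of spaces when it renders HTML, which is why the padding was invisible in a table dump. A minimal sketch of a dump that makes it visible, by bracketing each value, printing its length, and wrapping the output in <pre>; describe_field() is a hypothetical helper:

```php
<?php
// Hypothetical helper: formats one field so that leading/trailing
// spaces are visible between the brackets and reflected in the length.
function describe_field(string $key, string $value): string
{
    return sprintf('%s = [%s] (length %d)', $key, $value, strlen($value));
}

// <pre> stops the browser from collapsing consecutive spaces.
echo "<pre>\n";
foreach ($_POST as $key => $value) {
    if (is_string($value)) { // skip array-valued fields in this sketch
        echo htmlspecialchars(describe_field((string) $key, $value)), "\n";
    }
}
echo "</pre>\n";
```

Viewing the page source, or comparing the printed lengths against what was typed, shows the padding straight away.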

Many thanks

Milkmon123
 
LVL 21

Expert Comment

by:pinaldave
ID: 12556843
Hello milkmon123,
I have asked for PAQ and refund, so the question will be closed without accepting any answer, as you have answered it yourself.
This Q will be stored in the database for future reference.
You do not have to do anything now. Thank you for coming back and explaining.
Regards,
---Pinal
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12582013
PAQed with points refunded (500)

modulo
Community Support Moderator