Solved

Screen Scrape - Passing Post Variables - Not Working

Posted on 2004-09-19
Last Modified: 2012-06-27
Hi

I am doing the back end programming of a web site which will check airline prices from an external web site using 'screen scraping'.

The web site we are accessing is www.mytravellight.com (we have their permission to access the site, as the company we are writing the system for will be booking the flights through them anyway).

The part of the system that enters the number of passengers and the departure / destination airports works and responds well, but when I get to the step where we actually select the flight from a list in order to confirm airport taxes etc., the site will not allow me to do it.

If you have a look at the web site (www.mytravellight.com) and enquire about a flight from Birmingham to Alicante (for example), when you click 'Search' you get to Step 2 (select your outgoing and incoming flights). This works fine. It is moving from step 2 to step 3 that I cannot get to work.

Here is the code I am using...

// Get outbound string... (at this point just gets first, not cheapest)...

      $start=strpos($string, "<img src=../images/icon_plane_rgt_ovr.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_outboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_outboundstring, "set_selected('1','")+18;

      $temp_outboundstring = substr($temp_outboundstring, $st2);

      $outboundstring = substr($temp_outboundstring, 0, 88);


// Now get inbound string...

      $start=strpos($string, "<img src=../images/icon_plane_lft_over.gif");

      $pos=revsearch($string, "<tr>", $start);

      $endofline = strpos($string, "<td colspan=5 >", $pos);

      $linelength = $endofline - $pos;

      $temp_inboundstring = substr($string, $pos, $linelength);

      $st2=strpos($temp_inboundstring, "set_selected('2','")+18;

      $temp_inboundstring = substr($temp_inboundstring, $st2);

      $inboundstring = substr($temp_inboundstring, 0, 88);


// Now get the final screen...
      $s="change_nom=1&CHILD=$child&language=EN&page=CONFIRM&";
      $s.="INFANT=$infant&module=SB&mode=0&ADULT=$adult&";
      $s.="px=&m2F=&m2=".$year_out.$month_out.$day_out."ALC".$airport."&";
      $s.="m1T=&m2T=&m1F=&m1=".$year_in.$month_in.$day_in.$airport."ALC"."&";
      $s.="nom=2&fare_cat=&tc=2&pM=0&";
      $s.="mkt1_selected=$outboundstring&";
      $s.="mkt2_selected=$inboundstring";

      $adr =  "http://mytravellite.com/skylights/cgi-bin/skylights.cgi?$s";

      $fp  = @fopen($adr, r);
      
      $start = time();
      
      $string="";
      while($str = fgets($fp, 1024)) {
            $string.=$str;
            if (time() > $start+50){
                  die("Timed out");
            }
      }
      fclose($fp);
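
One thing worth checking in a request like the one above: the query string is built by plain concatenation, so any spaces or special characters in $outboundstring or $inboundstring go into the URL unencoded. A minimal sketch of building the string with http_build_query() instead, assuming a modern PHP version; the field values below are hypothetical placeholders, not values taken from the site:

```php
<?php
// Hypothetical sample values; the real ones come from the earlier scraping steps.
$fields = [
    'page'          => 'CONFIRM',
    'ADULT'         => 2,
    'mkt1_selected' => 'ABC 123   ',   // note the three trailing spaces
];

// http_build_query() URL-encodes every value, so spaces survive as "+".
$query = http_build_query($fields, '', '&');

echo $query, "\n";   // page=CONFIRM&ADULT=2&mkt1_selected=ABC+123+++
```

Encoded this way, padding inside a value is preserved rather than silently breaking the request line.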


This is probably irrelevant to the question mind you. However, it works for step 1 -> step 2, but step 2 -> step 3 it falls over, and the mytravellight web site gives an error.

I have checked the variables being sent, and replicated them, but the site still rejects the request.

To check if it is possible, I have done a 'File -> Save As' when at step 2 on the web site, and saved the html to my desktop. I have then, in a separate session, double clicked it, selected the relevant flights, and clicked 'Submit' and this works fine and goes to Step 3.

I then set the 'action' of the form to point to a test .php file which loops through the post variables, listing each one, like so:

<?php
// List every posted field except the form housekeeping ones.
$content = "";
$html_content = "";
print "<table>";
foreach ($_POST as $key => $val) {
      if (($key != "Submit") && ($key != "recipient") && ($key != "subject")) {
            print "<tr><td>$key</td><td>&nbsp;=&nbsp;</td><td>$val</td></tr>";
            $content .= "$key => $val\r\n";
            $html_content .= "$key => $val<br>";
      }
}
print "</table>";
?>

I then replicated each of these both on a command line AND as hidden fields in a form, and tried to get to Step 3, but BOTH methods failed.

So, in summary...

There is no problem getting from step 2 to step 3 from a form hosted outside their site (the saved copy), but I cannot seem to find a way to do it using fopen / fgets.

Can anyone help ?

Many thanks

Matt
Question by:milkmon123
14 Comments
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096418
You might want to take a look at PHP cURL:

http://us2.php.net/curl

 
LVL 1

Author Comment

by:milkmon123
ID: 12096455
Hi

Looks excellent, but my virtual server is with DSVR (Designer Servers), and this extension cannot be installed.

Any other ideas guys ?

Thanks so far

Matt
 
LVL 36

Expert Comment

by:Zyloch
ID: 12096528
Well, their script is a bit confusing. All of it is in one CGI so you're contacting that CGI twice if I get it straight?
 
LVL 1

Author Comment

by:milkmon123
ID: 12096553
Yes indeed. The one CGI handles everything - from help pages, to data entry forms. What is output depends on what you pass to the CGI.

 
LVL 36

Expert Comment

by:Zyloch
ID: 12096652
This should be able to work since yours worked for step1 to step2. I'm betting the CGI is using param() to get the form fields so it shouldn't matter if it was POST or GET. Just as a question, you're attaching all POST parameters to the URL in fopen right?

(Of course, the simplest way would just be to install cURL. Can you ask Designer Servers to install cURL?)
 
LVL 4

Expert Comment

by:Skonen
ID: 12096662
Does the website handle get variables? Or is it just post variables?

I didn't really take an in depth look at your php, but I did notice one thing. You're using $fp  = @fopen($adr, r); as though r was a constant. This is safer:

$fp  = @fopen($adr, "rb");
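
If the site turns out to need a real POST rather than parameters on the URL, fopen() can send one without cURL by way of a stream context. A rough sketch, assuming a modern PHP version with the http wrapper enabled; build_post_context() and its parameters are hypothetical names for illustration, and the 50-second timeout simply mirrors the one in the question:

```php
<?php
// Build a stream context that makes fopen() issue an HTTP POST.
// $fields is an array of form field names and values supplied by the caller.
function build_post_context(array $fields, int $timeoutSeconds = 50)
{
    // URL-encode the fields into a form body (spaces become "+").
    $body = http_build_query($fields, '', '&');

    return stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
            'timeout' => $timeoutSeconds,
        ],
    ]);
}

// Usage (not run here, since it needs network access):
// $fp = fopen($adr, 'rb', false, build_post_context($fields));
// $string = stream_get_contents($fp);
// fclose($fp);
```

Note that the mode is passed as a quoted string here as well.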
 
LVL 1

Author Comment

by:milkmon123
ID: 12100461
Thanks for the input so far. I will check suggestions by Zyloch and Skonen - thanks - will update you asap.

Matt
 
LVL 1

Author Comment

by:milkmon123
ID: 12100541
Zyloch - cURL is now active on my server !!!

So, if I can use cURL, can you tell me what commands to utilise ?

Many thanks.

Thanks to Skonen for pointing out the "r" thing in fopen.
 
LVL 36

Expert Comment

by:Zyloch
ID: 12100565
Aiee... I have a bus to catch in 10 min or so, what bad timing. But check out this link, has some great examples:

http://curl.haxx.se/libcurl/php/examples/
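
For reference, a minimal sketch of a form POST with the PHP cURL extension, assuming the extension is loaded. curl_post_form() is a hypothetical helper name; the URL and fields are whatever the caller passes in, and nothing here is specific to the target site:

```php
<?php
// POST an array of form fields to $url and return the response body.
function curl_post_form(string $url, array $fields, int $timeoutSeconds = 50): string
{
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_POST, true);
    // Encode the fields ourselves so the body is application/x-www-form-urlencoded.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields, '', '&'));
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // On PHP 8+ the handle is freed when $ch goes out of scope;
    // on older versions call curl_close($ch) before returning.
    return $body;
}
```

A call would look like `$html = curl_post_form($adr, $fields);`, with $fields holding the same names and values the browser form submits.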
 
LVL 1

Author Comment

by:milkmon123
ID: 12517084
Hi

How do I close it without an answer accepted ?

I found the solution myself in the end: the post variables were padded out with spaces in the actual application, which obviously I couldn't see on the screen when analysing the variables.

When duplicated with spaces included etc, it worked fine.
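
Padding like this is easy to miss because a browser collapses runs of spaces when it renders HTML. A small sketch of one way to make it visible, by bracketing each value and printing its length; show_raw() is a hypothetical helper and the sample value is made up:

```php
<?php
// Format a field so leading/trailing spaces are visible:
// the value is wrapped in brackets and its byte length is reported.
function show_raw(string $key, string $val): string
{
    return sprintf("%s = [%s] (length %d)", $key, $val, strlen($val));
}

// Hypothetical padded value standing in for one of the posted fields.
echo show_raw('mkt1_selected', 'ABC 123   '), "\n";
// prints: mkt1_selected = [ABC 123   ] (length 10)
```

When the output goes to a browser rather than a terminal, wrapping it in <pre> tags and passing values through htmlspecialchars() keeps the spaces from being collapsed.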

Many thanks

Milkmon123
 
LVL 21

Expert Comment

by:pinaldave
ID: 12556843
Hello milkmon123,
I have asked for PAQ and refund so the question will be closed without accepting any answer, as you have answered it yourself.
This Q will be stored in the database for future reference.
You do not have to do anything now. Thank you for coming back and explaining.
Regards,
---Pinal
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12582013
PAQed with points refunded (500)

modulo
Community Support Moderator
