
Grab a portion of a remote webpage

Posted on 2003-11-13
Last Modified: 2012-08-14
I know this question has been asked a billion times by now, but I can't find an answer.
I've already done what I need in PHP, but I need it in Perl so I can run it from cron.
I need to create a text file that contains the values used in a dropdown menu on a remote
website. The values change sporadically, so this would be run once a week to ensure we
have a correct list.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $source = get('http://somedomain.com/file.php?var=var');

if ($source) {
    # I only need a portion of the page: everything between the
    # 1st set of SELECT tags
    ##  <SELECT name="dropdown"> <OPTION value="All" selected>All
    ##  <OPTION value="1">Option 1
    ##   this is a really long list
    ##  <OPTION value="255">Option 255
    ##  </SELECT>
    print $source;
} else {
    # get() returns undef on failure; it does not set $!
    die "Could not fetch the page\n";
}

How can I get $source to contain only the portion I need?
Question by:dewed
4 Comments
 
LVL 18

Expert Comment

by:kandura
ID: 9744165
I suggest you use HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) or one of the many other HTML parsing modules available for Perl.

Do something like:

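use HTML::TreeBuilder;
my $tree   = HTML::TreeBuilder->new_from_content($source);
my $select = $tree->look_down('_tag', 'select');   # first SELECT element, or undef
print $select->as_HTML if $select;
$tree->delete;                                     # release the parse tree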

This will print the first SELECT on the page, including the <select> tag itself; to print only the options, loop over $select->content_list.

See the documentation for HTML::TreeBuilder and HTML::Element for details.
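If installing HTML::TreeBuilder is not an option, a core-Perl regex can pull out the first SELECT block when the markup is as simple as the sample in the question. This is a minimal sketch, not a general HTML parser: `extract_first_select` is a hypothetical helper name, and it assumes the first SELECT is not nested or commented out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the contents of the first <SELECT ...>...</SELECT>
# block (without the SELECT tags themselves), or undef if there is none.
# Regex sketch only: assumes the SELECT is not nested or commented out.
sub extract_first_select {
    my ($html) = @_;
    # /i: tags may be upper- or lower-case; /s: "." also matches newlines;
    # the lazy (.*?) stops at the FIRST closing tag
    if ($html =~ m{<select\b[^>]*>(.*?)</select\s*>}is) {
        return $1;
    }
    return undef;
}

# Hypothetical sample page standing in for the result of get(...)
my $source = <<'HTML';
<html><body>
<SELECT name="dropdown"> <OPTION value="All" selected>All
<OPTION value="1">Option 1
<OPTION value="255">Option 255
</SELECT>
<SELECT name="other"><OPTION value="x">X</SELECT>
</body></html>
HTML

my $portion = extract_first_select($source);
if (defined $portion) {
    print $portion;
} else {
    print "no SELECT found\n";
}
```

In the question's script, `my $portion = extract_first_select($source);` could take the place of `print $source`. For markup that is less predictable, the parser-based approach is the more robust choice.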
 
LVL 3

Assisted Solution

by:prady_21
prady_21 earned 25 total points
ID: 9745388
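#!/usr/bin/perl
### A program using sockets
use strict;
use warnings;
use IO::Socket::INET;

my $HOST    = "www.yoursite.com";
my $URL_VAL = "/path/to/the/page";    # the request path must start with "/"

my $sock = IO::Socket::INET->new(
    PeerAddr => $HOST,
    PeerPort => 80,
    Proto    => 'tcp',
    Timeout  => 10,
) or die "Socket could not be created: $!\n";

# Minimal HTTP/1.0 request; "Connection: close" lets the read loop end at EOF
print $sock "GET $URL_VAL HTTP/1.0\r\n";
print $sock "Host: $HOST\r\n";
print $sock "Accept: */*\r\n";
print $sock "Connection: close\r\n\r\n";

my $text      = '';
my $in_select = 0;
while (my $line = <$sock>) {
    if (!$in_select && $line =~ /<SELECT name="dropdown">/i) {
        $in_select = 1;
        # keep whatever follows the opening tag on the same line
        $line =~ s/.*?<SELECT name="dropdown">//i;
    }
    next unless $in_select;
    if ($line =~ m{</SELECT>}i) {
        # keep whatever precedes the closing tag, then stop reading
        $line =~ s{</SELECT>.*}{}is;
        $text .= $line;
        last;
    }
    $text .= $line;
}
close $sock;
print "$text\n";
exit 0;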
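Both answers stop at the raw contents of the SELECT, while the stated goal is a text file of the dropdown values. The sketch below turns those contents into one `value<TAB>label` line per option. The helper names `option_pairs` and `write_values` and the output file `dropdown_values.txt` are hypothetical, and the regex assumes double-quoted `value` attributes with each label running up to the next tag, as in the sample markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect [value, label] pairs from the inside of a
# SELECT block. Assumes value="..." is double-quoted and each label runs
# up to the next tag, as in the sample markup from the question.
sub option_pairs {
    my ($select_html) = @_;
    my @pairs;
    while ($select_html =~ m{<option\b[^>]*?\bvalue\s*=\s*"([^"]*)"[^>]*>([^<]*)}gi) {
        my ($value, $label) = ($1, $2);
        $label =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and newlines
        push @pairs, [$value, $label];
    }
    return @pairs;
}

# Hypothetical helper: write one "value<TAB>label" line per option.
sub write_values {
    my ($path, @pairs) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!\n";
    print {$out} "$_->[0]\t$_->[1]\n" for @pairs;
    close $out or die "Cannot close $path: $!\n";
    return scalar @pairs;
}

# Hypothetical SELECT contents, shaped like the sample in the question
my $select_html = <<'HTML';
 <OPTION value="All" selected>All
<OPTION value="1">Option 1
<OPTION value="255">Option 255
HTML

my @pairs = option_pairs($select_html);
my $count = write_values('dropdown_values.txt', @pairs);    # hypothetical file name
print "wrote $count options\n";
```

In prady_21's script, `write_values('dropdown_values.txt', option_pairs($text))` could replace the final `print`.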
 
LVL 1

Accepted Solution

by:OKSD
OKSD earned 25 total points
ID: 9749535
Does the cron not work with PHP?

-OKSD
 

Author Comment

by:dewed
ID: 9751956
Does the cron not work with PHP?
Ya know, I don't know. I haven't tried command-line PHP since we were upgraded. Maybe they turned it on this time.

Hehe, cool! I couldn't run command-line PHP before.
#!/usr/bin/php -q
<?php
print "hello world";
?>
Shouldn't be a problem now. Thanks!
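With command-line PHP available, a weekly crontab entry along these lines could run the script and save its output as the text file. This is a sketch: both paths are hypothetical, and `/usr/bin/php` assumes the interpreter location from the shebang above.

```
# minute hour day-of-month month day-of-week  command
0        3    *            *     1            /usr/bin/php -q /path/to/get_dropdown.php > /path/to/dropdown_values.txt
```

The five time fields are minute, hour, day of month, month, and day of week, so this entry runs every Monday at 03:00.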