elpmet

asked on

How can I retrieve a web site from Perl script...?

I'd like to write a client script which retrieves a web page into a variable at a specified time every day, without using helper modules for sockets (such as Net, etc.).

Let's say I want to retrieve, for instance, www.yahoo.com/index.html into $yahoo at 21:00 every day.  

Please write me a program as a head start, which connects to www.yahoo.com at port 80, sends "GET /index.html HTTP/1.0" (& "\n\n"), and puts the result into $yahoo, at, say, 21:00 every day.

I will process the content of $yahoo and save it to a file from there.  Please don't use helper modules.

Thanks,

Elpmet
Alien Life-Form

Greetings.

IO::Socket is in the perl core.
Periodic execution is an operating system issue (as opposed to a Perl issue). On U*X, look at the cron/at man pages; on Windows, NT has equivalent facilities (9x is next to hopeless, as usual).
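Where cron or at is not available, one fallback is to let the script itself sleep until the next 21:00. The following is only a minimal sketch of that arithmetic: seconds_until is a hypothetical helper, fetch_page in the usage note is a placeholder for the retrieval code, and daylight-saving shifts are ignored.

```perl
use strict;
use warnings;

# Seconds to wait from the given wall-clock time (hour, minute, second)
# until the next occurrence of $target_hour:00:00.
# Hypothetical helper; ignores daylight-saving changes.
sub seconds_until {
    my ($target_hour, $h, $m, $s) = @_;
    my $wait = $target_hour * 3600 - ($h * 3600 + $m * 60 + $s);
    $wait += 24 * 3600 if $wait <= 0;   # already past today: wait for tomorrow
    return $wait;
}
```

A long-running script could then loop over something like "my ($s, $m, $h) = localtime(time); sleep seconds_until(21, $h, $m, $s); fetch_page();". A scheduler such as cron remains the more robust choice, since it survives reboots and crashes of the script.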

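use IO::Socket;

my $remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "www.yahoo.com",
    PeerPort => 80,
) or die "cannot connect: $@";
$remote->autoflush(1);

# HTTP expects CRLF line endings; the Host header helps with virtual hosts.
print $remote "GET /index.html HTTP/1.0\r\nHost: www.yahoo.com\r\n\r\n";

# Read the headers, up to the blank line that ends them.
my $cl;
while (<$remote>) {
    s/\r?\n$//;                          # strip CRLF or bare LF
    $cl = $1 if /^content-length:\s*(\d+)/i;
    last if /^$/;
}

# Read the body: $cl bytes if a length was announced, otherwise
# everything until the server closes the connection.
my $yahoo = '';
if (defined $cl) {
    read($remote, $yahoo, $cl);
} else {
    local $/;
    $yahoo = <$remote>;
}
#
# Do stuff to $yahoo.
.....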

(P.S. : is this a home assignment? :->)
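The header handling can be tried without any network connection by feeding it a canned response. This is only a sketch: split_response is a hypothetical helper, not part of any module, and it handles just the plain status line, "Name: value" headers and an optional Content-Length.

```perl
use strict;
use warnings;

# Split a raw HTTP response into (status line, header hash ref, body).
# Hypothetical helper for experimenting offline.
sub split_response {
    my ($raw) = @_;

    # Headers and body are separated by the first blank line.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $body = '' unless defined $body;

    my ($status, @lines) = split /\r?\n/, $head;
    my %header;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $header{lc $name} = $value;      # header names are case-insensitive
    }

    # Trim the body to Content-Length when the server announced one.
    my $len = $header{'content-length'};
    $body = substr($body, 0, $len) if defined $len && $len =~ /^\d+$/;

    return ($status, \%header, $body);
}
```

Keeping the parsing in a function of its own means the socket code only has to collect the raw bytes, and the parsing can be checked against saved responses.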


elpmet,
any particular reason you don't want to use the Net module?

elpmet

ASKER

Hi, alien_life_form, and maneshr,

Thank you for your help
No, this is not a class assignment (graduated already), but my personal interest to learn perl socket and make my life easier (in fact, I'm writing a program for myself to automate daily calculations and performance-checking of my mutual fund account based on the NAV value published daily on their website, so I can check it no matter where I am).

My experience with PERL is very limited, but I don't like to use module support for sockets or utility packages just because I'm curious to learn the details (using socket, bind and connect), and I'm reluctant to learn and heavily rely on modules which may not be available under a certain environment.

alien_life_form,
I really appreciate the code you wrote for me, but my perl gave me the following error message upon running it.

Can't locate IO/Socket.pm in @INC at test.cgi line 3.
BEGIN failed--compilation aborted at test.cgi line 3.


Is it a lot of work to write such a script without using IO module, but only with fundamental functions?

Thank you for your help!


Elpmet
"....(In fact, I'm writing a program for myself to automate daily calculations and
performance-checking of my mutual fund account based on the NAV value published daily on their
website, so I can check it no matter where I am)....."

i faced a similar situation in the past. i wanted to view the top news headlines from 4 big sites.
i implemented the solution using the Net module. i was able to get the headlines from the different
sites either via email or in the browser or stored as a file.

"....just because I'm curious to learn the details...."

i think that is a very good attitude to have.

"....I'm reluctant to learn and heavily rely on modules which may not be available under a certain environment. ...."

i think you are not totally right here. PERL modules are portable across platforms to a large extent.
In fact the script that i have can be used on NT too, with very minor modifications.

i would suggest that you get a working solution to your problem first & then
focus on dissecting the solution to learn what is happening under the hood.
Besides, if you use PERL modules, you will only have to focus on your solution,
rather than having to re-invent the wheel.

However, the final choice is yours.
if you feel you want to try the module approach i can post the code here.

The error that you have,

Can't locate IO/Socket.pm in @INC at test.cgi line 3.

can be due to 2 reasons.

1) you don't have the IO PERL module installed on your system OR
2) the module is installed, but the path to that module is not known to PERL.
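To tell the two cases apart, it can help to print the search path and to test whether a module loads. This is a small sketch; module_available is a hypothetical helper name, and the directory in the usage note is only a placeholder.

```perl
use strict;
use warnings;

# Return 1 if the named module can be found and loaded via @INC, else 0.
# Hypothetical helper for diagnosing "Can't locate ... in @INC".
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # IO::Socket -> IO/Socket.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Show where this perl looks for modules.
print "$_\n" for @INC;
```

If the module is installed in a directory missing from that list, a line such as "use lib '/path/to/modules';" near the top of the script (or the -I command-line switch) adds the directory to @INC.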


Greetings.


Mmmhhh.... two mistakes in a single post. (IO::Socket not in the perl core, and not a home assignment - sorry about both)

Using straight sockets does not cause big work, just decreased readability, and cruftier code. Since most of the interaction is heavily stereotyped anyway - unless you're going to diddle with the lower levels  of the socket internals that is - wrapper modules hide the ugliness and give a friendlier semantic. By throwing Net, LWP and company in the picture, you also buy support for things like proxies.

However, with straight sockets the above example becomes ( Socket being in the perl core. I checked the docs this time...):

#!/usr/bin/perl -w
use strict;
use Socket;
my ($remote, $port, $iaddr, $paddr, $proto, $cl);
my $yahoo = '';

$remote = 'www.yahoo.com';
$port   = getservbyname('http', 'tcp') || 80;
$iaddr  = inet_aton($remote) or die "no host: $remote";
$paddr  = sockaddr_in($port, $iaddr);

$proto  = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect(SOCK, $paddr) or die "connect: $!";

# Unbuffer SOCK; otherwise the request can sit in the output buffer
# while the loop below waits for a reply that never comes.
select((select(SOCK), $| = 1)[0]);

print SOCK "GET /index.html HTTP/1.0\r\nHost: $remote\r\n\r\n";
while (<SOCK>) {
    s/\r?\n$//;                          # headers end in CRLF
    $cl = $1 if /^content-length:\s*(\d+)/i;
    last if /^$/;                        # blank line ends the headers
}

if (defined $cl) {
    read(SOCK, $yahoo, $cl);             # read from SOCK, not $remote
} else {
    local $/;                            # no length given: read to EOF
    $yahoo = <SOCK>;
}
close(SOCK) or die "close: $!";
#
# Do stuff to $yahoo.
......

exit;


"use Socket" does nothing more than import the fundamental constants (PF_INET...) and helper functions (inet_aton etc.) that in older code are sometimes seen as numerical constants and pack/unpack operations (non-portable, not to mention even uglier).
The incantations beginning with getservbyname and ending with connect replace the call to IO::Socket::INET->new, without any visible or actual improvement besides keyboard wear :-)
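For the curious, the older pack/unpack style looks roughly like the sketch below. It is illustrative only: the value 2 for AF_INET and the 'S n a4 x8' layout are assumptions that hold on many systems but not all (which is exactly the portability problem), and the two manual_* names are hypothetical helpers.

```perl
use strict;
use warnings;

# Assumptions: AF_INET is 2 and struct sockaddr_in is laid out as
# short family, network-order port, 4-byte address, 8 bytes of padding.
my $AF_INET           = 2;
my $SOCKADDR_TEMPLATE = 'S n a4 x8';

# Pack a dotted-quad string such as "127.0.0.1" into 4 raw bytes.
# For a host name, (gethostbyname($host))[4] yields the same kind of value.
sub manual_inet_aton {
    my ($dotted) = @_;
    return pack('C4', split /\./, $dotted);
}

# Build the packed socket address that connect() expects.
sub manual_sockaddr_in {
    my ($port, $addr) = @_;
    return pack($SOCKADDR_TEMPLATE, $AF_INET, $port, $addr);
}
```

With these, "connect(SOCK, manual_sockaddr_in(80, manual_inet_aton('127.0.0.1')))" stands in for the Socket-based calls, at the cost of hard-coding a platform-specific layout.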

If you are interested in pursuing this subject further, perlipc (in the docs) has much to say about it.

Cheers,
    alf

elpmet

ASKER

Adjusted points from 100 to 200
elpmet

ASKER

Hi folks,

Thank you for your replies!

I agree with you to some extent, maneshr, to get a working solution first as you suggested, but I also see this as a good opportunity for me to force myself to learn the nuts & bolts of how perl socket works.

alien_life_form,
Thank you again for re-writing the code for me.  Yet, the script seems to go into an infinite loop in the while statement on my system using perl version 5.004_04, which I cannot control.

I also receive the following error messages on a different machine with perl version 5.001 (which I cannot control, either).

Undefined subroutine &main::inet_aton called at test.cgi line 14.
Undefined subroutine &main::sockaddr_in called at test.cgi line 15

Can we write it using only built-in Perl functions to replace sockaddr_in and inet_aton?  (I really prefer solutions that work under 5.001.)  Also, do we need to use bind(), or do the above functions take care of it instead?

I'm sorry to be bothering you, but I would be grateful if you could put up with me.

I will increase points to 200.

Elpmet

ASKER CERTIFIED SOLUTION
Alien Life-Form

elpmet

ASKER

..and, in the above code, I changed www.yahoo.com to localhost for testing purposes...
elpmet

ASKER

alien_life_form,

Thank you for your prompt response and explanation!  I will run your script this afternoon and come back to you.

I am sorry for my previous double posting.  (It was my right elbow that hit the enter key by accident)

Elpmet
elpmet

ASKER

alien_life_form,

Your script (under 5.005) works perfectly!!  Thank you!  I have already sent an email to the sysadmin to upgrade their old perl.

I will study your script tonight by playing with it a lot, and start modifying my own script.

As for your second script, I am still interested in examining it all over.  Your example is good enough as a head start for me.

Thank you for giving me this excellent opportunity to learn socket in perl.  I also would like to thank you, maneshr, for your suggestions and information!

Elpmet.

Wish you much success with your learning PERL. :-)