Solved

How can I retrieve a web page from a Perl script?

Posted on 2000-05-01
Last Modified: 2008-03-17
I'd like to write a client script which retrieves a web page into a variable at a specified time every day, without using socket support modules (such as Net, etc.).

Let's say I want to retrieve, for instance, www.yahoo.com/index.html into $yahoo at 21:00 every day.  

Please write me a program as a head start, which connects to www.yahoo.com at port 80, sends "GET /index.html HTTP/1.0" (& "\n\n"), and puts the result into $yahoo, at, say, 21:00 every day.

I will process the contents of $yahoo and save them to a file from there.  Please don't use helper modules.

Thanks,

Elpmet
Question by:elpmet
13 Comments
 
LVL 3

Expert Comment

by:alien_life_form
ID: 2768620
Greetings.

IO::Socket is in the perl core.
Periodic execution is an Operating System issue (as opposed to a perl issue). On U*X, look at the cron/at man pages; windoze NT has equivalent facilities (9x is next to hopeless, as usual).

use IO::Socket;
$remote = IO::Socket::INET->new(
   Proto => "tcp",
   PeerAddr => "www.yahoo.com",
   PeerPort  => "80",
);
print $remote "GET /index.html HTTP/1.0\n\n";

while(<$remote>) {
    if (/^content-length:/i) {
        chop;
        s/content-length:\s*//g;
        s/\s//g;
        $cl=$_;
    }
    last if(/^$/);
}

if($cl) {
   read($remote,$yahoo,$cl);
}
#
# Do stuff to $yahoo.
.....
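
For the scheduling half of the question, cron is the usual answer on Unix (a crontab entry beginning with `0 21 * * *` runs its command at 21:00 every day). Where cron is not available, the script can stay resident and sleep until the target time itself. The following is only a minimal sketch of that fallback; `seconds_until` and `fetch_page` are hypothetical names made up for illustration, not part of any module.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);   # core module: calendar fields -> epoch seconds

# Seconds from $now (default: the current time) until the next
# occurrence of $hour:$min in local time.
sub seconds_until {
    my ($hour, $min, $now) = @_;
    $now = time unless defined $now;

    my @today  = localtime($now);
    my $target = timelocal(0, $min, $hour, $today[3], $today[4], $today[5] + 1900);

    # Already past that time today: aim for the same clock time tomorrow.
    if ($target <= $now) {
        my @next = localtime($now + 24 * 60 * 60);
        $target = timelocal(0, $min, $hour, $next[3], $next[4], $next[5] + 1900);
    }
    return $target - $now;
}

printf "Next 21:00 run is in %d seconds\n", seconds_until(21, 0);

# A resident script would then loop like this (fetch_page is hypothetical):
#   while (1) { sleep seconds_until(21, 0); fetch_page(); }
```

Cron is still preferable when it is available, since a sleeping script has to be restarted by hand after a reboot.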

(P.S. : is this a home assignment? :->)


0
 
LVL 16

Expert Comment

by:maneshr
ID: 2769383
elpmet,
any particular reason you don't want to use the Net module?

0
 
LVL 1

Author Comment

by:elpmet
ID: 2769859
Hi, alien_life_form, and maneshr,

Thank you for your help.
No, this is not a class assignment (graduated already), but my personal interest to learn perl socket and make my life easier (in fact, I'm writing a program for myself to automate daily calculations and performance-checking of my mutual fund account based on the NAV value published daily on their website, so I can check it no matter where I am).

My experience with PERL is very limited, but I'd rather not use module support for sockets or utility packages, because I'm curious to learn the details (using socket, bind and connect) and I'm reluctant to rely heavily on modules which may not be available in every environment.

alien_life_form,
I really appreciate the code you wrote for me, but my perl gave me the following error message upon running it.

Can't locate IO/Socket.pm in @INC at test.cgi line 3.
BEGIN failed--compilation aborted at test.cgi line 3.


Is it a lot of work to write such a script without the IO module, using only the fundamental functions?

Thank you for your help!


Elpmet
0
 
LVL 16

Expert Comment

by:maneshr
ID: 2769948
"....(In fact, I'm writing a program for myself to automate daily calculations and
performance-checking of my mutual fund account based on the NAV value published daily on their
website, so I can check it no matter where I am)....."

I faced a similar situation in the past. I wanted to view the top news headlines from 4 big sites.
I implemented the solution using the Net module. I was able to get the headlines from the different
sites either via email or in the browser or stored as a file.

"....just because I'm curious to learn the details...."

I think that is a very good attitude to have.

"....I'm reluctant to learn and heavily rely on modules which may not be available under a certain environment. ...."

I think you are not totally right here. PERL modules are portable across platforms to a large extent.
In fact the script that I have can be used on NT too, with very minor modifications.

I would suggest that you get a working solution to your problem first & then
focus on dissecting the solution to learn what is happening under the hood.
Besides, if you use PERL modules, you only have to focus on your solution,
rather than having to re-invent the wheel.

However, the final choice is yours.
If you feel you want to try the module approach I can post the code here.

The error you got,

Can't locate IO/Socket.pm in @INC at test.cgi line 3.

can be due to 2 reasons:

1) you don't have the IO PERL module installed on your system, OR
2) the module is installed, but the path to that module is not known to PERL.
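
To tell those two cases apart, it can help to print the directories Perl actually searches and test whether the module loads. This is only a rough sketch: `module_available` is a hypothetical helper name, and the output will differ from machine to machine.

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded from the current @INC, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # IO::Socket -> IO/Socket.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print "Perl version: $]\n";
print "Directories searched (\@INC):\n";
print "  $_\n" for @INC;

for my $module (qw(Socket IO::Socket)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? "found" : "NOT found";
}
```

If the module is installed somewhere outside the listed directories (case 2), adding that directory with `use lib` at the top of the script, or through the PERL5LIB environment variable, makes it visible.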


0
 
LVL 3

Expert Comment

by:alien_life_form
ID: 2769975
Greetings.


Mmmhhh.... two mistakes in a single post. (IO::Socket not in the perl core, and not a home assignment - sorry about both)

Using straight sockets does not cause big work, just decreased readability, and cruftier code. Since most of the interaction is heavily stereotyped anyway - unless you're going to diddle with the lower levels  of the socket internals that is - wrapper modules hide the ugliness and give a friendlier semantic. By throwing Net, LWP and company in the picture, you also buy support for things like proxies.

However, with straight sockets the above example becomes ( Socket being in the perl core. I checked the docs this time...):

#!/usr/bin/perl -w
use strict;
use Socket;
my ($remote,$port, $iaddr, $paddr, $proto, $line);

$remote  = 'www.yahoo.com';
$port = getservbyname('http', 'tcp');
$port=80 unless $port;
$iaddr   = inet_aton($remote) or die "no host: $remote";
$paddr   = sockaddr_in($port, $iaddr);

$proto   = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto)  or die "socket: $!";
connect(SOCK, $paddr) or  die "connect: $!";


print SOCK "GET /index.html HTTP/1.0\n\n";
while(<SOCK>) {
    if (/^content-length:/i) {
        chop;
        s/content-length:\s*//g;
        s/\s//g;
        $cl=$_;
    }
    last if(/^$/);
}

if($cl) {
   read($remote,$yahoo,$cl);
}
close (SOCK) or die "close: $!";
#
# Do stuff to $yahoo.
......

exit;


"use Socket" does nothing more than import the fundamental constants (PF_INET...) and helper functions (inet_aton etc.) that older code sometimes spells out as numeric constants and pack/unpack operations (non-portable, not to mention even uglier).
The incantations beginning with getservbyname and ending with connect replace the call to IO::Socket::INET->new, without any visible or actual improvement besides keyboard wear :-)
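
To make the role of those helpers concrete, here is a small sketch (an illustration only, with the hypothetical helper name `describe_address`) that packs an address and port with the Socket functions and unpacks them again. A dotted-quad address is used so that no DNS lookup is involved.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

# Pack a host and port into the structure connect() expects, then
# unpack it again. Returns (packed IP length, port, dotted-quad IP).
sub describe_address {
    my ($host, $port) = @_;
    my $ip = inet_aton($host) or die "no host: $host";
    my $paddr = pack_sockaddr_in($port, $ip);    # what sockaddr_in() builds
    my ($got_port, $got_ip) = unpack_sockaddr_in($paddr);
    return (length($ip), $got_port, inet_ntoa($got_ip));
}

my ($len, $port, $dotted) = describe_address('127.0.0.1', 80);
print "packed IP is $len bytes; port $port; address $dotted\n";

# For an IPv4 dotted quad, inet_aton amounts to packing the four octets:
print "matches manual pack\n"
    if inet_aton('127.0.0.1') eq pack('C4', 127, 0, 0, 1);
```

The packed socket address itself is deliberately not compared against a hand-written pack template here, because its layout differs between operating systems; that is exactly the detail the Socket module hides.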

If you are interested in pursuing this subject further, perlipc (in the docs) has much to say about it.

Cheers,
    alf

0
 
LVL 1

Author Comment

by:elpmet
ID: 2773945
Adjusted points from 100 to 200
0
 
LVL 1

Author Comment

by:elpmet
ID: 2773946
Hi folks,

Thank you for your replies!

I agree with you to some extent, maneshr, to get a working solution first as you suggested, but I also see this as a good opportunity for me to force myself to learn the nuts & bolts of how perl socket works.

alien_life_form,
Thank you again for re-writing the code for me.  Yet, the script seems to go into an infinite loop in the while statement on my system using perl version 5.004_04, which I cannot control.

I also receive the following error messages on a different machine with perl version 5.001 (which I cannot control, either).

Undefined subroutine &main::inet_aton called at test.cgi line 14.
Undefined subroutine &main::sockaddr_in called at test.cgi line 15

Can we write it using only perl's built-in functions, replacing sockaddr_in and inet_aton?  (I really prefer solutions that work under 5.001)  Also, do we need to use bind(), or do the above functions take care of it instead?

I'm sorry to be bothering you, but I would be grateful if you could put up with me.

I will increase points to 200.

Elpmet

0
 
LVL 3

Accepted Solution

by:
alien_life_form earned 200 total points
ID: 2774253
Greetings.
It is not looping. It hangs because I forgot to autoflush the socket - that's what the
select(SOCK); $|=1

below does. So, after fixing another couple of typos and handling line terminators explicitly, we get the following program, which works under 5.005:

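#!/usr/bin/perl -w
use strict;
use Socket;
my ($remote, $port, $iaddr, $paddr, $proto);
my ($cl, $yahoo);
$|=1;                                  # unbuffer STDOUT

# Resolve the host and build the packed address for connect().
$remote  = 'localhost';
$port = getservbyname('http', 'tcp');
$port=80 unless $port;
$iaddr   = inet_aton($remote) or die "no host: $remote";
$paddr   = sockaddr_in($port, $iaddr);

$proto   = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto)  or die "socket: $!";
connect(SOCK, $paddr) or  die "connect: $!";

# Autoflush the socket so the request is sent immediately.
select(SOCK) ; $|=1; select STDOUT;

# HTTP lines end in CRLF; the empty line ends the request.
print SOCK "GET /index.html HTTP/1.0\r\n\r\n";

# Read the response headers up to the blank line, noting Content-Length.
while(<SOCK>) {
    s/\r?\n$//;
    print STDERR "Read:$_\n";
    if (/^content-length:\s*(\d+)/i) {
        $cl=$1;
    }
    last if(/^$/);
}

# Read the body: exactly $cl bytes if the server sent a length,
# otherwise everything up to end of file (HTTP/1.0 closes the connection).
if(defined $cl) {
   print STDERR "CL is $cl\n";
   read(SOCK,$yahoo,$cl);
} else {
   local $/;
   $yahoo = <SOCK>;
}
close (SOCK) or die "close: $!";
#
# Do stuff to $yahoo.
print STDERR "****Gotten:\n";
print STDOUT $yahoo if defined $yahoo;

exit;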


You may notice that I'm now forced to do by hand lots of things that IO::Socket did behind the scenes, including the pesky bits of autoflushing and line termination...
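
The line-termination handling can be tried out without a network connection at all. The sketch below is an illustration under stated assumptions: `read_headers` and `read_body` are hypothetical helper names, and the response is a canned string read through an in-memory filehandle (which needs a perl much newer than 5.005) instead of a socket.

```perl
use strict;
use warnings;

# Read the status line and header lines from $fh up to the blank line.
# Returns a hash ref keyed by lower-cased header name.
sub read_headers {
    my ($fh) = @_;
    my %header;
    my $status = <$fh>;                      # e.g. "HTTP/1.0 200 OK"
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;                # accept CRLF or bare LF
        last if $line eq '';                 # blank line ends the headers
        my ($name, $value) = split /:\s*/, $line, 2;
        next unless defined $value;
        $value =~ s/\s+\z//;
        $header{lc $name} = $value;
    }
    return \%header;
}

# Read the body: Content-Length bytes if known, otherwise up to EOF.
sub read_body {
    my ($fh, $header) = @_;
    my $body = '';
    if (defined(my $length = $header->{'content-length'})) {
        read($fh, $body, $length);
    } else {
        local $/;                            # slurp mode
        my $rest = <$fh>;
        $body = $rest if defined $rest;
    }
    return $body;
}

# Canned response standing in for what the server would send.
my $canned = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
           . "Content-Length: 5\r\n\r\nhello";
open(my $fh, '<', \$canned) or die "open: $!";
my $header = read_headers($fh);
print "Content-Length header: $header->{'content-length'}\n";
print "Body: ", read_body($fh, $header), "\n";
```

With a real connection the same two subroutines could be handed the socket filehandle in place of the in-memory one.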

As for your other questions: I think I remember that pre-5.003 perl versions were rather buggy, and I have a faint memory of 5.001 being particularly bad, to the point where using it was basically deprecated - so you should definitely demand that it be upgraded.

I am not sure about bind.
It is needed on server (listening) sockets. Old-style code also calls it on client sockets, but I forget why. The code above works with no bind. Since bind links (binds) the local address to the socket, I assume modern socket implementations do this implicitly (unbound sockets are not extremely useful).
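
That assumption can be checked on the loopback interface. In the sketch below (an illustration only; `local_port_without_bind` is a hypothetical name), a throwaway listener is bound to a port chosen by the system, a client connects to it without ever calling bind, and getsockname then reports the local port the system assigned to the client on its own.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Connect to a temporary loopback listener without calling bind() on the
# client socket, and return the local port the system gave the client.
sub local_port_without_bind {
    my $proto = getprotobyname('tcp');

    # Server side: bind IS required here. Port 0 means "pick any free port".
    socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    bind($server, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
    listen($server, 1) or die "listen: $!";
    my ($server_port) = unpack_sockaddr_in(getsockname($server));

    # Client side: socket + connect only, no bind.
    socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    connect($client, pack_sockaddr_in($server_port, INADDR_LOOPBACK))
        or die "connect: $!";
    my ($client_port) = unpack_sockaddr_in(getsockname($client));

    close($client);
    close($server);
    return $client_port;
}

print "Local port assigned without bind: ", local_port_without_bind(), "\n";
```

A client would normally call bind only when it needs a specific local address or port.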


As for old style, quoting directly from perlipc:

"One of the major problems with old socket code in Perl was that it used hard-coded values for some of the constants, which severely hurt portability. If you ever see code that does anything like explicitly setting $AF_INET = 2, you know you're in for big trouble: An immeasurably superior approach is to use the Socket module, which more reliably grants access to various constants and functions you'll need. ..."

So you should not do it. Ask all the folks that relied on $AF_INET being 2, up to the point when it was changed to some other number in Solaris 2.3 ...
But, since you asked, this is an old snippet of code that does all the bad things mentioned above (it is not mine: I think it comes from Ping.pm, or even ping.pl)
It will need tailoring to fit your needs.




$TO = 'www.yahoo.com';

$AF_INET=2;
$SOCK_STREAM=1;
$sockaddr='S n a4 x8';

($name,$aliases,$proto)=getprotobyname('tcp');
($name,$aliases,$port)=getservbyname('echo','tcp');

$this=pack($sockaddr,$AF_INET,$port,"\0\0\0\0");

select(STDOUT);
$|=1;

chop($hostname = `hostname`);
($name,$aliases,$type,$len,$thisaddr)=gethostbyname($hostname);
($name,$aliases,$type,$len,$thataddr)=gethostbyname($TO);

$this = pack($sockaddr,$AF_INET,0,$thisaddr);
$that = pack($sockaddr, $AF_INET,$port,$thataddr);

undef $bad;
print "Creating socket to $TO:$port...\n" if ($verbose);
socket(REMOTE,$AF_INET,$SOCK_STREAM,$proto) || ($bad=1,warn "socket: $!");
$bad || bind(REMOTE,$this) || ($bad=1,warn "bind($hostname): $!");
$bad || connect(REMOTE,$that) || ($bad=1,warn "connect($TO): $!");

select(REMOTE);
$|=1;
select(STDOUT);

etc.

Cheers,
   alf

0
 
 
LVL 3

Expert Comment

by:alien_life_form
ID: 2774275
..and, in the above code, I changed www.yahoo.com to localhost for testing purposes...
0
 
LVL 1

Author Comment

by:elpmet
ID: 2774300
alien_life_form,

Thank you for your prompt response and explanation!  I will run your script this afternoon and come back to you.

I am sorry for my previous double posting.  (It was my right elbow that hit the enter key by accident)

Elpmet
0
 
LVL 1

Author Comment

by:elpmet
ID: 2775041
alien_life_form,

Your script (under 5.005) works perfectly!!  Thank you!  I have already sent an email to the sysadmin to upgrade their old perl.

I will study your script tonight by playing with it a lot, and start modifying my own script.

As for your second script, I am still interested in examining it all over.  Your example is good enough as a head start for me.

Thank you for giving me this excellent opportunity to learn socket in perl.  I also would like to thank you, maneshr, for your suggestions and information!

Elpmet.

0
 
LVL 16

Expert Comment

by:maneshr
ID: 2775066
Wish you much success with your learning PERL. :-)
0
