How do I download a webpage and parse information using Perl?

I need to produce a Perl program that will display information from a website. I would like to have 7 columns: Date, Time, Name, Address, DOB, Officer(s), Location.


The website the HTML information is coming from is: http://www.iowa-city.org/police/arrests.asp?charge=94000

This is how I started the Perl program. If anyone could help me out, that would be great.




#!/usr/bin/perl -w
 
use strict;
use LWP::Simple;
 
my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000");
 
my $columnTitle = ("Date, Time, Name, Address, DOB, Officer(s), Location\n");
 
print $columnTitle;


Asked by: MsSchlien
kawas commented:
Here is code to get the names (it's quick and dirty, but you should get the idea):
use HTML::TokeParser;
use Data::Dumper;
use LWP::Simple;
 
use strict;
 
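my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000");
# get() returns undef on failure, so stop early if the download did not work
die "Couldn't fetch the page\n" unless defined $stuff;
 
my $p     = HTML::TokeParser->new( \$stuff ) or die "Can't parse the page\n";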
my $start = 0;
my @names = ();
while ( my $token = $p->get_tag("tr") ) {
	if ($start) {
		# first column is the name and addr
		$token = $p->get_tag("td");
		$p->get_tag("b");
 
		push @names, $p->get_trimmed_text()
		  if $token->[1]{nowrap} eq 'nowrap'
			  and $token->[1]{valign} eq 'top'
			  and $token->[1]{style}  eq 'font-size: 8pt;';
	}
	if ( not $start ) {
		$start = 1 if $token->[1]{bgcolor} eq 'navy';
	}
}
print Dumper(\@names);
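
The same token-walking idea might be extended from names to every cell in a row. The following is only a rough sketch: it runs on a made-up HTML fragment (the `$html` sample, its column order, and the `parse_rows` helper are all assumptions for illustration, not the real page's markup), and it skips the header row by the same `bgcolor="navy"` check used above. The real table would need its cells mapped to Date, Time, Name, Address, DOB, Officer(s), and Location after inspecting the actual page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample standing in for the downloaded page; the real
# markup and column order may differ, so adjust the handling to match.
my $html = <<'HTML';
<table>
<tr bgcolor="navy"><td>Name</td><td>DOB</td><td>Location</td></tr>
<tr><td><b>DOE, JOHN</b> 123 MAIN ST</td><td>01/02/1980</td><td>100 E COLLEGE ST</td></tr>
<tr><td><b>ROE, JANE</b> 456 OAK AVE</td><td>03/04/1975</td><td>200 S LINN ST</td></tr>
</table>
HTML

# parse_rows is a hypothetical helper: it returns one array ref of
# cell texts per non-header table row.
sub parse_rows {
    my ($source) = @_;
    my $p = HTML::TokeParser->new( \$source ) or die "Can't parse HTML\n";

    my @rows;
    my $current;    # cells of the row being read, undef outside a data row
    while ( my $token = $p->get_token ) {
        my ( $type, $tag ) = @$token;
        if ( $type eq 'S' and $tag eq 'tr' ) {
            # For start tags from get_token, the attribute hash is at index 2
            my $is_header = ( $token->[2]{bgcolor} // '' ) eq 'navy';
            $current = $is_header ? undef : [];
        }
        elsif ( $type eq 'S' and $tag eq 'td' and $current ) {
            # Collect all text up to the closing </td>, skipping inner tags
            push @$current, $p->get_trimmed_text('/td');
        }
        elsif ( $type eq 'E' and $tag eq 'tr' and $current ) {
            push @rows, $current;
            $current = undef;
        }
    }
    return @rows;
}

my @rows = parse_rows($html);
print join( ' | ', @$_ ), "\n" for @rows;
```

Keeping each row as an array ref makes it straightforward to print the cells under a header line or to split a combined name-and-address cell in a later step.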

marchent commented:
I doubt anyone at EE will write the whole program for you, and I won't either. What I can suggest is that you look at the attached sample code. This portion of the code parses the TITLE from the HTML at your link, using a regular expression. Learn more about regular expressions from http://www.perl.com/doc/manual/html/pod/perlre.html and http://www.cs.tut.fi/~jkorpela/perl/regexp.html and work out for yourself how to write a regex that accomplishes your task.
#!/usr/bin/perl -w
 
use strict;
use LWP::Simple;
 
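my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000");
# get() returns undef on failure, so stop early if the download did not work
die "Couldn't fetch the page\n" unless defined $stuff;
 
## A simple regex that will parse the title from the page
## (/i ignores case; /s lets . match newlines inside the title)
if( $stuff =~ /<title>(.*?)<\/title>/is ){
    print "$1\n";
}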
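
As a rough sketch of how the same regex approach might carry over from a single title to repeated elements, the example below collects every bold entry that opens a table cell. It assumes a made-up fragment (the `$html` sample and its `<td nowrap><b>...</b>` layout are assumptions for illustration, not the real page's markup), so the pattern would need adjusting after looking at the actual HTML.

```perl
use strict;
use warnings;

# Hypothetical fragment; the real page's markup may differ.
my $html = <<'HTML';
<tr><td nowrap><b>DOE, JOHN</b><br>123 MAIN ST</td><td>01/02/1980</td></tr>
<tr><td nowrap><b>ROE, JANE</b><br>456 OAK AVE</td><td>03/04/1975</td></tr>
HTML

# With /g in list context, a match returns every capture in the string.
# /i ignores case and /s lets . match newlines.
my @names = $html =~ /<td[^>]*>\s*<b>(.*?)<\/b>/gis;

print "$_\n" for @names;
```

Regexes like this are fragile against changes in the markup, which is one reason a tokenizing parser is often preferred for anything beyond a quick extraction.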

MsSchlien (author) commented:
Thank you, I have looked over the links and they are very helpful. However, could you give an example, in either pseudo-code or Perl, of how to obtain the first name from the table on the website?
MsSchlien (author) commented:
Thank you, this helped a lot.
Question has a verified solution.
