Solved

How do I download a webpage and parse information using Perl?

Posted on 2008-11-09
Last Modified: 2013-11-13
I need to produce a perl program that will display information from a website. I would like to have 7 columns: Date, Time, Name, Address, DOB, Officer(s), Location


The website the HTML information is coming from is: http://www.iowa-city.org/police/arrests.asp?charge=94000

This is how I started the Perl program. If anyone could help me out, that would be great.




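#!/usr/bin/perl -w

use strict;
use LWP::Simple;

# get() returns undef on failure, so stop early if the page could not be fetched
my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000")
    or die "Could not fetch the page\n";

my $columnTitle = "Date, Time, Name, Address, DOB, Officer(s), Location\n";

print $columnTitle;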

Question by:MsSchlien
5 Comments
 
LVL 13

Expert Comment

by:kawas
ID: 22922391
 
LVL 13

Assisted Solution

by:marchent
marchent earned 400 total points
ID: 22922426
I doubt anyone at EE will write the whole program for you, but I can suggest that you look at the attached sample code. This portion of code parses the TITLE from the HTML of your link using a regular expression. Learn more about regular expressions from http://www.perl.com/doc/manual/html/pod/perlre.html and http://www.cs.tut.fi/~jkorpela/perl/regexp.html and think about how to write a regex that accomplishes your task.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000");

## A simple regex that will parse the title from the page.
## /i ignores tag case; /s lets the title span more than one line.
if( defined $stuff and $stuff =~ /<title>(.*?)<\/title>/is ){
    print "$1\n";
}
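
The same regex idea can be stretched from the title to table cells. The following is only a minimal sketch: the sample markup in $html is hypothetical and stands in for the downloaded page, whose real table layout may differ, and the variable names are made up for illustration. It loops over each row, then over each cell in the row, and strips nested tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample markup (an assumption); the real page may be laid out differently.
my $html = <<'HTML';
<table>
<tr bgcolor="navy"><td>Name</td><td>DOB</td></tr>
<tr><td><b>DOE, JOHN</b></td><td>01/02/1980</td></tr>
<tr><td><b>ROE, JANE</b></td><td>03/04/1975</td></tr>
</table>
HTML

my @rows;

# /g walks every <tr>...</tr> pair, /i ignores tag case, /s lets . match newlines
while ( $html =~ m{<tr[^>]*>(.*?)</tr>}gis ) {
    my $row = $1;
    my @cells;
    while ( $row =~ m{<td[^>]*>(.*?)</td>}gis ) {
        my $text = $1;
        $text =~ s/<[^>]+>//g;      # drop nested tags such as <b>
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @cells, $text;
    }
    push @rows, \@cells if @cells;
}

shift @rows;    # discard the header row

print join( " | ", @$_ ), "\n" for @rows;
```

Regexes like these are fragile against nested tables or unusual markup, so an HTML parser module is usually the safer choice for anything beyond a quick script.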

 

Author Comment

by:MsSchlien
ID: 22924425
Thank you, I have looked over the links and they are very helpful. However, could you give an example, either in pseudo-code or Perl, of how to obtain the first name from the table on the website?
 
LVL 13

Accepted Solution

by:
kawas earned 1600 total points
ID: 22924672
Here is code to get the names (it's quick and dirty, but you should get the idea):
use HTML::TokeParser;
use Data::Dumper;
use LWP::Simple;
 
use strict;
 
my $stuff = get("http://www.iowa-city.org/police/arrests.asp?charge=94000");
 
my $p     = HTML::TokeParser->new( \$stuff ) or die "Can't open: $!";
my $start = 0;
my @names = ();
while ( my $token = $p->get_tag("tr") ) {
	if ($start) {
		# first column is the name and addr
		$token = $p->get_tag("td");
		$p->get_tag("b");
 
		push @names, $p->get_trimmed_text()
		  if $token->[1]{nowrap} eq 'nowrap'
			  and $token->[1]{valign} eq 'top'
			  and $token->[1]{style}  eq 'font-size: 8pt;';
	}
	if ( not $start ) {
		$start = 1 if $token->[1]{bgcolor} eq 'navy';
	}
}
print Dumper(\@names);
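
To collect every column rather than just the name, the same HTML::TokeParser calls can be used to read each cell of each row. This is a rough sketch only: the sample markup in $html is hypothetical and stands in for the downloaded page, so the column order and attributes of the real table are assumptions that need checking against the actual HTML.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup (an assumption); replace with the fetched page content.
my $html = <<'HTML';
<table>
<tr bgcolor="navy"><td>Name</td><td>DOB</td></tr>
<tr><td><b>DOE, JOHN</b></td><td>01/02/1980</td></tr>
<tr><td><b>ROE, JANE</b></td><td>03/04/1975</td></tr>
</table>
HTML

my $p = HTML::TokeParser->new( \$html ) or die "Can't parse HTML";

my @rows;
while ( $p->get_tag("tr") ) {
    my @cells;

    # Read tokens until this row closes, collecting the text of each <td>
    while ( my $token = $p->get_token ) {
        last if $token->[0] eq 'E' and $token->[1] eq 'tr';
        if ( $token->[0] eq 'S' and $token->[1] eq 'td' ) {
            # Text up to the closing </td>, with nested tags such as <b> skipped
            push @cells, $p->get_trimmed_text("/td");
        }
    }
    push @rows, \@cells if @cells;
}

shift @rows;    # discard the header row

print join( " | ", @$_ ), "\n" for @rows;
```

Each entry in @rows is an array reference holding one row's cells in page order, so the seven columns from the question could be printed by index once the real column order is known.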

 

Author Closing Comment

by:MsSchlien
ID: 31514895
Thank you, this helped a lot.