  • Status: Solved
  • Priority: Medium
  • Security: Public

Parse input name-value pairs in HTML

Can someone show me how to extract ALL the name-value pairs from an HTML document, looping through each input, and write the results to a tab-delimited file, taking into account differences in whitespace and quoting (", ', or no quotes)?
For example:
<input type=hidden name="user" value="JohnGreen">
<input type=hidden name="state" value="NewYork">

<input type=hidden name= user value=JohnGreen>
<input type=hidden name= state value=NewYork>

<input type=hidden name='user' value='JohnGreen'>
<input type=hidden name='state' value='NewYork'>


user=JohnGreen state=NewYork
malkie
Asked:
 
rj2 Commented:
#!/usr/bin/perl
use strict;
use HTML::TokeParser::Simple;
open(OUTFILE,">out.txt") || die("can not open file because $!");

my $p = HTML::TokeParser::Simple->new('valuepairs.html');
while ( my $token = $p->get_token ) {
     if ( $token->is_start_tag('input')) {
          my $attr=$token->return_attr();
          print OUTFILE $attr->{ name },"\t",$attr->{ value },"\n";
     }
}    
close(OUTFILE);

rj2 Commented:
Or you could use HTML::Parser:

#!/usr/bin/perl
use strict;
use HTML::Parser();
open(OUTFILE,">out2.txt") || die("can not open file because $!");

sub start_handler {
     if($_[0] eq 'input') {
          print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
     }
}    

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file('valuepairs.html') || die $!;
close(OUTFILE);

malkie (Author) Commented:
I tried this, having installed
use HTML::TokeParser::Simple;
and got:

CGI Error
The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are:


Can't call method "get_token" on an undefined value at C:\Inetpub\wwwroot\cgi-bin\extracthtml\thecode.cgi line 7.

Can you please help? I am not very experienced.
 
rj2 Commented:
Have you installed the modules HTML::TokeParser::Simple, HTML::Parser and HTML::TokeParser?

Does the file valuepairs.html, which is passed to new(), exist?

rj2 Commented:
I have tested this now, and you get this error if the file passed to new() does not exist.
Try giving the full path to the file, as shown below:
my $p = HTML::TokeParser::Simple->new('/usr/home/malkie/valuepairs.html');
If the HTML you want to parse is located on another web server, you must download it first:

use LWP::Simple;
getstore('http://www.mysite.com','/usr/home/malkie/valuepairs.html');
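
As an alternative to saving the page to disk first, here is a minimal sketch, assuming LWP::Simple and HTML::Parser are installed, that fetches the page into memory with get() and feeds the resulting string to HTML::Parser's parse() method. The helper name extract_inputs and the script name fetch_inputs.pl are hypothetical, and the URL is only the placeholder used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Collect a [name, value] pair for every <input> start tag in an HTML string.
# extract_inputs is a hypothetical helper name, not part of any module.
sub extract_inputs {
    my ($html) = @_;
    my @pairs;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                return unless $tagname eq 'input';
                push @pairs, [ $attr->{name}, $attr->{value} ];
            },
            'tagname, attr',
        ],
    );
    $p->parse($html);
    $p->eof;    # signal end of document so buffered text is flushed
    return @pairs;
}

# Hypothetical usage: perl fetch_inputs.pl http://www.mysite.com
if (@ARGV) {
    require LWP::Simple;    # loaded only when a URL is actually given
    my $url  = $ARGV[0];
    my $html = LWP::Simple::get($url);
    die "could not fetch $url\n" unless defined $html;
    for my $pair (extract_inputs($html)) {
        # Missing attributes come back undefined; print them as empty fields.
        my ($name, $value) = map { defined $_ ? $_ : '' } @$pair;
        print "$name\t$value\n";
    }
}
```

Run without arguments the script does nothing; with a URL it prints one tab-separated line per input to standard output, which can be redirected to a file.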

malkie (Author) Commented:
I have all the modules installed as far as I know. I have looked at the files, and HTML::Parser is version 3.25.
I am using Microsoft Personal Web Server.
I have tried numerous ways, with valuepairs.html in the same folder as the script and with a full path, but I keep getting these errors:
Can't locate auto/HTML/Parser//Inetpub/wwwroot/valuepairs.html.al in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/HTML/Parser.pm line 82
Even if I type in a nonexistent file, it comes up with:
Can't locate auto/HTML/Parser//Inetpub/wwwroot/nofile.html.al in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/HTML/Parser.pm line 82
Can't locate auto/HTML/Parser/valuepairs.html.al in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/HTML/Parser.pm line 82
I notice that there is a double forward slash in the error path ("Parser//Inetpub"), but not if I put in just the file valuepairs.htm.
I don't know why it is adding .al to the end of the file name.

#!/usr/bin/perl
use strict;
use HTML::Parser();
open(OUTFILE,">out2.txt") || die("can not open file because $!");

sub start_handler {
    if($_[0] eq 'input') {
         print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
    }
}    

#my $p = HTML::Parser->new('/Inetpub/wwwroot/nofile.html');
#my $p = HTML::Parser->new('/Inetpub/wwwroot/valuepairs.html');
my $p = HTML::Parser->new('valuepairs.html');

$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file('valuepairs.html') || die $!;
close(OUTFILE);

rj2 Commented:
On Windows, a full path starts with a drive letter.
Try something like the line below, changing the path to the correct one on your system:

my $p = HTML::TokeParser::Simple->new('c:/inetpub/wwwroot/valuepairs.html');

malkie (Author) Commented:
I have tried that, and it came up with:

Can't locate auto/HTML/Parser/c:/inetpub/wwwroot/valuepairs.html.al in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/HTML/Parser.pm line 82

I also tried the script on a Unix server (using the appropriate paths), and it came up with an error. Perhaps you can let me see the source code you tested.

rj2 Commented:
Weird.
I posted the source I tested in my first comment; it works for me (Apache on WinXP).
Did you get the same error message on the Unix server?
Could it be a problem with your Perl installation?
Try uninstalling and reinstalling Perl:
http://www.activestate.com/activeperl

rj2 Commented:
Try running it from the shell instead of as a CGI script, i.e. save the first script as parse.pl and type "perl parse.pl" on the command line.
Do you get any errors if you try that?

malkie (Author) Commented:
Just to let you know, I have been checking whether HTML::Parser is installed properly, as follows.

I ran PPM> verify --force, which upgraded about four packages, so I assume everything is okay.

I ran the script:
*********
use strict;
use HTML::Parser();
open(OUTFILE,">out2.txt") || die("can not open file because $!");

sub start_handler {
    if($_[0] eq 'input') {
         print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
    }
}    

my $p = HTML::Parser->new('c:/Inetpub/wwwroot/valuepairs.htm');
$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file('valuepairs.html') || die $!;
close(OUTFILE);
*****************
which came up with:

CGI Error
The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are:
Can't locate auto/HTML/Parser/c:/Inetpub/wwwroot/valuepairs.htm.al in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/HTML/Parser.pm line 82

I did the following:

C:\WINDOWS>ppm
PPM interactive shell (2.1.5) - type 'help' for available commands.
PPM>
PPM> verify HTML::Parser
PPM>
PPM> verify HTML-Parser
PPM> install HTML::Parser
Version 3.25 of 'HTML-Parser' is already installed.
Remove it, or use 'verify --upgrade HTML-Parser'.
PPM> verify --upgrade HTML-Parser
Upgrade package 'HTML-Parser'? (y/N): y
PPM>
PPM> verify HTML-Parser
PPM>
PPM> verify DBI
Package 'DBI' is up to date.
PPM> remove HTML::Parser
Remove package 'HTML-Parser?' (y/N): y
Error removing HTML-Parser: Package 'HTML-Parser' is required by PPM and
cannot be removed
PPM>
PPM> verify --force HTML::Parser
PPM>
PPM> verify HTML::Parser
PPM>
PPM> verify HTML-Parser
PPM>

The only thing that I can see is that when I run verify HTML::Parser it doesn't come back with
Package 'HTML-Parser' is up to date.
it always goes back to the PPM> prompt.

I will try what you suggested and get back to you:
"Try to run it from the shell instead of as a CGI script, i.e. type "perl parse.pl" on the command line (save the first script as parse.pl).
Do you get any errors if you try that?"

Thank You


rj2 Commented:
Add the two lines below right after the line "use HTML::Parser" to make it work as a CGI script. You did not say anything about CGI in your original question.

use CGI qw/:standard/;
print header;
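
For context, a web server expects a CGI script's output to begin with HTTP headers followed by a blank line, and the header function exported by CGI.pm prints such a block (by default a Content-Type header for text/html). The sketch below shows a hand-written equivalent using only core Perl. The helper name cgi_header is hypothetical, and the exact header text emitted by CGI.pm may differ (for example, it can add a charset).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a minimal HTTP header block: one Content-Type line followed by
# the empty line that separates headers from the body.
# cgi_header is a hypothetical helper, not part of CGI.pm.
sub cgi_header {
    my ($type) = @_;
    $type = 'text/html' unless defined $type;
    return "Content-Type: $type\r\n\r\n";
}

# The header must be printed before any other output.
print cgi_header();
print "<p>Parsing finished.</p>\n";
```

Anything printed before the header block, including warnings sent to standard output, can cause the "incomplete set of HTTP headers" error quoted earlier in this thread.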

malkie (Author) Commented:
Thank you.
I added the lines use CGI qw/:standard/; and print header; as you said, and that has stopped the error.

I ran the script (as CGI) and it is definitely doing something, but it doesn't print anything to the out2.txt file. Below the script I have included the source of valuepairs.html, which is in the Inetpub/wwwroot/ folder, to show you what it is reading.

#!/usr/bin/perl
use strict;
use HTML::Parser();
use CGI qw/:standard/;
print header;

open(OUTFILE,">out2.txt") || die("can not open file because $!");

sub start_handler {
    if($_[0] eq 'input') {
         print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
    }
}    

my $p = HTML::Parser->new('/Inetpub/wwwroot/valuepairs.html');
$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file('valuepairs.html') || die $!;
close(OUTFILE);
*********
<html><head><title>test</title>
</head>

<center><h1>test</h1></center>
<form method=POST action="http://yes/cgi-bin/ascript.pl">
<p>
<input type=hidden name="userdir" value="lnks">
<input type=hidden name="lnkuser" value="okay">
EMail: <input type=text name="emailid" size=30>&nbsp;<br>
Title: <input type=text name="title" size=40><br>
URL: <input type=text name="url" size=55><br>
Section to be placed in: <select name="section">
<option> Business <option> Computers <option> Education  <option> Entertainment
<option> Government <option> Personal  <option selected> Miscellaneous </select><br>
<input type=submit value="Add"> * <input type=reset></p>
</form>


rj2 Commented:
Replace the line
my $p = HTML::Parser->new('/Inetpub/wwwroot/valuepairs.html');
with
my $p = HTML::Parser->new(api_version => 3);
as posted in my second comment.

Replace
open(OUTFILE,">out2.txt") || die("can not open file because $!");
with
open(OUTFILE,">e:/inetpub/wwwroot/out2.txt") || die("can not open file because $!");
(replace path with correct path on your system)

malkie (Author) Commented:
my $p = HTML::Parser->new(api_version => 3);
as posted in my second comment.
# I tried this and it doesn't work. Having looked at other example code, I think this needs to be a file.

open(OUTFILE,">e:/inetpub/wwwroot/out2.txt") || die("can not open file because $!");
# Yes, the full path writes a file (it works without the e: as well; I have used this in other CGI scripts), but nothing is written to it.

The code I am using:
#!/usr/bin/perl
use strict;
use HTML::Parser();
use CGI qw/:standard/;
print header;

# Absolute path to out2.txt file:
my $outfile = "/Inetpub/wwwroot/cgi-bin/extracthtml/out2.txt";

# Absolute path to valuepairs.html file:
my $infile = "/Inetpub/wwwroot/valuepairs.html";

open(OUTFILE,">$outfile") || die("can not open file because $!");
sub start_handler {
    if($_[0] eq 'input') {
         print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
    }
}    

my $p = HTML::Parser->new('$infile');
$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file('$infile') || die $!;
close(OUTFILE);

rj2 Commented:
Absolute paths start with a drive letter in Windows.
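
Separately from the drive letter, the script posted above passes '$infile' in single quotes. Perl does not interpolate variables inside single quotes, so parse_file would receive the literal text $infile rather than the path. The short sketch below, using only core Perl and the same variable name as the script, illustrates the difference; whether this was the only cause of the empty output file is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $infile = "/Inetpub/wwwroot/valuepairs.html";

my $single = '$infile';    # single quotes: the literal six characters $infile
my $double = "$infile";    # double quotes: the variable is interpolated

print "single-quoted: $single\n";
print "double-quoted: $double\n";

# Passing the variable itself is the simplest form and avoids the
# quoting question entirely, e.g. $p->parse_file($infile).
```

The first line printed shows the text $infile and the second shows the path.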

malkie (Author) Commented:
Okay, but it still isn't working. The script creates an out2.txt file but doesn't write anything to it.

rj2 Commented:
Well, it works on my system.
Is it the same thing if you run it from the command line instead of as a CGI script?

rj2 Commented:
I have now tested your script and changed a couple of things. Try the script below.

#!e:/perl/bin/perl.exe
use strict;
use HTML::Parser();
use CGI qw/:standard/;
print header;

# Absolute path to out2.txt file:
my $outfile = "e:/temp/out2.txt";

# Absolute path to valuepairs.html file:
my $infile = "e:/temp/valuepairs.html";

open(OUTFILE,">$outfile") || die("can not open file because $!");
sub start_handler {
   if(lc($_[0]) eq 'input') {
        print OUTFILE $_[1]{name},"\t",$_[1]{value},"\n";
   }
}    

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, 'tag, attr' );
$p->parse_file($infile) || die $!;
close(OUTFILE);

malkie (Author) Commented:
Try to run it from the shell instead of as a CGI script, i.e. type "perl parse.pl" on the command line (save the first script as parse.pl).
Do you get any errors if you try that?
**************
Good news: it (almost) works with parse.pl. This is what I got:

userdir lnks
lnkuser okay
emailid
title
url
    Add

**************
from parse.pl

#!/usr/bin/perl
use strict;
use HTML::TokeParser::Simple;
open(OUTFILE,">out.txt") || die("can not open file because $!");

my $p = HTML::TokeParser::Simple->new('valuepairs.html');
while ( my $token = $p->get_token ) {
     if ( $token->is_start_tag('input')) {
          my $attr=$token->return_attr();
          print OUTFILE $attr->{ name },"\t",$attr->{ value },"\n";
     }
}
close(OUTFILE);
***************
I am not sure why it did not pick up the <input type=reset>; maybe it is because it is on the same line as <input type=submit value="Add">.
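
One possible explanation, offered as an assumption rather than a diagnosis: the reset button has neither a name nor a value attribute, so the script prints only a tab and a newline for it, which looks like a blank line in the output. The sketch below uses HTML::TokeParser (part of the HTML-Parser distribution) on an in-memory string and substitutes empty strings for missing attributes so every input still yields one row. The helper name input_rows is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser ();

# Return one "name<TAB>value" string per <input> tag, using an empty
# string where an attribute is absent. input_rows is a hypothetical helper.
sub input_rows {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "cannot create parser\n";
    my @rows;
    # get_tag('input') skips ahead to the next <input> start tag and
    # returns an array reference whose second element is the attribute hash.
    while (my $tag = $p->get_tag('input')) {
        my $attr  = $tag->[1];
        my $name  = defined $attr->{name}  ? $attr->{name}  : '';
        my $value = defined $attr->{value} ? $attr->{value} : '';
        push @rows, "$name\t$value";
    }
    return @rows;
}

# Both inputs on this line are found, even though they share a line.
print "$_\n" for input_rows('<input type=submit value="Add"> * <input type=reset>');
```

With this input the sketch yields two rows: one with an empty name and the value Add, and one where both fields are empty.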

However, it was my hope to have it run from the browser.

I will now try the script from your last comment. Thanks.


malkie (Author) Commented:
OK, that now works from the browser, but what I wanted to do was read a URL, e.g. http://www.mysite.com, rather than a file as we have been doing with c:/inetpub/wwwroot/valuepairs.html.