rip table again, refer to roylam's question

i found it very useful to print certain tables instead of printing the whole lot.  here is what i want to do:

myperlprogram 1 - print all tables without nested table in them.
i.e. <table></table>

myperlprogram 2 - print all tables with exactly one table in them.
i.e. <table><table></table></table>

here "1" and "2" are the arguments passed to the program.

i spent ages trying to figure out the algorithm but no luck.  any idea?
crestAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

khacharnCommented:
Could you be more specific..
i m not sure what you want to do..
are you not able to pass the parameter or what..?
0
crestAuthor Commented:
i've use the source code from  a PAQ "rip tables apart in html" - Computers/Programming/Languages/Perl/Q_10254674.html.  
In which it print all the tables in a recursive manner.  I don't want that, I want to print just certain tables that have a specific number of nested table(s) in them as I stated in my question.  But my problem is that I couldn't get it working - use wrong algorithm.  Anyway here is the original source code:



package myParser;
use HTML::Parser;
@ISA = qw(HTML::Parser);

sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = @_;
if ($tag eq "table") {
   push @tables, $origtext;
}
elsif ($#tables >= 0) {
   $tables[$#tables] .= $origtext;
}
}
sub text {
my ($self, $text) = @_;
$tables[$#tables] .= $text unless ($#tables < 0);

}
sub end {
my ($self, $tag, $origtext) = @_;
if ($tag eq "table") {
   if (!@tables) {
print "Parse Error: Mismatched table tags-- too many closing tags\n";
   }
   else {
my $currenttable = pop @tables;
      $currenttable .= $origtext;
$totaltablecount++;
      open (OUTPUT, ">${filebase}_${totaltablecount}.txt") || die "Can't open
${filebase}_${totaltablecount}.txt: $!\n";
print OUTPUT "$currenttable\n";
      close OUTPUT;
print "$currenttable\n\n";
      $tables[$#tables] .= $currenttable unless ($#tables < 0)
   }
}
elsif ($#tables >= 0) {
   $tables[$#tables] .= $origtext;
}
}

sub startParse {
(my $self, my $page, $filebase) = @_;
$totaltablecount = 0;
@tables = ();
$self->parse($page);
}
# END OF MYPARSER SUBCLASS


package main;

$page = qq(

<html>

<table>
1 belongs here
<table>
  here is 2
</table>
<table>
  here is 3
</table>
</table>

</html>
);

# use LWP::Simple;
# $page = LWP::Simple::get('http://www.experts-exchange.com');

$p = new myParser;
$p->startParse($page, "MyOutputFileName");

Comment

From: roylam
                  Date: Wednesday, January 05 2000 - 01:30PM PST

i tried to install HTML:Parser but unfortunately i don't have the permission to
do so.  i remember there was a way to tell the perl program where to look for
the module (which is compiled in my home directory, i'm using sun
solaris).  can anyone tell me how to do that? thanks.


Comment

From: clockwatcher
                  Date: Wednesday, January 05 2000 - 04:07PM PST

I'm not a unix user, so there may be a lot better way to do this, but the
following works for me.

After untarring,

perl Makefile.pl PREFIX=/myhomedir/perl
make
make test
make install

that should install the module within /myhomedir/perl.

On my system, the above does the following:

Places documentation in:

~/perl/lib/perl5/man

Places the modules in:

~/perl/lib/site-perl/5.005/i386-linux

Your exact paths will be different.  Then you simply add the path to @INC
and change the "use" to "require".

push (@INC, '/home/mark/perl/lib/site-perl/5.005/i386-linux');

package myParser;
require HTML::Parser;
@ISA = qw(HTML::Parser);

sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = @_;
if ($tag eq "table") {
   push @tables, $origtext;
}
elsif ($#tables >= 0) {
   $tables[$#tables] .= $origtext;
}
}
sub text {
my ($self, $text) = @_;
$tables[$#tables] .= $text unless ($#tables < 0);

}
sub end {
my ($self, $tag, $origtext) = @_;
if ($tag eq "table") {
   if (!@tables) {
print "Parse Error: Mismatched table tags-- too many closing tags\n";
   }
   else {
my $currenttable = pop @tables;
      $currenttable .= $origtext;
$totaltablecount++;
      open (OUTPUT, ">${filebase}_${totaltablecount}.txt") || die "Can't open
${filebase}_${totaltablecount}.txt: $!\n";
print OUTPUT "$currenttable\n";
      close OUTPUT;
print "$currenttable\n\n";
      $tables[$#tables] .= $currenttable unless ($#tables < 0)
   }
}
elsif ($#tables >= 0) {
   $tables[$#tables] .= $origtext;
}
}

sub startParse {
(my $self, my $page, $filebase) = @_;
$totaltablecount = 0;
@tables = ();
$self->parse($page);
}
# END OF MYPARSER SUBCLASS


package main;

$page = qq(

<html>

<table>
1 belongs here
<table>
  here is 2
</table>
<table>
  here is 3
</table>
</table>

</html>
);

# use LWP::Simple;
# $page = LWP::Simple::get('http://www.experts-exchange.com');

$p = new myParser;
$p->startParse($page, "MyOutputFileName");


0
crestAuthor Commented:
oops, please ignore the bit after "COMMENT" , my fault -- copy-and-paste problem.
0
Starting with Angular 5

Learn the essential features and functions of the popular JavaScript framework for building mobile, desktop and web applications.

clockwatcherCommented:
I'm not 100% sure what you want, but I believe this is what you're asking for:


# SUBCLASS HTML::PARSER to pull the links and info
package myParser;
use HTML::Parser;
@ISA = qw(HTML::Parser);

sub start {
  my ($self, $tag, $attr, $attrseq, $origtext) = @_;
  if ($tag eq "table") {
     $withincount = -1 unless (@tables);
     push @tables, $origtext;
     $withincount++;
  }
  elsif ($#tables >= 0) {
     $tables[$#tables] .= $origtext;
  }
}
sub text {
  my ($self, $text) = @_;
  $tables[$#tables] .= $text unless ($#tables < 0);

}
sub end {
  my ($self, $tag, $origtext) = @_;
  if ($tag eq "table") {
     if (!@tables) {
      print "Parse Error: Mismatched table tags-- too many closing tags\n";
     }
     else {
      my $currenttable = pop @tables;
        $currenttable .= $origtext;
        if (!@tables && ($withincount <= $maxnest)) {
           $totaltablecount++;
           open (OUTPUT, ">${filebase}_${totaltablecount}.txt") || die "Can't open ${filebase}_${totaltablecount}.txt: $!\n";
         print OUTPUT "$currenttable\n";
           close OUTPUT;
         print "$currenttable\n\n";
        }
        $tables[$#tables] .= $currenttable unless ($#tables < 0)
     }
  }
  elsif ($#tables >= 0) {
     $tables[$#tables] .= $origtext;
  }
}

sub startParse {
  (my $self, my $page, $filebase, $maxnest) = @_;
  $totaltablecount = 0;
  @tables = ();
  $self->parse($page);
}
# END OF MYPARSER SUBCLASS


package main;

$maxinnertables = (shift) - 1;
$maxinnertables = 0 unless ($maxinnertables > 0);

$page = qq(

<html>

<table>
   No nested
</table>

<table>
   One nested
  <table>
    nest
  </table>
</table>

<table>
   Two nested
   <table>
      nest 1
   </table>
   <table>
      nest 2
   </table>
</table>

<table>
   One nest with a nest
   <table>
      nest containing a nest
        <table>
           nest within a nest
        </table>
   </table>
</table>

</html>
);

# use LWP::Simple;
# $page = LWP::Simple::get('http://www.experts-exchange.com');

$p = new myParser;
$p->startParse($page, "MyOutputFileName", $maxinnertables);
0
crestAuthor Commented:
thanks for your quick response, clockwatcher.
but the output isn't quite what i want.  let me use your example to illustrate:
ie (
<table>
   No nested
</table>

<table>
   One nested
  <table>
    nest
  </table>
</table>

<table>
   Two nested
   <table>
      nest 1
   </table>
   <table>
      nest 2
   </table>
</table>

<table>
   One nest with a nest
   <table>
      nest containing a nest
        <table>
           nest within a nest
        </table>
   </table>
</table>
)


here is what i expect the program to output with the following commandline:

myperlprogram 0:

<table>
   No nested
</table>

  <table>
    nest
  </table>

   <table>
      nest 1
   </table>

   <table>
      nest 2
   </table>

        <table>
           nest within a nest
        </table>


myperlprogram 1:
<table>
   One nested
  <table>
    nest
  </table>
</table>

<table>
   Two nested
   <table>
      nest 1
   </table>
   <table>
      nest 2
   </table>
</table>

   <table>
      nest containing a nest
        <table>
           nest within a nest
        </table>
   </table>




and myperlprogram 2 will give this ouput:

<table>
   One nest with a nest
   <table>
      nest containing a nest
        <table>
           nest within a nest
        </table>
   </table>
</table>

i'll try to clarify a bit more if this is not clear.
0
crestAuthor Commented:
actually, i think how many "nested" table is not quite right.  it should be how many "levels" of table.
0
ozoCommented:
 
0
clockwatcherCommented:
You changed your requirements a bit, but I'm pretty sure I understand what it is you're looking for now.

BTW, I didn't look into it because I was more interested in doing it myself, but you may want to check out HTML::TreeBuilder.

Anyway, here you go:

# SUBCLASS HTML::PARSER to pull the tables
package myParser;
use HTML::Parser;
@ISA = qw(HTML::Parser);

sub start {
  my ($self, $tag, $attr, $attrseq, $origtext) = @_;
  if ($tag eq "table") {
     $currentdepth++;
     $newtable = { "text" => $origtext,
                   "depth" => $currentdepth,
                   "childlevels" => 0,
                   "childtables" => () };
     if (!$current_table) {
         $current_table = $newtable;
     }
     else {
         push @{$current_table->{"childtables"}}, $newtable;
         $parent_table = $current_table;
         $current_table = $newtable;
         $current_table->{"parent"} = $parent_table;
         $myparent = $current_table->{"parent"};
         while ($myparent) {
            $depthlevel = $currentdepth - $myparent->{depth};
            $myparent->{childlevels} = $depthlevel if ($depthlevel > $myparent->{"childlevels"});
            $myparent = $myparent->{"parent"};
         }
     }
  }
  elsif ($current_table) {
     $current_table->{"text"} .= $origtext;
  }
}
sub text {
  my ($self, $text) = @_;
  $current_table->{"text"} .= $text if ($current_table);

}
sub end {
  my ($self, $tag, $origtext) = @_;
  if ($tag eq "table") {
     $currentdepth--;
     if (!$current_table) {
        print "Parse Error: Mismatched table tags-- too many closing tags\n";
     }
     else {
        $current_table->{"text"} .= $origtext;
        $current_table_text = $current_table->{"text"};
       
        if ($current_table->{"childlevels"} == $numchildlevels) {
            print "$current_table_text\n\n" ;
            $totaltablecount++;
            open (OUTPUT, ">${filebase}_${totaltablecount}.txt") || die "Can't open ${filebase}_${totaltablecount}.txt: $!\n";
            print OUTPUT "$current_table_text\n\n";
            close OUTPUT;
        }
       
        if ($current_table->{"parent"}) {
            $current_table = $current_table->{"parent"};
            $current_table->{"text"} .= $current_table_text;
        }
        else {
            undef $current_table;
        }
    }
  }
  elsif ($current_table) {
     $current_table->{"text"} .= $origtext;
  }
}

sub startParse {
  (my $self, my $page, $filebase, $numchildlevels) = @_;
  $self->parse($page);
}
# END OF MYPARSER SUBCLASS


package main;

$numchildtables = (shift);
$numchildtables = 0 unless ($numchildtables > 0);

$page = qq(

<html>

<table>
   No nested
</table>

<table>
   One nested
  <table>
    nest
  </table>
</table>

<table>
   Two nested
   <table>
      nest 1
   </table>
   <table>
      nest 2
   </table>
</table>

<table>
   One nest with a nest
   <table>
      nest containing a nest
        <table>
           nest within a nest
        </table>
   </table>
</table>

</html>
);

# use LWP::Simple;
# $page = LWP::Simple::get('http://www.experts-exchange.com');

$p = new myParser;
$p->startParse($page, "MyOutputFileName", $numchildtables);
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
crestAuthor Commented:
clockwatcher, you're champion!
thanks heaps for your quick response.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.