Searching and reading all HTML files

I want to either use an existing program or write a Perl program to add a simple search function to my website.  I want to read all HTML files in all directories under the main directory and count the occurrences of a user-entered string in each file.  Does anyone know how to code Perl to:

1.  Loop through all files in all directories and read their contents.
2.  Count the occurrences of a given string in a file.

Thanks for your help
mzehner asked:
Kim Ryan (IT Consultant) commented:
Try this program. To run it, you specify the string, optionally followed by the starting directory. For example:

perl search.pl "your string" ./data

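use strict;
use warnings;
use File::Find;

my $search_string = shift(@ARGV);
die "Usage: $0 \"search string\" [directory ...]\n" unless defined $search_string;
@ARGV = ('.') unless @ARGV;

# Called by find() for every entry. $_ is the name within the current
# directory; $File::Find::name is the path from the starting directory.
sub count_string
{
    return unless -f $_ && $File::Find::name =~ /\.html$/;
    open(my $fh, '<', $_) or do {
        warn "Cannot open $File::Find::name: $!\n";
        return;
    };

    my $occurrences = 0;
    while (my $line = <$fh>)
    {
        # Search for the string as a separate word, case insensitive.
        # \Q...\E makes regex metacharacters in the string match literally.
        $occurrences++ while $line =~ /\b\Q$search_string\E\b/ig;
    }
    close($fh);

    if ($occurrences)
    {
        print("String $search_string occurs $occurrences times in file $File::Find::name\n");
    }
}

find(\&count_string, @ARGV);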
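
The counting step can also be tried on its own, separate from the directory walk. The following is only a minimal sketch and not part of the original answer: the helper name count_occurrences is made up for illustration. It assumes the same matching rules as the program above (whole word, case insensitive) and uses a list-context match to count all matches in one expression.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): count whole-word,
# case-insensitive occurrences of $needle in $text.
sub count_occurrences
{
    my ($text, $needle) = @_;
    # In list context a /g match returns every match. Assigning that list
    # to the empty list "()" and then to a scalar yields the match count.
    my $count = () = $text =~ /\b\Q$needle\E\b/ig;
    return $count;
}

# "Perl" and "perl" match; "pearl" does not.
print count_occurrences("Perl is fun. I like perl, not pearl.", "perl"), "\n";
```

Because of \Q...\E, a search string such as "a.b" matches only the literal text "a.b" and not "axb". Note that \b only marks a word boundary next to a letter, digit, or underscore, so a search string that starts or ends with punctuation may not match the way you expect.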

mzehner (Author) commented:
It works on a local Linux computer.  I'll try it this weekend on the webserver at tripod.com.  I'll let you know how it goes.

Thanks for your help.
mzehner (Author) commented:
I have been unable to use the script on Tripod; however, I know it works and still hope to use it.  I understand most of the script now.  I'll try to contact Tripod to get the problem solved.

Thanks very much for your help.  It is an excellent and short program.
Question has a verified solution.
