Solved

Searching and reading all HTML files

Posted on 2001-06-07
Last Modified: 2010-03-05
I want to either use an existing program or write a Perl program to put a simple search function on my website.  The program should read all HTML files in all directories under the main directory and count the occurrences of a user-entered string in each file.  Does anyone know how to code Perl to:

1.  Loop through all files in all directories and read their contents.
2.  Count the occurrences of a given string in a file.

Thanks for your help
Question by:mzehner

Accepted Solution

by:
Kim Ryan earned 100 total points
ID: 6165959
Try this program. To run it, you specify the string, optionally followed by the starting directory. For example:

perl search.pl "your string" ./data

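use strict;
use warnings;
use File::Find;

# First argument is the search string; any remaining arguments are starting directories.
my $search_string = shift(@ARGV);
die "Usage: perl search.pl \"your string\" [directory ...]\n" unless defined $search_string;
@ARGV = ('.') unless @ARGV;

sub count_string
{
    # File::Find changes into each directory and sets $_ to the bare file name.
    return unless -f $_ and $File::Find::name =~ /\.html$/;
    my $fh;
    unless (open($fh, '<', $_))
    {
        warn "Cannot open $File::Find::name: $!\n";
        return;
    }
    my $occurrences = 0;
    while (my $line = <$fh>)
    {
        # search for string as a separate word, case insensitive;
        # \Q...\E keeps regex metacharacters in the string literal
        $occurrences++ while $line =~ /\b\Q$search_string\E\b/ig;
    }
    close($fh);
    if ($occurrences)
    {
        print("String $search_string occurs $occurrences times in file $File::Find::name\n");
    }
}

find(\&count_string, @ARGV);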
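
If you only need the counting step on its own (your item 2), it can be pulled out into a small helper. This is a minimal sketch, not part of the program above: the name count_occurrences and the sample text are made up for illustration. It relies on the fact that a /g match in list context returns every match, so assigning that list through () to a scalar gives the number of matches.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive matches of $needle in $text.
sub count_occurrences
{
    my ($text, $needle) = @_;
    # List-context /g returns all matches; "= () =" turns that list into a count.
    # \Q...\E makes the needle match literally rather than as a regex.
    my $count = () = $text =~ /\b\Q$needle\E\b/ig;
    return $count;
}

# "Perl" and "perl" match; "pearls" does not.
print count_occurrences("Perl is fun. I like perl, not pearls.", "perl"), "\n";   # prints 2
```

Drop the two \b anchors if you would rather count the string inside longer words as well.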


Author Comment

by:mzehner
ID: 6168345
It works on a local Linux computer.  I'll try it this weekend on the webserver at tripod.com.  I'll let you know how it goes.

Thanks for your help.

Author Comment

by:mzehner
ID: 6193640
I have been unable to use the script on Tripod; however, I know it works and still hope to use it.  I understand most of the script now.  I'll try to contact Tripod to get the problem solved.

Thanks very much for your help.  It is an excellent and short program.