Solved

PHP HTML file search

Posted on 2004-08-11
Medium Priority
264 Views
Last Modified: 2010-04-17
Hello,

I need a PHP script that opens all HTML files in the public_html directory (including subdirectories), searches each one for a given string (case-insensitive) and reads the page title (every HTML page has a title like <title>Home</title>). It should count how often the string appears on each page and print the results as links, sorted by the number of hits.

For example, my webspace contains three HTML pages:
- index.htm
- misc.htm
- kontakt.htm
Suppose I search for the word "Images" and it appears once in index.htm and 8 times in misc.htm. The output of the script should then be:
Miscellaneous (8)
-> <a href="misc.htm">View Site</a>
Home (1)
-> <a href="index.htm">View Site</a>
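The requested output comes down to collecting one record (title, hits, file) per page and sorting those records by hits, highest first, before printing. A minimal PHP sketch of that last step, assuming the per-page results are already in an array; the record layout and the names `compare_hits` and `format_hits` are hypothetical, and pages with zero hits are skipped:

```php
<?php
// Hypothetical record layout: one array per page with 'title', 'hits' and 'file'.

// usort() callback: pages with more hits come first
function compare_hits($a, $b)
{
    if ($a['hits'] == $b['hits']) {
        return 0;
    }
    return ($a['hits'] > $b['hits']) ? -1 : 1;
}

// Sort by hits (descending) and build the output format shown in the question,
// skipping pages that do not contain the search string.
function format_hits($results)
{
    usort($results, 'compare_hits');
    $out = "";
    foreach ($results as $r) {
        if ($r['hits'] == 0) {
            continue;
        }
        $out .= htmlspecialchars($r['title']) . " (" . $r['hits'] . ")<br>\n";
        $out .= "-&gt; <a href=\"" . htmlspecialchars($r['file']) . "\">View Site</a><br>\n";
    }
    return $out;
}

// The two pages with hits from the example: "Images" once in index.htm, 8 times in misc.htm.
echo format_hits(array(
    array('title' => 'Home',          'hits' => 1, 'file' => 'index.htm'),
    array('title' => 'Miscellaneous', 'hits' => 8, 'file' => 'misc.htm'),
));
```

A named callback for `usort()` is used instead of an anonymous function so the sketch also runs on PHP 4.x, the version mentioned below.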

Some information:
- Web server: Apache 1.3.31
- PHP version: 4.x
- Complete directory for the local search: /home5/f6721805/public_html/ (only files in public_html are accessible from the internet)

I have never written a PHP program (only in other languages: Simatic S7 for Siemens PLCs (SPS), HTML/CSS and VBS), so I think you will be faster at writing it than I would be. I should be able to integrate the script into my HTML files myself.

----------------------------------------------------------
Sorry for my bad English. If anything I have written is unclear, please tell me and I will try to explain it better.
Question by:Kakashi
6 Comments
 
LVL 3

Expert Comment

by:thecode101
ID: 11774764
Try this out; if nothing else, it is a good start:

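<?php
$dir = "";     // directory to scan, e.g. "." for the current directory
$search = "";  // string to search for

if ($handle = opendir($dir)) {
    while (false !== ($file = readdir($handle))) {
        $path = $dir."/".$file;
        // skip "." and "..", and anything that is not a regular file (e.g. subdirectories)
        if ($file == "." || $file == ".." || !is_file($path)) {
            continue;
        }
        $contents = "";
        $fp = fopen($path, "r");
        if (filesize($path) > 0) {   // fread() rejects a length of 0
            $contents = fread($fp, filesize($path));
        }
        fclose($fp);

        // text between <title> and </title>, if the page has one
        $title = "";
        $split = explode("<title>", $contents);
        if (count($split) > 1) {
            $split = explode("</title>", $split[1]);
            $title = $split[0];
        }
        // substr_count() is case-sensitive, so compare lower-cased copies
        $numOfOccurences = substr_count(strtolower($contents), strtolower($search));
        echo $title."(".$numOfOccurences.")"."<a href='".$path."'>View Site</a><br>";
    }
    closedir($handle);
}
?>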
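thecode101's loop reads only the top level of `$dir`, while the question also asks for subdirectories. One way to cover them, sketched under the assumption that only `.htm`/`.html` files matter: a recursive function that returns the matching paths instead of printing them. The name `find_html_files` is hypothetical; because it returns a fresh array instead of filling `static` variables, it can safely be called more than once per request.

```php
<?php
// Hypothetical helper: return the paths of all .htm/.html files below $dir (recursively).
function find_html_files($dir)
{
    $found = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $found; // unreadable directory: nothing to search
    }
    while (false !== ($entry = readdir($handle))) {
        if ($entry == "." || $entry == "..") {
            continue;
        }
        $path = $dir . "/" . $entry;
        if (is_dir($path)) {
            // descend into the subdirectory and append its matches
            $found = array_merge($found, find_html_files($path));
        } elseif (is_file($path) && preg_match('/\.html?$/i', $entry)) {
            $found[] = $path;
        }
    }
    closedir($handle);
    sort($found); // readdir() order depends on the filesystem
    return $found;
}

// Example: list every HTML page below the current directory.
foreach (find_html_files(".") as $file) {
    echo $file . "<br>\n";
}
```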
 
LVL 3

Expert Comment

by:Sasho
ID: 11775377
Here is more code to look at for ideas. But please remember that thecode101 answered first and his code should work as well.
<?PHP

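// Recursively collect all directories and files below $base.
// The static arrays keep their contents between calls, so call it only once per request.
function recursive_listdir($base) {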
   static $filelist = array();
   static $dirlist = array();

   if(is_dir($base)) {
       $dh = opendir($base);
       while (false !== ($dir = readdir($dh))) {
           if (is_dir($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $subbase = $base ."/". $dir;
               $dirlist[] = $subbase;
               $subdirlist = recursive_listdir($subbase);
           } elseif(is_file($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $filelist[] = $base ."/". $dir;
           }
       }
       closedir($dh);
   }
   $array['dirs'] = $dirlist;
   $array['files'] = $filelist;
   return $array;
 }


$directory_structure=recursive_listdir(".");
foreach ($directory_structure['files'] as $file){
      $handle = fopen($file, "r");
      $lines = fread($handle, filesize($file));
      fclose($handle);

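      // count case-insensitive matches of the search word (hard-coded here as "hello")
      $count=preg_match_all("/hello/i",$lines,$matches);
      // capture the text between <title> and </title>
      preg_match("/<title>(.*)<\/title>/i",$lines, $matches);
      $title = $matches[1];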

      print("$title($count) <a href=\"$file\">View Site</a><br>");
}

?>
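Two details in the loop above are worth hardening once the search word is no longer hard-coded: a user-supplied word should go through `preg_quote()` before it is placed in the pattern (otherwise characters such as `.` or `(` change the regex), and `$matches[1]` does not exist on a page without a `<title>`, which triggers a notice. A sketch of both steps as hypothetical helpers `count_hits` and `page_title`:

```php
<?php
// Hypothetical helper: case-insensitive number of occurrences of $word in $html.
function count_hits($html, $word)
{
    if ($word === "") {
        return 0; // an empty pattern would match at every position
    }
    // preg_quote() escapes regex metacharacters and the "/" delimiter
    $m = array();
    return preg_match_all('/' . preg_quote($word, '/') . '/i', $html, $m);
}

// Hypothetical helper: text of the first <title> element, or $default if there is none.
// The "s" modifier lets the title span several lines; the lazy (.*?) stops at the first </title>.
function page_title($html, $default)
{
    $m = array();
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim($m[1]);
    }
    return $default;
}

$html = "<html><head><TITLE>Miscellaneous</TITLE></head><body>Images and more images</body></html>";
echo page_title($html, "untitled") . " (" . count_hits($html, "images") . ")"; // prints "Miscellaneous (2)"
```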
 

Author Comment

by:Kakashi
ID: 11776925
@Sasho

Wow, your code is good, but I need your help.

- First, I get some warnings; see http://www.synapstix.de/suche.php.
- Second, this script searches every file. How can I restrict the search to *.htm and *.html files?

- I have already changed your result output so that only pages with at least one hit are printed:
  if($count != 0){
      print("$title($count) <a href=\"$file\">View Site</a><br>");
  }
 
LVL 3

Expert Comment

by:Sasho
ID: 11777132
Here is the fix to process only .htm and .html files:
<?PHP

function recursive_listdir($base) {
   static $filelist = array();
   static $dirlist = array();

   if(is_dir($base)) {
       $dh = opendir($base);
       while (false !== ($dir = readdir($dh))) {
           if (is_dir($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $subbase = $base ."/". $dir;
               $dirlist[] = $subbase;
               $subdirlist = recursive_listdir($subbase);
           } elseif(is_file($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $filelist[] = $base ."/". $dir;
           }
       }
       closedir($dh);
   }
   $array['dirs'] = $dirlist;
   $array['files'] = $filelist;
   return $array;
 }


$directory_structure=recursive_listdir(".");
//print_r($directory_structure['files']);

foreach ($directory_structure['files'] as $file){


      if (preg_match("/\.html?$/i",$file)){   // only .htm and .html, any letter case
            $handle = fopen($file, "r");
            $lines = fread($handle, filesize($file));
            fclose($handle);

            $count=preg_match_all("/hello/i",$lines,$matches);
            preg_match("/<title>(.*)<\/title>/i",$lines, $matches);
            $title = $matches[1];

            print("$title($count) <a href=\"$file\">View Site</a><br>");
      }
}

?>
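The extension test is easy to get subtly wrong, so it can help to isolate it in a small helper and check a few names against it. A sketch with the hypothetical name `is_html_file`: `\.html?$` accepts exactly `.htm` or `.html` at the end of the name, and the `i` modifier also accepts upper-case names such as `INDEX.HTM`.

```php
<?php
// Hypothetical helper: true for names ending in .htm or .html (any letter case).
function is_html_file($path)
{
    return preg_match('/\.html?$/i', $path) == 1;
}

$names = array("index.htm", "misc.HTML", "page.htmll", "kontakt.htm.bak", "style.css");
foreach ($names as $name) {
    echo $name . ": " . (is_html_file($name) ? "yes" : "no") . "<br>\n";
}
```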
 
LVL 3

Accepted Solution

by:
Sasho earned 2000 total points
ID: 11777228
Try this version to see if your warnings go away:
<?PHP

function recursive_listdir($base) {
   static $filelist = array();
   static $dirlist = array();

   if(is_dir($base)) {
       $dh = opendir($base);
       while (false !== ($dir = readdir($dh))) {
           if (is_dir($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $subbase = $base ."/". $dir;
               $dirlist[] = $subbase;
               $subdirlist = recursive_listdir($subbase);
           } elseif(is_file($base ."/". $dir) && $dir !== '.' && $dir !== '..') {
               $filelist[] = $base ."/". $dir;
           }
       }
       closedir($dh);
   }
   $array['dirs'] = $dirlist;
   $array['files'] = $filelist;
   return $array;
 }


$directory_structure=recursive_listdir(".");
//print_r($directory_structure['files']);

foreach ($directory_structure['files'] as $file){


      if (preg_match("/\.html?$/i",$file)){   // only .htm and .html, any letter case
            $count = 0;
            $lines='';
            $handle = fopen($file, "r");
            // fread() rejects a length of 0, so empty files are not read
            if (filesize($file)!=0){
                  $lines = fread($handle, filesize($file));
            }
            fclose($handle);

            $count=preg_match_all("/hello/i",$lines,$matches);
            // pages without a <title> would otherwise leave $matches[1] undefined
            $title = '';
            if (preg_match("/<title>(.*)<\/title>/i",$lines, $matches)){
                  $title = $matches[1];
            }

            if($count != 0){
                  print("$title($count) <a href=\"$file\">View Site</a><br>");
            }
      }
}

?>
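The accepted version checks `filesize()` before reading because `fread()` rejects a length of 0, which is what happens on an empty file. On PHP 4.3 or later, `file_get_contents()` avoids the length argument altogether: it returns an empty string for an empty file and `false` if the file cannot be read. A sketch with a hypothetical wrapper `read_page`; note that the usage lines rely on `sys_get_temp_dir()`, which needs PHP 5.2.1 or later.

```php
<?php
// Hypothetical wrapper: whole file as a string; "" for empty or unreadable files.
function read_page($path)
{
    if (!is_file($path) || !is_readable($path)) {
        return "";
    }
    $contents = file_get_contents($path); // PHP 4.3+; no length argument needed
    return ($contents === false) ? "" : $contents;
}

$empty = tempnam(sys_get_temp_dir(), "ee");   // creates an empty temporary file
echo strlen(read_page($empty));               // prints 0, without a warning
unlink($empty);
```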
 

Author Comment

by:Kakashi
ID: 11777335
Sasho, thanks a lot! You get the points ^^

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
In this post we will learn different types of Android Layout and some basics of an Android App.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
Introduction to Processes

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question