softpro2k
Search File Names in Directories (specific) and List them as per Relevance / Most keyword match.

How can I search for file names inside some specified directories and list the results so that the file names matching the most search keywords come first? Here "file" means .htm and .php files only (folder names are excluded).

Here is an example. Suppose I have several files in three different directories (say A, B and C). The files are: 1) My name is Don.htm, 2) Sweet Berry.php, 3) Unnamed Prisorer.htm, 4) Change File Names.php, 5) Don the Dog.htm, 7) Name of the Sweet Dog.php

Now someone searches with the keywords: SWEET DOG.

The result should appear something like this (With most keyword match first):

7) Name of the Sweet Dog.php,
2) Sweet Berry.php
5) Don the Dog.htm

Limiting the number of results per page (paginating the results) would be appreciated.

The shorter the code, the better :) for a beginner like me. I have already tried the Bn_Search code from Planet Source Code (title: Meta Search Magic Pro!) to get the general idea, but it did not work for me.

Waiting for your expert comments.

Regards


Guy Hengel [angelIII / a3]

can there be subdirectories?
I assume the search should be case insensitive?
softpro2k

ASKER

Yes, there can be subdirectories and the search is case-insensitive.

I can do the file name/keyword searching and matching myself, but the order in which the results should be displayed is what I need.

Hope to receive your comments. Regards.

Please check out the script below. It uses a recursive function which returns more data than is actually needed here, but I use that sort of function for other things as well (keeping it reusable).

Then, the code below the function extracts the files from that array, compares them to the keyword list, and sets up the $arr_matches array. The ksort() function ensures the groups are sorted ascending by match count, and the loop below it builds the listing in reverse order to show the highest matches first.

have fun

<?php
 
 function scan_directory_recursively($directory, $filter=FALSE)
 {
     $directory_tree = array();
 
     // if the path has a slash at the end we remove it here
     if(substr($directory,-1) == '/')
     {
         $directory = substr($directory,0,-1);
     }
  
     // if the path is not valid or is not a directory ...
     if(!file_exists($directory) || !is_dir($directory))
     {
         // ... we return false and exit the function
         return FALSE;
  
     // ... else if the path is readable
     }elseif(is_readable($directory))
     {
         // we open the directory
         $directory_list = opendir($directory);
  
         // and scan through the items inside
         while (FALSE !== ($file = readdir($directory_list)))
         {
             // if the filepointer is not the current directory
             // or the parent directory
             if($file != '.' && $file != '..')
             {
                 // we build the new path to scan
                 $path = $directory.'/'.$file;
  
                 // if the path is readable
                 if(is_readable($path))
                 {
                     // we split the new path by directories
                     $subdirectories = explode('/',$path);
  
                     // if the new path is a directory
                     if(is_dir($path))
                     {
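                         // merge the files found in the subdirectory into the file list;
                         // array_merge() renumbers the keys, whereas the + operator would
                         // silently drop entries whose numeric keys collide
                         $tmp = scan_directory_recursively($path, $filter);
                         if (is_array($tmp))
                         {
                             $directory_tree = array_merge($directory_tree, $tmp);
                         }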
                     // if the new path is a file
                     }elseif(is_file($path))
                     {
                         // get the file extension by taking everything after the last dot
                         $extension = pathinfo($path, PATHINFO_EXTENSION);
  
                         // if there is no filter set or the filter is set and matches
                         if($filter === FALSE || $filter == $extension)
                         {
                             // add the file details to the file list
                             $directory_tree[] = array(
                                 'path'      => $path,
                                 'name'      => end($subdirectories),
                                 'extension' => $extension,
                                 'size'      => filesize($path),
                                 'kind'      => 'file');
                         }
                     }
                 }
             }
         }
         // close the directory
         closedir($directory_list); 
  
         // return file list
         return $directory_tree;
  
     // if the path is not readable ...
     }else{
         // ... we return false
        return FALSE;    
    }
 }
 // ------------------------------------------------------------
 
 
  $res = scan_directory_recursively(".");  
  $words = explode(" ", "shadow photo _M _J" );
 
  $arr_matches = array();
  
 
  foreach ( $res as $f )
  {
    print "{$f['path']}:<br>";
 
    $count_matches = 0;
    foreach($words as $keyword)
    {
      if (strpos(strtolower($f['path']), strtolower($keyword)) !== FALSE)
      { $count_matches ++; }
    }
 
    if ($count_matches>0)
    {
      if (!isset($arr_matches[$count_matches])) $arr_matches[$count_matches] = array();
      $arr_matches[$count_matches][] = $f;
    }
  } 
 
  ksort($arr_matches);
 
 
  print "<br><hr><br>";
  foreach ($words as $keyword)
  {
    print "$keyword<br>";
  }
  
  print "<br><hr><br>";
 
  $listing = "";  
  foreach($arr_matches as $count => $results )
  {
    $listing_tmp = "$count matches:<br>";
 
    foreach($results as $file)
    {
      $listing_tmp .= "{$file['path']}<br>";
 
    }
 
    $listing = "$listing_tmp<hr>$listing";
  }
 
  print $listing;
 
  print "<br><hr><br>";
  print_r($arr_matches);
 
  print "<br><hr><br>";
  print_r($res);
 
?>
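
As a more compact illustration of the ranking step only, here is a minimal sketch that takes a plain list of file names instead of the scanned array. The helper names `count_keyword_matches()` and `rank_by_keyword_matches()` are made up for this example and are not part of the script above; it also uses krsort() so the highest match count comes first without the reverse-concatenation trick.

```php
<?php
// Hypothetical helper (not from the thread): count how many keywords
// occur in a file name, ignoring case.
function count_keyword_matches($name, array $keywords)
{
    $count = 0;
    foreach ($keywords as $keyword) {
        if ($keyword !== '' && stripos($name, $keyword) !== false) {
            $count++;
        }
    }
    return $count;
}

// Hypothetical helper: return the matching names, highest match count first.
function rank_by_keyword_matches(array $names, $query)
{
    $keywords = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);

    // group the names by match count, as $arr_matches does in the script above
    $groups = array();
    foreach ($names as $name) {
        $count = count_keyword_matches($name, $keywords);
        if ($count > 0) {
            $groups[$count][] = $name;
        }
    }

    // krsort() orders the keys descending, so no reversal step is needed
    krsort($groups);

    $ranked = array();
    foreach ($groups as $group) {
        $ranked = array_merge($ranked, $group);
    }
    return $ranked;
}

// the file names from the question
$names = array(
    'My name is Don.htm',
    'Sweet Berry.php',
    'Unnamed Prisorer.htm',
    'Change File Names.php',
    'Don the Dog.htm',
    'Name of the Sweet Dog.php',
);
print_r(rank_by_keyword_matches($names, 'SWEET DOG'));
```

With the question's file names this lists "Name of the Sweet Dog.php" first, followed by "Sweet Berry.php" and "Don the Dog.htm". Files with the same match count keep the order in which they were supplied.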


AngelIII.

I have tested your code. It's working and the result listing is very near to what I wanted.

As I am a beginner, may I request you to modify the code so that:

1) It displays the exact matches first, then the most keyword matches (which your code does).
2) It displays only the file name (without folder paths).
3) It does NOT display the file extension.
4) It searches only specific kinds of files (like .txt, .dat, .php3, etc.).
5) It splits the results into several pages (pagination) based on the number of results I want to show.

Maybe I am asking for a lot. But ..... you see.... :)

Have a merry Christmas.
2) It displays only the file name (without folder paths).
3) It does NOT display the file extension.
change:
      $listing_tmp .= "{$file['path']}<br>";
into:
      $listing_tmp .= pathinfo($file['filename'], PATHINFO_FILENAME) . "<br>";

and:
$directory_tree[] = array(
                                 'path'      => $path,
                                 'name'      => end($subdirectories),
                                 'extension' => $extension,
                                 'size'      => filesize($path),
                                 'kind'      => 'file');
into:
$directory_tree[] = array(
                                 'path'      => $path,
                                 'filename'      => $file,
                                 'name'      => end($subdirectories),
                                 'extension' => $extension,
                                 'size'      => filesize($path),
                                 'kind'      => 'file');


4) It searches only specific kinds of files (like .txt, .dat, .php3, etc.).
Only for one extension? Then just pass that in (without the leading dot, e.g. 'txt') as the second parameter to the scan_directory_recursively function.
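
If more than one extension is needed, one possible approach is to compare against a list instead of a single string. The sketch below is an assumption, not part of the original script: `has_allowed_extension()` is a made-up helper and the extension list is only an example.

```php
<?php
// Hypothetical helper (not part of the original script): true when $name
// ends in one of the allowed extensions, compared case-insensitively.
// Extensions are given without the leading dot.
function has_allowed_extension($name, array $allowed)
{
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    return in_array($extension, array_map('strtolower', $allowed), true);
}

$allowed = array('txt', 'dat', 'php3');
var_dump(has_allowed_extension('notes.TXT', $allowed));       // allowed
var_dump(has_allowed_extension('Sweet Berry.php', $allowed)); // not allowed
```

Inside scan_directory_recursively, the test `$filter == $extension` could then be swapped for a call such as `has_allowed_extension($file, $filter)`, with `$filter` passed in as an array.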


Thanks. Everything is covered except the first part of point 1 (part a) and point 5.


1) a) It displays the exact matches first (the code does not do this), THEN b) the most keyword matches (which your code does).

5) It splits the results into several pages (pagination) based on the number of results I want to show.

Hope to receive your suggestion.
1) How do EXACT matches differ from most keyword matches?
Can you explain that, please?

5) Let's do that after the other stuff. Not everything at once...
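
For reference, pagination of an already ranked list can be sketched with array_slice(). This is only an illustration under assumptions: `paginate_results()` is a made-up helper, the page number is assumed to arrive as a `page` query parameter, and the sample list stands in for the real results.

```php
<?php
// Hypothetical helper: return one page of an already ranked result list.
// $page is 1-based; $per_page is the number of results per page.
function paginate_results(array $results, $page, $per_page)
{
    $per_page    = max(1, (int) $per_page);
    $total_pages = max(1, (int) ceil(count($results) / $per_page));
    // clamp the requested page into the valid range
    $page        = min(max(1, (int) $page), $total_pages);

    return array(
        'page'        => $page,
        'total_pages' => $total_pages,
        'items'       => array_slice($results, ($page - 1) * $per_page, $per_page),
    );
}

// sample data standing in for the ranked listing
$ranked = array('Name of the Sweet Dog', 'Sweet Berry', 'Don the Dog');
$page   = isset($_GET['page']) ? $_GET['page'] : 1;
$result = paginate_results($ranked, $page, 2);

foreach ($result['items'] as $name) {
    print htmlspecialchars($name) . "<br>";
}
print "Page {$result['page']} of {$result['total_pages']}";
```

Out-of-range page numbers are clamped rather than rejected, which keeps a hand-edited URL from producing an empty page.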
Here is an example of an exact match. Suppose the keywords are: SWEET DOG.

The result should appear something like this:

7) Name of the Sweet Dog.php, // This is exact match Sweet+Dog (Full sentence Match)
2) Sweet Berry.php // One Keyword found
5) Don the Dog.htm // One Keyword found
So, what does not work with my suggestion?
For 7), the match count is 2; for 2) and 5) it is 1. So 7) is (should be) displayed before 2) and 5). As you say, that is the case.
So, you might have used the wrong example to show what goes wrong?
Because so far, I don't see what goes wrong...
Yes, I used the wrong examples. Here are three file names and the expected order of the results:

7) Name of the Sweet Dog.php, // This is Full sentence Match (Sweet+Dog). Two Keyword found.
2) My Dog don't like Sweet Berry.php // Here also Two Keyword found
5) Don the Dog.htm // One Keyword found

7 should be shown first, ahead of 2, even though both of them contain two keyword matches. I think for this we need one extra keyword search without using explode.

Maybe like this:
$words = "Sweet Dog"; // to search for its occurrence as a whole phrase

and after this search:
$words = explode(" ", "Sweet Dog" );
change:
 foreach($words as $keyword)
    {
      if (strpos(strtolower($f['path']), strtolower($keyword)) !== FALSE)
      { $count_matches ++; }
    }

into:

 foreach($words as $keyword)
    {
      if (strpos(strtolower($f['path']), strtolower($keyword)) !== FALSE)
      { $count_matches ++; }
    }
  if (strpos(strtolower($f['path']), strtolower($words)) !== FALSE)
  { $count_matches ++; }
 
Thanks, I will let you know after testing.
angelIII,

I have tested the code, but I am getting an error saying something like "Array to String Conversion Error".

I am a beginner. So, if possible, please post the fresh code after updating/changing the required chunks (you can remove the code that displays unwanted/extra information that I don't want).

One more thing: can you please add comments to your code, especially in the second part given below, for my understanding and clarification:

  $res = scan_directory_recursively(".");
  $words = explode(" ", "shadow photo _M _J" );
  $arr_matches = array();

  foreach ( $res as $f )
  {
    print "{$f['path']}:<br>";
    $count_matches = 0;
    foreach($words as $keyword)
    {
      if (strpos(strtolower($f['path']), strtolower($keyword)) !== FALSE)
      { $count_matches ++; }
    }

    if ($count_matches>0)
    {
      if (!isset($arr_matches[$count_matches])) $arr_matches[$count_matches] = array();
      $arr_matches[$count_matches][] = $f;
    }
  }
  ksort($arr_matches);

  print "<br><hr><br>";
  foreach ($words as $keyword)
  {
    print "$keyword<br>";
  }

  print "<br><hr><br>";

  $listing = "";
  foreach($arr_matches as $count => $results )
  {
    $listing_tmp = "$count matches:<br>";
    foreach($results as $file)
    {
      $listing_tmp .= "{$file['path']}<br>";
    }

    $listing = "$listing_tmp<hr>$listing";
  }
  print $listing;
  print "<br><hr><br>";
  print_r($arr_matches);
  print "<br><hr><br>";
  print_r($res);
 ?>
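
The "Array to String Conversion" message most likely comes from the suggested extra check, which passes `$words` to strtolower() after `$words` has already been turned into an array by explode(). A minimal sketch of one way around it is to keep the phrase and the keyword list in separate variables. The function name `score_file_name()` is made up for this illustration, and this is not necessarily the code of the accepted solution.

```php
<?php
// Hypothetical helper: one point per keyword found in $name, plus one
// extra point when the whole phrase occurs (all comparisons ignore case).
function score_file_name($name, $query)
{
    // keep the phrase as a string and the keywords as a separate array,
    // so a string function never receives the array
    $phrase   = trim($query);
    $keywords = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);

    $score = 0;
    foreach ($keywords as $keyword) {
        if (stripos($name, $keyword) !== false) {
            $score++;
        }
    }

    // the phrase bonus only makes sense for queries of two or more words
    if (count($keywords) > 1 && stripos($name, $phrase) !== false) {
        $score++;
    }
    return $score;
}

$files = array(
    "Name of the Sweet Dog.php",
    "My Dog don't like Sweet Berry.php",
    "Don the Dog.htm",
);
foreach ($files as $file) {
    print score_file_name($file, "Sweet Dog") . " matches: $file<br>";
}
```

With the query "Sweet Dog" the three example file names score 3, 2 and 1 respectively, so the full-phrase match sorts ahead of a file that merely contains both words.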


ASKER CERTIFIED SOLUTION
Guy Hengel [angelIII / a3]

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
AngelIII,

Thank you very much for your time in explaining the code. The comments are useful to me.

But the result is not being displayed as desired. Here is the example again. The desired order of the results is:

7) Name of the Sweet Dog.php, // This is Full sentence Match (Sweet+Dog). Two Keyword found.
2) My Dog don't like Sweet Berry.php // Here also Two Keyword found
5) Don the Dog.htm // One Keyword found

7 should be shown first, ahead of 2, even though both of them contain two keyword matches (as the first one contains the exact keyword match "SWEET+DOG"). Please have a look and test the code.

Regards.
>But the result is not being displayed as desired. I am giving here the example again.
>The desired order of showing result is like :

well, I created the 3 files you mention, and the output for me is:

3 matches:
./check/Name of the Sweet Dog.php
2 matches:
./check/My Dog don't like Sweet Berry.php
1 matches:
./check/Don the Dog.htm

so, that is what you are requesting?
ie, what is wrong?
Hello angelIII,

Sorry for being late. I was out of Kolkata for some time.

Yes, I have also tested the code, and it worked great.

Thanks for your cooperation.

Best of Luck.