Josh Barton asked:

stripping tags

I have a search engine for searching my site, but it doesn't strip tags. Please see what you can do.
<?php

// You can change the colours of the search results using the variables below:

$page_title = '' ;      // Enter your own page title here.
$this_file = 'searcher.php' ;    // name of the search file to exclude from search
$table_background = '#CCCCCC' ;    // background colour of table (html accepted words or hex)
$highlight_colour = '#FFFFFF' ;      // background colour to highlight matched words
$highlight_text = '#003399' ;        // text colour to highlight words (should contrast with $highlight_colour

// DO NOT CHANGE ANYTHING BELOW THIS LINE UNLESS YOU KNOW WHAT YOU ARE DOING!

$all_lines = array() ;
$all_files = array() ;
$count = 0 ;
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
    "'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
    "'([\r\n])[\s]+'", // Strip out white space
    "'&(quot|#34);'i", // Replace html entities
    "'&(amp|#38);'i",
    "'&(lt|#60);'i",
    "'&(gt|#62);'i",
    "'&(nbsp|#160);'i",
    "'&(iexcl|#161);'i",
    "'&(cent|#162);'i",
    "'&(pound|#163);'i",
    "'&(copy|#169);'i",
    "'&#(\d+);'e") ; // evaluate as php

$replace = array ("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169), "chr(\\1)") ;

echo "<h1>$page_title</h1>" ;

function searchFiles ($dir,$search_string) {
  global $count, $search, $replace, $this_file,$highlight_colour,$highlight_text ;  
  chdir($dir) ;                              // change to dir passed as arg (else filetype doesn't work)
  $dir_handle = opendir('.') ;                    // open directory
  while (($file_name = readdir($dir_handle)) !== false) {  // cycle through each file in starting directory
    $file_type = filetype($file_name) ;              // test for files of type 'file'
    if ($file_type == 'file' && $file_name != "$this_file" && preg_match ("/\.(php|htm|html|txt|php3)/i", $file_name)) {  // exclude this file from search
      $file_contents = file ($file_name) ;
      $inside_php = 0 ;
      $no_search = 0 ;
      foreach ($file_contents as $line) {
        $matches_array = array() ;
        $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
        if (preg_match ("/\<\?php\b/i", $line)) { $inside_php = 1 ; }    // find start of php
        if (preg_match ("/NOSEARCH/i", $line)) { $no_search = 1 ; }    // find start of no search
        if (preg_match ("/\b($search_string)/i",$stripped_line,$matches_array) && $inside_php == 0 && $no_search == 0) {  // search if not in php
          $count++ ;
          $matched_string = $matches_array[1] ;
          $stripped_line = preg_replace ("/\b$search_string/i","<b style=\"color:$highlight_text; background-color: $highlight_colour\">$matched_string</b>",$stripped_line) ;    // highlight matched string
          createResultsArray($dir,$file_name,$stripped_line) ;
        }
        if (preg_match ("/\?\>(\n|\r|)/i", $line)) { $inside_php = 0 ; }    // find end of php
        if (preg_match ("/ENDNOSEARCH/i", $line)) { $no_search = 0 ; }    // find end of no search
      }
    }
  }
}

function createResultsArray ($dir,$file_name,$stripped_line) {

  global $all_lines, $all_files, $max_chars ;
 
  array_push($all_lines,$stripped_line) ;            // add line to array
  array_push($all_files,$file_name) ;              // create relevant file link and add to seperate array
}

if ($search_string == '') {
  echo '<h2></h2><p style="text-align: center"></p>
' ;
} else {
  searchFiles('.',$search_string) ;

  echo "<h6>Results for: <u>$search_string</u>.<br />$count results were found.</h6>" ;
  echo '<table>' ;

  $count = 0 ;
  foreach ($all_lines as $result_line) {
    $count = $count + 1 ;
    $array_ref = $count - 1 ;
    echo "<tr><td style=\"padding: 5px; background-color: $table_background\"><p><b>$count.</b> $result_line <a href=\"$all_files[$array_ref]\">More...</a></p></td></tr>" ;
  }

  echo '</table>' ;  

}

?>
VGR

I hate the preg_*() functions, eregi() and ereg_replace(). I don't find them reliable.

I would use strpos(), substr() and the like

or strip_tags() ;-))
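
A minimal sketch of that suggestion, assuming the goal is only to test whether the visible text of one line contains the search term. The helper name `line_matches` is hypothetical (it is not part of the posted script), and `stripos()` needs PHP 5 or later; on older versions `strpos()` on lowercased copies of both strings would be the rough equivalent.

```php
<?php
// Hypothetical helper: does the visible text of $line contain $needle?
function line_matches($line, $needle)
{
    // strip_tags() removes HTML and PHP tags and keeps the text between them.
    $text = strip_tags($line);

    // stripos() returns the offset of the first case-insensitive match,
    // or false when there is none. Offset 0 is a valid match, so compare
    // with !== false instead of relying on truthiness.
    return stripos($text, $needle) !== false;
}

var_dump(line_matches('<p>Hello <b>World</b></p>', 'world'));     // bool(true)
var_dump(line_matches('<a href="world.html">Home</a>', 'world')); // bool(false)
```

The second call returns false because "world" only appears inside a tag attribute, which is exactly the kind of hit the search should ignore.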
Josh Barton (ASKER)

Hey VGR, please show how you would use your suggestion in my script, because I don't know the usage or where to put the functions you suggested. Thanks.
Examples are here :
http://www.php.net/manual/en/function.strip-tags.php
http://www.php.net/manual/en/function.strpos.php
http://www.php.net/manual/en/function.substr.php

http://www.php.net/manual/en/function.html-entity-decode.php


RTFM

your code parts :
(pseudo-code, not tested)

<?php

// You can change the colours of the search results using the variables below:

$page_title = '' ;      // Enter your own page title here.
$this_file = 'searcher.php' ;    // name of the search file to exclude from search
$table_background = '#CCCCCC' ;    // background colour of table (html accepted words or hex)
$highlight_colour = '#FFFFFF' ;      // background colour to highlight matched words
$highlight_text = '#003399' ;        // text colour to highlight words (should contrast with $highlight_colour

// DO NOT CHANGE ANYTHING BELOW THIS LINE UNLESS YOU KNOW WHAT YOU ARE DOING!

$all_lines = array() ;
$all_files = array() ;
$count = 0 ;
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
   "'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
   "'([\r\n])[\s]+'", // Strip out white space
   "'&(quot|#34);'i", // Replace html entities
   "'&(amp|#38);'i",
   "'&(lt|#60);'i",
   "'&(gt|#62);'i",
   "'&(nbsp|#160);'i",
   "'&(iexcl|#161);'i",
   "'&(cent|#162);'i",
   "'&(pound|#163);'i",
   "'&(copy|#169);'i",
   "'&#(\d+);'e") ; // evaluate as php

$replace = array ("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169), "chr(\\1)") ;

echo "<h1>$page_title</h1>" ;

function searchFiles ($dir,$search_string) {
 global $count, $search, $replace, $this_file,$highlight_colour,$highlight_text ;  
 chdir($dir) ;                              // change to dir passed as arg (else filetype doesn't work)
//## you SHOULD memorize the current directory you WERE in before chdir(), to set it back after processing !!! You SHOULD ;-))))
//
 $dir_handle = opendir('.') ;                    // open directory
 while (($file_name = readdir($dir_handle)) !== false) {  // cycle through each file in starting directory
   $file_type = filetype($file_name) ;              // test for files of type 'file'
   if ($file_type == 'file' && $file_name != "$this_file" && preg_match ("/\.(php|htm|html|txt|php3)/i", $file_name)) {  // exclude this file from search
     $file_contents = file ($file_name) ;
     $inside_php = 0 ;
     $no_search = 0 ;
     foreach ($file_contents as $line) { //# the AS gains nothing here; a plain for loop over count($file_contents) would do
       $matches_array = array() ;

//#modify       $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
$stripped_line=strip_tags($line);
//#
//#modify below somewhere
// changing &amp; to the & character is done via :
// string html_entity_decode ( string string [, int quote_style [, string charset]])
// thus (applied to the line that strip_tags() just cleaned) :
$stripped_line=html_entity_decode($stripped_line); // all in one
//#
       if (preg_match ("/\<\?php\b/i", $line)) { $inside_php = 1 ; }    // find start of php
       if (preg_match ("/NOSEARCH/i", $line)) { $no_search = 1 ; }    // find start of no search
       if (preg_match ("/\b($search_string)/i",$stripped_line,$matches_array) && $inside_php == 0 && $no_search == 0) {  // search if not in php
         $count++ ;
         $matched_string = $matches_array[1] ;
         $stripped_line = preg_replace ("/\b$search_string/i","<b style=\"color:$highlight_text; background-color: $highlight_colour\">$matched_string</b>",$stripped_line) ;    // highlight matched string
         createResultsArray($dir,$file_name,$stripped_line) ;
       }
       if (preg_match ("/\?\>(\n|\r|)/i", $line)) { $inside_php = 0 ; }    // find end of php
       if (preg_match ("/ENDNOSEARCH/i", $line)) { $no_search = 0 ; }    // find end of no search
     }
   }
 }
}

function createResultsArray ($dir,$file_name,$stripped_line) {

 global $all_lines, $all_files, $max_chars ;
 
 array_push($all_lines,$stripped_line) ;            // add line to array
 array_push($all_files,$file_name) ;              // create relevant file link and add to seperate array
}

if ($search_string == '') {
 echo '<h2></h2><p style="text-align: center"></p>
' ;
} else {
 searchFiles('.',$search_string) ;

 echo "<h6>Results for: <u>$search_string</u>.<br />$count results were found.</h6>" ;
 echo '<table>' ;

 $count = 0 ;
//## this part makes no sense :
// you use the (time greedy) foreach construct. Well, why not.
// you use an AS with no key=>value representation, so you gain nothing from it
// you then increment $count and compute array_ref=count-1, which is just the value $count had before the increment...
// It's nonsense !!!
//
// DO A : for ($i=0;$i<count($all_lines);$i++) {   !!!!!
//   $all_files[$i]
// } // for

 foreach ($all_lines as $result_line) {
   $count = $count + 1 ;
   $array_ref = $count - 1 ;
   echo "<tr><td style=\"padding: 5px; background-color: $table_background\"><p><b>$count.</b> $result_line <a href=\"$all_files[$array_ref]\">More...</a></p></td></tr>" ;
 }

 echo '</table>' ;  

}

?>
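
The remark in the commented script about remembering the directory before `chdir()` could be sketched as follows. This is an illustration only: the helper name `list_dir_and_return` is hypothetical, and it merely lists entries instead of searching them.

```php
<?php
// Hypothetical helper: list the entries of $dir without leaving the
// script in a different working directory afterwards.
function list_dir_and_return($dir)
{
    $previous = getcwd();              // remember where we were
    $names = array();

    if ($previous !== false && @chdir($dir)) {
        $handle = opendir('.');
        if ($handle !== false) {
            while (($name = readdir($handle)) !== false) {
                $names[] = $name;
            }
            closedir($handle);
        }
        chdir($previous);              // set the working directory back
    }

    return $names;
}

$before = getcwd();
$entries = list_dir_and_return(sys_get_temp_dir());
var_dump(getcwd() === $before);        // the working directory is unchanged
```

Restoring the directory matters when other code in the same request opens files by relative path after the search has run.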

//#FYI, some classical use of str* functions :
function GetChunk(&$i,&$zz,$contents,$deb,$fin,$debug=0) {
  if ($debug==1) echo "1 i=$i zz=$zz full = ".htmlspecialchars($contents[$i])."<BR>";
  $contents[$i]=substr($contents[$i],$zz); // the remaining
  if ($debug==1) echo "2 i=$i zz=$zz reste ".strlen($contents[$i])."= ".htmlspecialchars($contents[$i])."<BR>";
  while (($m=strpos($contents[$i],$deb))===false) { $i++; $zz=0; }
  $m=$m+strlen($deb);
  $n=$m;
  $locRes='';
  $l=strlen($fin);
  if ($debug==1) echo "2a l=$l search of ".htmlspecialchars($fin)." in ".htmlspecialchars($contents[$i])."<BR>";
  while (($n<strlen($contents[$i]))and((substr($contents[$i],$n,$l))<>$fin)) $n++;
  if ($debug==1) echo "2b n=$n m=$m l=$l locres=$locRes <BR>";
  if ($n==strlen($contents[$i])) {
    $locRes=substr($contents[$i],$m);
    $i++;
    $zz=0;
    $m=0;
    if ($debug==1) echo "2bJump i=$i zz=$zz full = ".htmlspecialchars($contents[$i])."<BR>";
    $n=strpos($contents[$i],$fin);
  }
  if ($debug==1) echo "2c n=$n m=$m l=$l locres=$locRes <BR>";
  if (!($n===false)) {
    $locRes.=substr($contents[$i],$m,$n-$m);
    if ($debug==1) echo "2d n=$n m=$m l=$l locres=$locRes <BR>";
    $zz=$zz+$n+1;
  } else { $locRes.=''; $zz=0; } // returns void is not found
  if ($debug==1) echo "3 i=$i zz=$zz $locRes<BR>";
  return($locRes);
} // GetChunk String Function
//#

Fatal error: Call to undefined function: html_entity_decode() in searcher.php on line 56
Please fix the version you gave in your comment above so that it follows the suggestions in your comments.

Please fix this. I have no idea what all your comments do or mean because they start with //#, and I don't know if I'm supposed to erase or add anything. Please modify it and send it back; this is important. I would pay the points required for URGENT, but I don't have enough.
OK, I'll try...
Have you tried?
No, I can't really. I'm too busy. Your code is too far from my programming style.

I marked with //# the comments I think are useful. Compare the way it's done in the script with what I suggest, or with what the PHP online documentation says about the functions I suggest.

regards,
At least tell me what //# means; I have not seen it anywhere before.

And do you know what caused the error I told you about? Tell me that and I'll try to figure the rest out myself.
They are just comments (// is the C++ style comment, also available in PHP and Delphi: it means comment until end of line).
Normal comments are /* */ (in PHP) or (* *) or { } (in Pascal).

The # is just there to make MY comments stand out from the other comments.
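
For reference, a short sketch of the comment styles PHP accepts. The variable `$value` is only there so the script has something to run.

```php
<?php
// A C++ style comment: everything up to the end of the line is ignored.
# A shell style comment works the same way.
//# So this is an ordinary // comment whose text happens to start with "#".
/* A C style comment
   can span several lines. */
$value = 1; //# trailing comments are ignored too
echo $value, "\n"; // prints 1
```

Removing every line that starts with `//#` from the commented script therefore changes nothing about how it runs.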

I re-read your source commented by me, and I find my comments readable and understandable.

Anyway, it's up to you now. I did my best.

In short, what I wrote is:
to strip tags from any string (even a BIG string like a full HTML source ;-), I would use two functions:
strip_tags : http://www.php.net/manual/en/function.strip-tags.php
html_entity_decode = http://www.php.net/manual/en/function.html-entity-decode.php
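
A minimal sketch of combining the two functions, assuming PHP 4.3.0 or later so that `html_entity_decode()` exists. The helper name `html_to_text` is hypothetical.

```php
<?php
// Hypothetical helper: reduce an HTML fragment to plain text.
function html_to_text($html)
{
    // 1. Remove the markup first...
    $text = strip_tags($html);
    // 2. ...then turn entities such as &amp; back into characters.
    return html_entity_decode($text, ENT_QUOTES);
}

echo html_to_text('<p>Fish &amp; chips &lt;today&gt;</p>'), "\n";
// prints: Fish & chips <today>
```

The order is deliberate: decoding first would turn `&lt;today&gt;` into `<today>`, which `strip_tags()` would then remove as if it were a tag. If the result is echoed back into an HTML page, it would normally be passed through `htmlspecialchars()` again.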

and to search reliably for elements I would not use regular expressions, as I haven't mastered them; I would use
strpos() = http://www.php.net/manual/en/function.strpos.php
substr() = http://www.php.net/manual/en/function.substr.php
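
As a sketch of what a regex-free search and highlight could look like with those functions: the helper name `highlight_first` is hypothetical, it uses `stripos()` (PHP 5 or later) for case-insensitive matching, and unlike the `\b` pattern in the original script it does not require the match to start at a word boundary.

```php
<?php
// Hypothetical helper: wrap the first case-insensitive occurrence of
// $needle in $text with <b> tags, using only string functions.
function highlight_first($text, $needle)
{
    if ($needle === '') {
        return $text;                   // nothing to look for
    }
    $pos = stripos($text, $needle);
    if ($pos === false) {
        return $text;                   // no match: return the text unchanged
    }
    $len = strlen($needle);
    return substr($text, 0, $pos)                       // text before the match
        . '<b>' . substr($text, $pos, $len) . '</b>'    // the match, original case kept
        . substr($text, $pos + $len);                   // text after the match
}

echo highlight_first('Stripping Tags in PHP', 'tags'), "\n";
// prints: Stripping <b>Tags</b> in PHP
```

Because the search term is never placed inside a pattern, characters such as `(` or `*` in the user's input cannot break the search.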



regards
NB :
html_entity_decode  (PHP 4 >= 4.3.0)

html_entity_decode --  Convert all HTML entities to their applicable characters
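
The "Call to undefined function" error reported earlier is consistent with a PHP version older than 4.3.0. One possible workaround, sketched here under that assumption, is to fall back to the entity translation table when the function is missing. The wrapper name `decode_entities_compat` is hypothetical.

```php
<?php
// Hypothetical wrapper: use html_entity_decode() when it exists,
// otherwise decode with the translation table.
function decode_entities_compat($text)
{
    if (function_exists('html_entity_decode')) {
        return html_entity_decode($text, ENT_QUOTES);
    }
    // get_html_translation_table() maps characters to entities;
    // flipping it gives the entity => character map that strtr() needs.
    $table = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
    return strtr($text, $table);
}

echo decode_entities_compat('Tom &amp; Jerry &lt;b&gt;'), "\n";
// prints: Tom & Jerry <b>
```

In the commented script, the call to `html_entity_decode()` would then be replaced by a call to this wrapper.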
At least tell me what //# means; I have not seen it anywhere before.

And do you know what caused the error I told you about? Tell me that and I'll try to figure the rest out myself.
Ignore the last post above this one. What I need to know now is: did you make any changes, or just add comments?
the only line I modified is clearly marked 'modify' :

//#modify       $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
$stripped_line=strip_tags($line);
OK, thanks, I'll try to figure this out. If no answer is accepted, then maybe somebody else will look at it.
Still didn't work. I am going to give the points to the next person who comments.
ASKER CERTIFIED SOLUTION
06jbarto

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Is anybody going to answer? I am giving it 5 minutes and then giving the points to whomever I feel like.
Have fun with the points!! You owe me, Bob.