Solved

stripping tags

Posted on 2003-03-17
20
Medium Priority
Last Modified: 2013-11-19
I have a search engine for searching my site, but it doesn't strip tags. Please see what you can do.
<?php

// You can change the colours of the search results using the variables below:

$page_title = '' ;      // Enter your own page title here.
$this_file = 'searcher.php' ;    // name of the search file to exclude from search
$table_background = '#CCCCCC' ;    // background colour of table (html accepted words or hex)
$highlight_colour = '#FFFFFF' ;      // background colour to highlight matched words
$highlight_text = '#003399' ;        // text colour to highlight words (should contrast with $highlight_colour

// DO NOT CHANGE ANYTHING BELOW THIS LINE UNLESS YOU KNOW WHAT YOU ARE DOING!

$all_lines = array() ;
$all_files = array() ;
$count = 0 ;
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
    "'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
    "'([\r\n])[\s]+'", // Strip out white space
    "'&(quot|#34);'i", // Replace html entities
    "'&(amp|#38);'i",
    "'&(lt|#60);'i",
    "'&(gt|#62);'i",
    "'&(nbsp|#160);'i",
    "'&(iexcl|#161);'i",
    "'&(cent|#162);'i",
    "'&(pound|#163);'i",
    "'&(copy|#169);'i",
    "'&#(\d+);'e") ; // evaluate as php

$replace = array ("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169), "chr(\\1)") ;

echo "<h1>$page_title</h1>" ;

function searchFiles ($dir,$search_string) {
  global $count, $search, $replace, $this_file,$highlight_colour,$highlight_text ;  
  chdir($dir) ;                              // change to dir passed as arg (else filetype doesn't work)
  $dir_handle = opendir('.') ;                    // open directory
  while (($file_name = readdir($dir_handle)) !== false) {  // cycle through each file in starting directory
    $file_type = filetype($file_name) ;              // test for files of type 'file'
    if ($file_type == 'file' && $file_name != "$this_file" && preg_match ("/\.(php|htm|html|txt|php3)/i", $file_name)) {  // exclude this file from search
      $file_contents = file ($file_name) ;
      $inside_php = 0 ;
      $no_search = 0 ;
      foreach ($file_contents as $line) {
        $matches_array = array() ;
        $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
        if (preg_match ("/\<\?php\b/i", $line)) { $inside_php = 1 ; }    // find start of php
        if (preg_match ("/NOSEARCH/i", $line)) { $no_search = 1 ; }    // find start of no search
        if (preg_match ("/\b($search_string)/i",$stripped_line,$matches_array) && $inside_php == 0 && $no_search == 0) {  // search if not in php
          $count++ ;
          $matched_string = $matches_array[1] ;
          $stripped_line = preg_replace ("/\b$search_string/i","<b style=\"color:$highlight_text; background-color: $highlight_colour\">$matched_string</b>",$stripped_line) ;    // highlight matched string
          createResultsArray($dir,$file_name,$stripped_line) ;
        }
        if (preg_match ("/\?\>(\n|\r|)/i", $line)) { $inside_php = 0 ; }    // find end of php
        if (preg_match ("/ENDNOSEARCH/i", $line)) { $no_search = 0 ; }    // find end of no search
      }
    }
  }
}

function createResultsArray ($dir,$file_name,$stripped_line) {

  global $all_lines, $all_files, $max_chars ;
 
  array_push($all_lines,$stripped_line) ;            // add line to array
  array_push($all_files,$file_name) ;              // create relevant file link and add to seperate array
}

if ($search_string == '') {
  echo '<h2></h2><p style="text-align: center"></p>
' ;
} else {
  searchFiles('.',$search_string) ;

  echo "<h6>Results for: <u>$search_string</u>.<br />$count results were found.</h6>" ;
  echo '<table>' ;

  $count = 0 ;
  foreach ($all_lines as $result_line) {
    $count = $count + 1 ;
    $array_ref = $count - 1 ;
    echo "<tr><td style=\"padding: 5px; background-color: $table_background\"><p><b>$count.</b> $result_line <a href=\"$all_files[$array_ref]\">More...</a></p></td></tr>" ;
  }

  echo '</table>' ;  

}

?>
Question by:bartonjo2
20 Comments
 
LVL 15

Expert Comment

by:VGR
ID: 8153803
I hate the preg_*(), eregi() and ereg_replace() functions; I don't find them reliable.

I would use strpos(), substr() and the like

or strip_tags() ;-))
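
As a rough illustration of that suggestion (a sketch, not VGR's code): strip_tags() removes markup from a string, and stripos() (PHP 5 and later) finds a word case-insensitively without a regular expression. The sample line and the helper name line_matches() are made up for the example.

```php
<?php
// Hypothetical helper: true if $needle occurs in the visible text of $html_line.
function line_matches($html_line, $needle)
{
    $text = strip_tags($html_line);            // drop <b>, <a href=...>, etc.
    return stripos($text, $needle) !== false;  // case-insensitive, no regex
}

$line = '<p>Welcome to <b>my</b> <a href="about.html">site</a></p>';
echo strip_tags($line), "\n";                  // Welcome to my site
var_dump(line_matches($line, 'SITE'));         // bool(true)
var_dump(line_matches($line, 'href'));         // bool(false): only inside a tag
```

Because file() splits a page on line breaks, a tag that spans two lines will not be recognised when each line is stripped separately; reading the page with file_get_contents() and stripping it once is one way around that.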
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8156175
Hey VGR, please show how you would use your suggestion in my script, because I don't know the usage or where to use the functions you suggested. Thanks.
0
 
LVL 15

Expert Comment

by:VGR
ID: 8157426
Examples are here :
http://www.php.net/manual/en/function.strip-tags.php
http://www.php.net/manual/en/function.strpos.php
http://www.php.net/manual/en/function.substr.php

http://www.php.net/manual/en/function.html-entity-decode.php


RTFM

your code parts :
(pseudo-code, not tested)

<?php

// You can change the colours of the search results using the variables below:

$page_title = '' ;      // Enter your own page title here.
$this_file = 'searcher.php' ;    // name of the search file to exclude from search
$table_background = '#CCCCCC' ;    // background colour of table (html accepted words or hex)
$highlight_colour = '#FFFFFF' ;      // background colour to highlight matched words
$highlight_text = '#003399' ;        // text colour to highlight words (should contrast with $highlight_colour

// DO NOT CHANGE ANYTHING BELOW THIS LINE UNLESS YOU KNOW WHAT YOU ARE DOING!

$all_lines = array() ;
$all_files = array() ;
$count = 0 ;
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
   "'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
   "'([\r\n])[\s]+'", // Strip out white space
   "'&(quot|#34);'i", // Replace html entities
   "'&(amp|#38);'i",
   "'&(lt|#60);'i",
   "'&(gt|#62);'i",
   "'&(nbsp|#160);'i",
   "'&(iexcl|#161);'i",
   "'&(cent|#162);'i",
   "'&(pound|#163);'i",
   "'&(copy|#169);'i",
   "'&#(\d+);'e") ; // evaluate as php

$replace = array ("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169), "chr(\\1)") ;

echo "<h1>$page_title</h1>" ;

function searchFiles ($dir,$search_string) {
 global $count, $search, $replace, $this_file,$highlight_colour,$highlight_text ;  
 chdir($dir) ;                              // change to dir passed as arg (else filetype doesn't work)
//## you SHOULD remember the directory you WERE in before chdir() (see getcwd()), to set it back after processing !!! You SHOULD ;-))))
//
 $dir_handle = opendir('.') ;                    // open directory
 while (($file_name = readdir($dir_handle)) !== false) {  // cycle through each file in starting directory
   $file_type = filetype($file_name) ;              // test for files of type 'file'
   if ($file_type == 'file' && $file_name != "$this_file" && preg_match ("/\.(php|htm|html|txt|php3)/i", $file_name)) {  // exclude this file from search
     $file_contents = file ($file_name) ;
     $inside_php = 0 ;
     $no_search = 0 ;
     foreach ($file_contents as $line) { //# AS completely useless then, do a for on count($file_contents)
       $matches_array = array() ;

//#modify       $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
$stripped_line=strip_tags($line);
//#
//#modify below somewhere
// changing &amp; to the & character is done via :
// string html_entity_decode ( string string [, int quote_style [, string charset]])
// thus (needs PHP >= 4.3.0) :
$stripped_line=html_entity_decode($stripped_line); // all entities in one call
//#
       if (preg_match ("/\<\?php\b/i", $line)) { $inside_php = 1 ; }    // find start of php
       if (preg_match ("/NOSEARCH/i", $line)) { $no_search = 1 ; }    // find start of no search
       if (preg_match ("/\b($search_string)/i",$stripped_line,$matches_array) && $inside_php == 0 && $no_search == 0) {  // search if not in php
         $count++ ;
         $matched_string = $matches_array[1] ;
         $stripped_line = preg_replace ("/\b$search_string/i","<b style=\"color:$highlight_text; background-color: $highlight_colour\">$matched_string</b>",$stripped_line) ;    // highlight matched string
         createResultsArray($dir,$file_name,$stripped_line) ;
       }
       if (preg_match ("/\?\>(\n|\r|)/i", $line)) { $inside_php = 0 ; }    // find end of php
       if (preg_match ("/ENDNOSEARCH/i", $line)) { $no_search = 0 ; }    // find end of no search
     }
   }
 }
}

function createResultsArray ($dir,$file_name,$stripped_line) {

 global $all_lines, $all_files, $max_chars ;
 
 array_push($all_lines,$stripped_line) ;            // add line to array
 array_push($all_files,$file_name) ;              // create relevant file link and add to seperate array
}

if ($search_string == '') {
 echo '<h2></h2><p style="text-align: center"></p>
' ;
} else {
 searchFiles('.',$search_string) ;

 echo "<h6>Results for: <u>$search_string</u>.<br />$count results were found.</h6>" ;
 echo '<table>' ;

 $count = 0 ;
//## this part is needlessly convoluted :
// you use the (time greedy) foreach construct. Well, why not.
// you use "as" without the key => value form, so you have no index at hand,
// and then you rebuild one by hand : $count, then $array_ref = $count - 1.
// It's "N'IMPORTE QUOI" (nonsense) !!!
//
// DO A : for ($i=0;$i<count($all_lines);$i++) {   !!!!!
//   $all_files[$i]
// } // for

 foreach ($all_lines as $result_line) {
   $count = $count + 1 ;
   $array_ref = $count - 1 ;
   echo "<tr><td style=\"padding: 5px; background-color: $table_background\"><p><b>$count.</b> $result_line <a href=\"$all_files[$array_ref]\">More...</a></p></td></tr>" ;
 }

 echo '</table>' ;  

}

?>
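
The two //## remarks in the listing above (restore the working directory after chdir(), and loop over the results by index) can be sketched roughly as follows. This is an illustration under assumptions, not a drop-in patch: $dir, $rows and the sample arrays are placeholders.

```php
<?php
// Sketch only: $dir and the sample arrays are placeholders.
$dir = '.';

$previous = getcwd();            // remember the current working directory
if ($previous !== false && chdir($dir)) {
    // ... scan the directory here, as searchFiles() does ...
    chdir($previous);            // go back once the scan is finished
}

// Parallel arrays, filled the way createResultsArray() fills them.
$all_lines = array('first hit', 'second hit');
$all_files = array('a.html', 'b.html');

$rows  = '';
$total = count($all_lines);      // call count() once, not on every pass
for ($i = 0; $i < $total; $i++) {
    $number = $i + 1;            // 1-based number shown to the visitor
    $rows .= "$number. $all_lines[$i] ($all_files[$i])\n";
}
echo $rows;
```

Writing foreach ($all_lines as $i => $result_line) would give the same index without a hand-made counter.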

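//#FYI, a classic use of the str* functions :
// GetChunk returns the text found between the delimiters $deb (start) and $fin (end).
// $contents is an array of lines, $i the current line index and $zz the offset in that line;
// $i and $zz are passed by reference and updated as the function reads on.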
function GetChunk(&$i,&$zz,$contents,$deb,$fin,$debug=0) {
  if ($debug==1) echo "1 i=$i zz=$zz full = ".htmlspecialchars($contents[$i])."<BR>";
  $contents[$i]=substr($contents[$i],$zz); // the remaining
  if ($debug==1) echo "2 i=$i zz=$zz reste ".strlen($contents[$i])."= ".htmlspecialchars($contents[$i])."<BR>";
  while (($m=strpos($contents[$i],$deb))===false) { $i++; $zz=0; }
  $m=$m+strlen($deb);
  $n=$m;
  $locRes='';
  $l=strlen($fin);
  if ($debug==1) echo "2a l=$l search of ".htmlspecialchars($fin)." in ".htmlspecialchars($contents[$i])."<BR>";
  while (($n<strlen($contents[$i]))and((substr($contents[$i],$n,$l))<>$fin)) $n++;
  if ($debug==1) echo "2b n=$n m=$m l=$l locres=$locRes <BR>";
  if ($n==strlen($contents[$i])) {
    $locRes=substr($contents[$i],$m);
    $i++;
    $zz=0;
    $m=0;
    if ($debug==1) echo "2bJump i=$i zz=$zz full = ".htmlspecialchars($contents[$i])."<BR>";
    $n=strpos($contents[$i],$fin);
  }
  if ($debug==1) echo "2c n=$n m=$m l=$l locres=$locRes <BR>";
  if (!($n===false)) {
    $locRes.=substr($contents[$i],$m,$n-$m);
    if ($debug==1) echo "2d n=$n m=$m l=$l locres=$locRes <BR>";
    $zz=$zz+$n+1;
  } else { $locRes.=''; $zz=0; } // returns void is not found
  if ($debug==1) echo "3 i=$i zz=$zz $locRes<BR>";
  return($locRes);
} // GetChunk String Function
//#
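
For comparison, a much smaller sketch of the same idea on a single string: find the start delimiter with strpos(), find the end delimiter after it, and cut the middle out with substr(). The helper name text_between() is hypothetical, and unlike GetChunk() it does not walk across an array of lines.

```php
<?php
// Hypothetical helper: return the text between $start and $end in $haystack,
// or '' when either delimiter is missing.
function text_between($haystack, $start, $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);                 // skip past the opening delimiter
    $to = strpos($haystack, $end, $from);    // look for the end after it
    if ($to === false) {
        return '';
    }
    return substr($haystack, $from, $to - $from);
}

echo text_between('<title>My site</title>', '<title>', '</title>'), "\n"; // My site
```

Comparing with === false matters here, because strpos() returns 0 when the delimiter sits at the very start of the string.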

0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8159464
Fatal error: Call to undefined function: html_entity_decode() in searcher.php on line 56
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8159563
please fix this version you gave in your above comment so that it meets the suggestions of your comments

0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8160848
Please fix this. I have no idea what all your comments do or mean, because they start with //# and I don't know if I'm supposed to erase or add anything. Please modify it and send it back; this is important. I would pay the points required for URGENT, but I don't have enough.
0
 
LVL 15

Expert Comment

by:VGR
ID: 8160938
ok, I'll try...
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8167097
Have you tried?
0
 
LVL 15

Expert Comment

by:VGR
ID: 8168354
No, I really can't. I'm too busy, and your code is too far from my programming style.

I marked with //# the comments I think are useful. Compare the way it's done in the script with what I suggest and with what the PHP online documentation says about the functions I suggest.

regards,
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8168695
At least tell me what //# means; I have not seen it anywhere before.

And do you know what caused the error I told you about? Tell me that and I'll try to figure the rest out myself.
0
 
LVL 15

Expert Comment

by:VGR
ID: 8168771
They are just comments (// is the C++ style comment, also available in PHP and Delphi: it means a comment until the end of the line).
Normal comments are /* */ (in PHP) or (* *) or { } (in Pascal).

The # is just there to make MY comments stand out more clearly from the other comments.
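
A short sketch of the comment forms PHP accepts, including the //# marker used in the listing. The variable and sample string are only there so the snippet runs.

```php
<?php
// single-line comment, runs to the end of the line
# shell-style single-line comment, also accepted by PHP
/* block comment,
   may span several lines */
//# still a "//" comment; the "#" is simply the first character of its text
$stripped_line = strip_tags('<i>hello</i>'); // code before the marker still runs
echo $stripped_line, "\n"; // hello
```

So a line that begins with //# is ignored entirely, while a //# placed after a statement only hides what follows it on that line.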

I re-read your source with my comments, and I find them readable and understandable.

Anyway, it's up to you now. I did my best.

In short, what I wrote is:
to strip tags from any string (even a BIG string like a full HTML source ;-), I would use two functions:
strip_tags : http://www.php.net/manual/en/function.strip-tags.php
html_entity_decode = http://www.php.net/manual/en/function.html-entity-decode.php

and to search reliably for elements I would not use regular expressions, as I haven't mastered them; I would use
strpos() = http://www.php.net/manual/en/function.strpos.php
substr() = http://www.php.net/manual/en/function.substr.php
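
Put together, the approach summarised above might look like the following sketch. The helper names plain_text() and highlight_first() are hypothetical, stripos() needs PHP 5 or later, and the highlighting is deliberately limited to the first match to keep the example short.

```php
<?php
// Hypothetical helper: plain text of an HTML fragment.
function plain_text($html)
{
    // strip first, then decode, so that "&lt;b&gt;" stays literal text
    return html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: wrap the first case-insensitive match in <b>.
// Returns false when $needle is empty or not found.
function highlight_first($text, $needle)
{
    if ($needle === '') {
        return false;
    }
    $pos = stripos($text, $needle);
    if ($pos === false) {
        return false;
    }
    $len = strlen($needle);
    // escape each piece again before it goes back into an HTML page
    return htmlspecialchars(substr($text, 0, $pos))
         . '<b>' . htmlspecialchars(substr($text, $pos, $len)) . '</b>'
         . htmlspecialchars(substr($text, $pos + $len));
}

$text = plain_text('<p>Fish &amp; <i>Chips</i></p>');
echo $text, "\n";                           // Fish & Chips
echo highlight_first($text, 'chips'), "\n"; // Fish &amp; <b>Chips</b>
```

Escaping with htmlspecialchars() on the way out keeps a decoded "<" or "&" from being interpreted as markup in the results page.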



regards
0
 
LVL 15

Expert Comment

by:VGR
ID: 8168773
NB :
html_entity_decode  (PHP 4 >= 4.3.0)

html_entity_decode --  Convert all HTML entities to their applicable characters
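
That version note is the likely explanation for the "Call to undefined function" error reported earlier: on a PHP older than 4.3.0 the function simply does not exist. A commonly used workaround, sketched here under the assumption that only the standard named entities matter, is to invert the table that htmlentities() uses. The wrapper name decode_entities() is hypothetical.

```php
<?php
// Hypothetical wrapper: use html_entity_decode() when the running PHP has it.
function decode_entities($text)
{
    if (function_exists('html_entity_decode')) {
        return html_entity_decode($text);
    }
    // Fallback for PHP < 4.3.0: flip the character => entity table
    // into entity => character and apply it with strtr().
    $table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($text, $table);
}

echo decode_entities('Tom &amp; Jerry &lt;cartoon&gt;'), "\n"; // Tom & Jerry <cartoon>
```

The fallback does not handle numeric entities such as &#169;, so it is a stopgap rather than a full replacement.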
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8168784
At least tell me what //# means; I have not seen it anywhere before.

And do you know what caused the error I told you about? Tell me that and I'll try to figure the rest out myself.
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8168795
Ignore the last post above this one. What I need to know now is: did you make any changes or just add comments?
0
 
LVL 15

Expert Comment

by:VGR
ID: 8168808
the only line I modified is clearly marked 'modify' :

//#modify       $stripped_line = preg_replace ($search, $replace, $line) ;      // strip $line of all html elements
$stripped_line=strip_tags($line);
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8168838
OK, thanks, I'll try and figure this out. If no answer is accepted then maybe somebody else will look at it.
0
 

Expert Comment

by:06jbarto
ID: 8288271
Still didn't work. I am going to give the points to the next person who comments.
0
 

Accepted Solution

by:06jbarto (earned 260 total points)
ID: 8288283
Well, that's what I think you should do with the points. You're really frustrating, Josh: you don't accept any of these guys' great answers. See you in class tomorrow.
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8292338
Is anybody going to answer? I am giving it 5 minutes and then giving the points to whomever I feel like.
0
 
LVL 3

Author Comment

by:bartonjo2
ID: 8292376
Have fun with the points!! You owe me, Bob.
0
