peterbrowne

PHP strip part of string if exists

I have built a blog in PHP, but have found that when a user copies text from a Word document and pastes it into the blog's editor, the pasted text includes Word code above the actual text. The Word code starts with <!-- and ends with -->. If the Word code is present, I need to strip it out and keep only the text that follows.

The code that needs to be removed looks like:

<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} -->
DanielIser

just use str_replace()
// $string holds the text pasted from Word
// Single quotes are used because the Word block itself contains double quotes.
$word_phrase = '<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} -->';
$string = str_replace($word_phrase, "", $string);
Shinesh Premrajan

A better option is to use a regular expression to clean up the mess. str_replace() only removes one exact string, so it is unlikely to cope with Word blocks whose content varies from paste to paste.

$word_phrase = 'patt <!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> this is ';

$string=preg_replace("/\<\!\-\-(.*)\-\-\>/i","=",$word_phrase);

Hope this helps
Sorry, that was the testing code. This is the correct one:
$string=preg_replace("/\<\!\-\-(.*)\-\-\>/i","",$word_phrase);
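To illustrate why a fixed string is fragile here, the following is a minimal sketch. The two sample pastes and all variable names are made up for the example, and the pattern uses a lazy quantifier with the s modifier rather than the exact expression posted above.

```php
<?php
// Two hypothetical pastes whose Word blocks differ slightly.
$pasteA = '<!-- p.MsoNormal {margin:0cm;} --> First post';
$pasteB = '<!-- p.MsoNormal {margin:0in;} --> Second post';

// A fixed needle copied from the first paste.
$needle = '<!-- p.MsoNormal {margin:0cm;} -->';

$fixedA = str_replace($needle, '', $pasteA); // needle matches, block removed
$fixedB = str_replace($needle, '', $pasteB); // needle differs, block stays

// A pattern keys on the delimiters only, so both blocks are removed.
$pattern = '/<!--.*?-->/s';
$regexA = preg_replace($pattern, '', $pasteA);
$regexB = preg_replace($pattern, '', $pasteB);
```

With these inputs, str_replace() leaves the second paste untouched, while preg_replace() cleans both.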
If you don't care about keeping any comments at all (including, but not limited to, the Word comments <!-- ... -->), then you can use this:


$string="<!-- test here and there //--> some text I want to keep <!-- test here and there //--> <br/>Is it greedy?";
// U inverts greediness, so each comment is matched separately
$regex = "#(<!--)(.*)?(-->)#U";
$output = preg_replace($regex,"",$string);

Replace $string with your actual text variable.
peterbrowne

ASKER

Actually, the Word garbage may be part of the user's posting, with the code preceding the actual text as typed in Word. When the user pastes from Word, the intended posting and the code come in together, so I need to strip the code out of the intended message, i.e. everything up to and including '-->'.
The /U modifier in the regex makes the quantifiers non-greedy, so each match stops at the first '-->' it finds. (The ? after the group only makes that group optional.)
If you have multiple comments and leave the modifier out, everything from the start of the first comment to the end of the last comment disappears.
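The difference between greedy and non-greedy matching can be seen in a small sketch. The sample text is invented for the example; the three patterns are simplified versions without capture groups.

```php
<?php
$text = '<!-- one --> keep me <!-- two --> and me';

// Greedy: .* runs to the LAST "-->", swallowing the text between the comments.
$greedy = preg_replace('/<!--.*-->/', '', $text);

// Lazy quantifier: .*? stops at the FIRST "-->", so each comment goes separately.
$lazy = preg_replace('/<!--.*?-->/', '', $text);

// The U modifier flips the default, so a plain .* behaves lazily.
$ungreedy = preg_replace('/<!--.*-->/U', '', $text);
```

Here $greedy loses "keep me", while $lazy and $ungreedy both keep it.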
Not working...  I've used this:

$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<!--)(.*)?(-->)#Ue";
$output = preg_replace($regex,"",$comment);

I still get:

<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:1; mso-generic-font-family:roman; mso-font-format:other; mso-font-pitch:variable; mso-font-signature:0 0 0 0 0 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-size:10.0pt; mso-ansi-font-size:10.0pt; mso-bidi-font-size:10.0pt;} @page Section1 {size:595.3pt 841.9pt; margin:1.0in 1.25in 1.0in 1.25in; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> Nullam ligula velit, ullamcorper eu tempor sed, feugiat vitae orci. Phasellus mi purus, ullamcorper in pellentesque at, imperdiet ac lacus. Praesent ultrices, mauris id euismod sollicitudin, nisl lectus consequat neque, ac tristique sem diam sit amet lorem.
ASKER CERTIFIED SOLUTION
cyberkiwi
This solution is only available to members.
No, it still doesn't work:

$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<!--)(.*)?(-->)#Ue";
$comment = preg_replace($regex,"",$comment);

The only other place that $comment is used is:

//add comment to database 'comment' table
if (isset($_POST['submit']))
{
      $query_comment = "INSERT INTO comment (comment_id,comment,comment_date,user_id,page_id)
          VALUES ('','$comment',NOW(),'$user_id','$page_id')";
      $result_comment = mysql_query($query_comment, $connection) or die(mysql_error());
}
Try this attached code:

Hope this helps,
Addy
<?php
	
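	function get_string_between($string, $start, $end)
	{
		// Locate the opening marker; strpos() returns false when it is absent.
		$ini = strpos($string, $start);
		if ($ini === false) return "";
		$ini += strlen($start);
		// Locate the closing marker that follows the opening one.
		$stop = strpos($string, $end, $ini);
		if ($stop === false) return "";
		return substr($string, $ini, $stop - $ini);
	}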
	
	$str='demofinr is this <!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> while gone.';
	
	$between_str= get_string_between($str,'<!--','-->');
	echo str_replace('-->','',str_replace('<!--','',str_replace($between_str,'',$str)));
?>
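The code above handles a single comment block. As a rough sketch of how the same strpos()/substr() idea might be extended to several blocks, the hypothetical helper below (strip_delimited is not part of the original answer) loops until no opening marker is left:

```php
<?php
// Hypothetical helper: removes every $start ... $end span from $text.
function strip_delimited($text, $start, $end)
{
    $offset = 0;
    while (($from = strpos($text, $start, $offset)) !== false) {
        $to = strpos($text, $end, $from + strlen($start));
        if ($to === false) {
            break; // unterminated block: leave the rest untouched
        }
        // Keep what precedes the block and what follows the closing marker.
        $text = substr($text, 0, $from) . substr($text, $to + strlen($end));
        $offset = $from; // continue scanning from where the block was cut
    }
    return $text;
}

$clean = strip_delimited('a <!-- x --> b <!-- y --> c', '<!--', '-->');
```

Text without markers, or with an opening marker that is never closed, is returned unchanged.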


I forgot about the multi-line issue. The s modifier at the end here lets . match newline characters, so preg_replace works on multi-line strings:

$regex = "#(<!--)(.*)?(-->)#Us";
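A short sketch of the multi-line effect, using an invented three-line sample and a simplified pattern:

```php
<?php
// Hypothetical paste in which the Word block spans several lines.
$text = "<!--\n p.MsoNormal {margin:0cm;}\n-->Actual post";

// Without s, "." does not match "\n", so the pattern cannot reach "-->".
$withoutS = preg_replace('/<!--.*?-->/', '', $text);

// With s, "." also matches newlines and the whole block is removed.
$withS = preg_replace('/<!--.*?-->/s', '', $text);
```

Without the modifier the string comes back unchanged; with it only "Actual post" remains.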
Actually your solution was correct.  I checked what was actually going into the database for these user inputs (comments) and '<!--' was actually going in as '<!--' and '-->' was becoming '-->'.

So, the following works:

$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<!--)(.*)?(-->)#Ue";
$comment = preg_replace($regex,"",$comment);

Thanks for your help and others for your suggestions!!

Cheers,

Peter
Mmmm... it looks like the forum is substituting the entities in my post above too. What is actually stored is:
<p>&lt;!--

and

--&gt;

so:

$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<p>&lt;!--)(.*)?(--&gt;)#U";
$comment = preg_replace($regex,"",$comment);
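Since the editor may submit the comment either raw or entity-encoded, one possible approach is to strip both forms. The helper name strip_word_comments and the sample input below are assumptions for illustration. Unlike the pattern above, this sketch leaves the leading <p> in place so that it stays paired with its closing tag.

```php
<?php
// Hypothetical helper: removes Word comment blocks in raw or encoded form.
function strip_word_comments($html)
{
    // Raw comments: <!-- ... -->
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // Entity-encoded comments: &lt;!-- ... --&gt;
    $html = preg_replace('/&lt;!--.*?--&gt;/s', '', $html);
    return $html;
}

// Sample input modelled on what the editor was found to submit.
$posted = '<p>&lt;!-- /* Font Definitions */ --&gt; Nullam ligula velit</p>';
$clean  = strip_word_comments($posted);
```

It is probably safer to run this on the submitted text before it is escaped for the database, so that the pattern sees the text exactly as the editor sent it.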
