peterbrowne
asked on
PHP strip part of string if exists
I have built a blog in PHP, but have found that if a user copies text from a Word doc, the text pasted into the blog's editor includes Word code above it. The Word code starts with <!-- and ends with -->. If the Word code is present, I need to strip it from the text that follows.
The code that needs to be removed looks like:
<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} -->
A better option is to use a regular expression to clean up the mess, since str_replace is unlikely to suit the dynamic content coming in with the text.
$word_phrase = 'patt <!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> this is ';
$string=preg_replace("/\<\!\-\-(.*)\-\-\>/i","=",$word_phrase);
Hope this helps
Sorry, that was the testing code; this is the correct one:
$string=preg_replace("/\<\!\-\-(.*)\-\-\>/i","",$word_phrase);
If you don't care about any comments at all (including, but not limited to, the Word comments <!-- .. -->), then you can use this:
$string="<!-- test here and there //--> some text I want to keep <!-- test here and there //--> <br/>Is it greedy?";
$regex = "#(<!--)(.*)?(-->)#Ue";
$output = preg_replace($regex,"",$string);
Replace $string with your actual text variable.
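A note for anyone running this on a current PHP: the e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, and it is not needed when the replacement is a plain string. The following is a minimal sketch of the same idea without it; the sample text and the variable names are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$text = "<!-- first --> keep this <!-- second --> and this";

// The lazy quantifier (.*?) stops at the nearest "-->", and the s modifier
// lets "." match newlines. No e modifier is used.
$clean = preg_replace('/<!--.*?-->/s', '', $text);

echo $clean, PHP_EOL;
```

Both comments are removed and the text between them is kept.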
ASKER
Actually, the Word garbage may be part of the user's posting, with this code preceding the actual text as typed in Word. When the user pastes from Word, the intended posting and the code come in together, so I need to strip the code out of the intended message, i.e. everything up to and including '-->'.
The U modifier makes the quantifiers in the regex non-greedy (the ? after the group only makes that group optional).
If you have multiple comments and the pattern is greedy, everything from the start of the first comment to the end of the last comment disappears, including the text in between.
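The difference between greedy and non-greedy matching is easy to see on a string with two comments. This is a small sketch with made-up sample text; the variable names are illustrative only.

```php
<?php
// Hypothetical sample with two comments and text between them.
$text = "<!-- a --> keep me <!-- b --> tail";

// Greedy: ".*" runs to the LAST "-->", so the text in between is lost.
$greedy = preg_replace('/<!--.*-->/s', '', $text);

// Ungreedy (U modifier): ".*" stops at the FIRST "-->" after each "<!--".
$lazy = preg_replace('/<!--.*-->/sU', '', $text);

var_dump($greedy); // " tail"
var_dump($lazy);   // " keep me  tail"
```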
ASKER
Not working... I've used this:
$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<!--)(.*)?(-->)#Ue";
$output = preg_replace($regex,"",$comment);
I still get:
<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:1; mso-generic-font-family:roman; mso-font-format:other; mso-font-pitch:variable; mso-font-signature:0 0 0 0 0 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-size:10.0pt; mso-ansi-font-size:10.0pt; mso-bidi-font-size:10.0pt;} @page Section1 {size:595.3pt 841.9pt; margin:1.0in 1.25in 1.0in 1.25in; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> Nullam ligula velit, ullamcorper eu tempor sed, feugiat vitae orci. Phasellus mi purus, ullamcorper in pellentesque at, imperdiet ac lacus. Praesent ultrices, mauris id euismod sollicitudin, nisl lectus consequat neque, ac tristique sem diam sit amet lorem.
ASKER CERTIFIED SOLUTION
ASKER
No, still doesn't work:
$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(<!--)(.*)?(-->)#Ue";
$comment = preg_replace($regex,"",$comment);
The only other place that $comment is used is:
//add comment to database 'comment' table
if (isset($_POST['submit']))
{
$query_comment = "INSERT INTO comment (comment_id,comment,comment_date,user_id,page_id)
VALUES ('','$comment',NOW(),'$user_id','$page_id')";
$result_comment = mysql_query($query_comment, $connection) or die(mysql_error());
}
Try this attached code:
Hope this helps,
Addy
<?php
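function get_string_between($string, $start, $end)
{
    // Locate the start marker; strpos() returns false when it is absent.
    $ini = strpos($string, $start);
    if ($ini === false) return "";
    $ini += strlen($start);
    // Locate the end marker after the start; bail out if it is missing.
    $end_pos = strpos($string, $end, $ini);
    if ($end_pos === false) return "";
    return substr($string, $ini, $end_pos - $ini);
}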
$str='demofinr is this <!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:roman; mso-font-pitch:variable; mso-font-signature:-1610611985 1107304683 0 0 159 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-1610611985 1073750139 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page Section1 {size:595.3pt 841.9pt; margin:72.0pt 72.0pt 72.0pt 72.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> while gone.';
$between_str= get_string_between($str,'<!--','-->');
echo str_replace('-->','',str_replace('<!--','',str_replace($between_str,'',$str)));
?>
I forgot about the multi-line issue. The s modifier at the end here lets . match newlines, so preg_replace works on multi-line strings:
$regex = "#(<!--)(.*)?(-->)#Ues";
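To illustrate what the s modifier changes, here is a short sketch comparing the same ungreedy pattern with and without it on a comment that spans two lines. The sample text is invented for the example, and the e modifier is left out because it is not needed for a plain string replacement.

```php
<?php
// Hypothetical comment that contains a newline.
$text = "<!-- line one\nline two --> body";

// Without s, "." does not match "\n", so the pattern never reaches "-->".
$without_s = preg_replace('/<!--.*-->/U', '', $text);

// With s, "." also matches "\n" and the whole comment is removed.
$with_s = preg_replace('/<!--.*-->/Us', '', $text);

var_dump($without_s === $text); // bool(true): nothing was removed
var_dump($with_s);              // " body"
```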
ASKER
Actually, your solution was correct. I checked what was actually going into the database for these user inputs (comments): '<!--' was going in as '&lt;!--' and '-->' was becoming '--&gt;'.
So, the following works:
$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(&lt;!--)(.*)?(--&gt;)#Ue";
$comment = preg_replace($regex,"",$comment);
Thanks for your help and others for your suggestions!!
Cheers,
Peter
ASKER
Mmmm...looks like it's substituting here too:
&lt;p&gt;&lt;!--
and
--&gt;
so:
$comment = mysql_real_escape_string($_POST['addnewcomment']);
$regex = "#(&lt;p&gt;&lt;!--)(.*)?(--&gt;)#Ue";
$comment = preg_replace($regex,"",$comment);
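If the editor submits pasted markup as HTML entities (so that `<!--` arrives as `&lt;!--` and `-->` as `--&gt;`), one pattern can be written to accept either form. The sketch below assumes that is what happens; the sample strings and variable names are hypothetical, and any surrounding `&lt;p&gt;` is left in place rather than stripped.

```php
<?php
// Accept "<" or "&lt;" before "!--" and ">" or "&gt;" after "--".
// The lazy ".*?" stops at the nearest terminator; s lets "." match newlines.
$pattern = '/(?:<|&lt;)!--.*?--(?:>|&gt;)/s';

// Hypothetical inputs: one entity-encoded, one raw.
$encoded = '&lt;p&gt;&lt;!-- /* Font Definitions */ --&gt;Hello world';
$raw     = '<!-- /* Font Definitions */ -->Hello world';

$clean_encoded = preg_replace($pattern, '', $encoded);
$clean_raw     = preg_replace($pattern, '', $raw);

var_dump($clean_encoded); // "&lt;p&gt;Hello world"
var_dump($clean_raw);     // "Hello world"
```

Matching both forms leaves the rest of the text untouched, whereas decoding the entire input first would also change entities the user typed on purpose.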
$string = word paste
$word_phrase = "<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:ro
$string = str_replace($word_phrase, "", $string);