PHP Function to break sentence after X characters and add dots...

Hi,

I'm struggling with a function to break a sentence after X characters. What I had worked OK until the word being cut contained special characters like:
ö ë & "

If the cut at character X landed inside one of the UTF-8 bytes of a special character, strange characters would be shown.

How can I build a function that also avoids breaking special characters in the output?

Thanks!!
peps03 Asked:
 
Ray Paseur Commented:
This seems to test out OK.  The function is at line 15; the test cases are at the end.

<?php // RAY_temp_peps03.php
error_reporting(E_ALL);
echo '<pre>';

// MAN PAGE: http://www.joelonsoftware.com/articles/Unicode.html
// MAN PAGE: http://www.columbia.edu/kermit/utf8-t1.html
// MAN PAGE: http://www.utf-8.com/
// MAN PAGE: http://www.unicode.org/ucd/
// MAN PAGE: http://www.unicode.org/faq/line_breaking.html
// MAN PAGE: http://www.unicode.org/reports/tr14/
// MAN PAGE: http://www.php.net/manual/en/mbstring.supported-encodings.php
// MAN PAGE: http://www.php.net/manual/en/function.mb-split.php#99851

// MAKE A STRING FRAGMENT OF THE CORRECT LENGTH
function mb_teaser_fragment($str, $len, $tail=NULL)
{
    $arr = preg_split('/(?<!^)(?!$)/u', $str);
    $arr = array_slice($arr,0,$len);
    // USE AN EMPTY STRING AS THE GLUE; PASSING NULL IS DEPRECATED IN PHP 8.1+
    return implode('', $arr) . $tail;
}

// GET THE TEST DATA SET
$url = 'http://pjpn.eu/shorten-chars/';
$raw = file_get_contents($url);

// REMOVE THE EXTRANEOUS STUFF
$raw = str_replace('<br><br><br><br>', '|', $raw);
$raw = strip_tags($raw);
$raw = str_replace('Untitled Document', '', $raw);
$raw = trim($raw);

// ENABLE BROWSER DISPLAY
echo '<meta charset="utf-8" />' . PHP_EOL;

// EXPLODE THE MULTI-BYTE STRING INTO AN ARRAY OF TEST STRINGS
$arr = explode('|', $raw);

// ADD A NON-MULTI-BYTE STRING
$arr[] = 'Food is delicious';

// SHOW THE COMPARISONS OF THE LENGTH WITH DIFFERENT CHARACTER SETS
mb_internal_encoding("UTF-8");
foreach ($arr as $utf)
{
    $cnt =    strlen($utf);
    $mbt = mb_strlen($utf);
    echo PHP_EOL . $utf . " StrLen=$cnt AND Mb_Strlen=$mbt";
}

// MAKE SOME TESTS
$utf = $arr[0];
echo PHP_EOL . $utf;
echo PHP_EOL . mb_teaser_fragment($utf,  1);
echo PHP_EOL . mb_teaser_fragment($utf,  2);
echo PHP_EOL . mb_teaser_fragment($utf,  3);
echo PHP_EOL . mb_teaser_fragment($utf,  4);
echo PHP_EOL . mb_teaser_fragment($utf,  5);
echo PHP_EOL . mb_teaser_fragment($utf,  6);
echo PHP_EOL . mb_teaser_fragment($utf,  7);
echo PHP_EOL . mb_teaser_fragment($utf,  8);
echo PHP_EOL . mb_teaser_fragment($utf,  9);
echo PHP_EOL . mb_teaser_fragment($utf, 10);
echo PHP_EOL . mb_teaser_fragment($utf, 11);
echo PHP_EOL . mb_teaser_fragment($utf, 12);
echo PHP_EOL . mb_teaser_fragment($utf, 13);
echo PHP_EOL;

$utf = $arr[2];
echo PHP_EOL . $utf;
echo PHP_EOL . mb_teaser_fragment($utf,  1);
echo PHP_EOL . mb_teaser_fragment($utf,  2);
echo PHP_EOL . mb_teaser_fragment($utf,  3);
echo PHP_EOL . mb_teaser_fragment($utf,  4);
echo PHP_EOL . mb_teaser_fragment($utf,  5);
echo PHP_EOL . mb_teaser_fragment($utf,  6);
echo PHP_EOL . mb_teaser_fragment($utf,  7);
echo PHP_EOL . mb_teaser_fragment($utf,  8);
echo PHP_EOL . mb_teaser_fragment($utf,  9);
echo PHP_EOL . mb_teaser_fragment($utf, 10);
echo PHP_EOL . mb_teaser_fragment($utf, 11);
echo PHP_EOL . mb_teaser_fragment($utf, 12);
echo PHP_EOL . mb_teaser_fragment($utf, 13);
echo PHP_EOL . mb_teaser_fragment($utf, 14);
echo PHP_EOL . mb_teaser_fragment($utf, 15);
echo PHP_EOL . mb_teaser_fragment($utf, 16);
echo PHP_EOL . mb_teaser_fragment($utf, 17);
echo PHP_EOL . mb_teaser_fragment($utf, 18);
echo PHP_EOL . mb_teaser_fragment($utf, 19);
echo PHP_EOL . mb_teaser_fragment($utf, 20);


Best regards, ~Ray
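
The function above splits on code points, so a letter written as a base character plus a combining accent would count as two. A minimal sketch of a variant that keeps such pairs together, assuming the PCRE build behind `preg_match_all()` supports `\X` (extended grapheme cluster). The name `grapheme_teaser_fragment` is hypothetical and not from this thread.

```php
<?php
// Hypothetical helper: keep the first $len user-perceived characters.
function grapheme_teaser_fragment($str, $len, $tail = '')
{
    // \X matches one extended grapheme cluster when the /u modifier is set
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return $str; // invalid UTF-8 or regex failure: return the input unchanged
    }
    $clusters = $matches[0];
    if (count($clusters) <= $len) {
        return $str; // nothing was cut, so no tail is added
    }
    return implode('', array_slice($clusters, 0, $len)) . $tail;
}

// "e" followed by U+0301 (combining acute accent) stays together
echo grapheme_teaser_fragment("Cafe\u{0301} au lait", 4, '...'), PHP_EOL;
```

Unlike the tested function, this sketch only appends the tail when something was actually removed.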
 
sivagnanam chandrakanth (Technical Lead) Commented:
Try this

<?php
$string= "Remove a counter from the row ö ë & specified by key at the granularity specified by column_path. Note that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too. Note that counters have limited support for deletes: if you remove a counter, you must wait to issue any following update until the delete has reached all the nodes and all of them have been fully compacted.";
$parts = str_split($string, $split_length = 10);
echo "<pre>";
print_r($parts);

?>

 
peps03 (Author) Commented:
Thx. But what I actually meant is a function that takes a string and a number of characters, say 20, like:

$input = 'hello there, i am trying to get this working';
echo shorten($input, 20);
Output:
hello there, i am tr..

But it should also work if the 20th character is a special char like: ë or &amp;
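
A minimal sketch of the requested behaviour, assuming the input is valid UTF-8 and the mbstring extension is loaded. The `$dots` parameter is an assumption added for illustration. HTML entities such as `&amp;` are not decoded here, so an entity still counts as several characters.

```php
<?php
// Hypothetical shorten(): cut after $limit characters (not bytes) and add dots.
function shorten($input, $limit, $dots = '..')
{
    // mb_strlen() counts characters when told the string is UTF-8
    if (mb_strlen($input, 'UTF-8') <= $limit) {
        return $input; // short enough: no dots
    }
    // mb_substr() never cuts inside a multi-byte character
    return rtrim(mb_substr($input, 0, $limit, 'UTF-8')) . $dots;
}

echo shorten('hello there, i am trying to get this working', 20), PHP_EOL;
// hello there, i am tr..
echo shorten("F\u{00F6}\u{00F6}d music \u{00EB} drinks", 12), PHP_EOL;
```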
 
sivagnanam chandrakanth (Technical Lead) Commented:
ok..Try this

<?php
$string= "Removö ë & counter from the row ö ë & specified by key at the granularity specified by column_path. Note that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too. Note that counters have limited support for deletes: if you remove a counter, you must wait to issue any following update until the delete has reached all the nodes and all of them have been fully compacted.";

echo shorten($string,0,10);

function shorten($str,$start,$len){

return substr($str,$start,$len);

}
?>

 
peps03 (Author) Commented:
Thanks.

I mean when this happens:

$string= "Frlkrrtï"; // = ï

echo shorten(utf8_decode($string),0,8);

function shorten($str,$start,$len){

return substr($str,$start,$len);

}



Change 8 to 9 and it works.
But when it is 8, it doesn't.

and ï is saved in the db like ï
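
A short sketch of why 8 fails and 9 works, assuming the string is stored as UTF-8 so that ï takes two bytes. `substr()` counts bytes and keeps only half of the last character, while `mb_substr()` counts characters.

```php
<?php
$string = "Frlkrrt\u{00EF}"; // 7 ASCII bytes + 2 bytes for U+00EF

var_dump(strlen($string));                      // int(9): bytes
var_dump(mb_strlen($string, 'UTF-8'));          // int(8): characters

$byteCut = substr($string, 0, 8);               // keeps only the first byte of the last character
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): no longer valid UTF-8

$charCut = mb_substr($string, 0, 8, 'UTF-8');   // counts characters, not bytes
var_dump($charCut === $string);                 // bool(true): all 8 characters kept
```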
 
Lukasz Chmielewski Commented:
 
Ray Paseur Commented:
What you're referring to is called a "teaser fragment" in publishing.  You may need to make multi-byte function changes here, but this works for me most of the time.
<?php // RAY_teaser_fragment.php
error_reporting(E_ALL);


// CREATE A TEASER FRAGMENT HEADLINE
// RETURN FIRST FEW WHOLE WORDS FOLLOWED BY ELLIPSES
// WITH A LINK TO THE FULL ARTICLE
// $length IS MINIMUM TRUNCATION CHARACTER COUNT


function teaser_fragment($text, $length=32, $url='#', $delim='|||')
{
    // IF TRUNCATION IS NEEDED
    if (strlen($text) > $length)
    {
        // IF TRUNCATION IS NEEDED, BREAK STRING APART
        $t = wordwrap($text, $length, $delim);
        $a = explode($delim, $t);
        $z = '...';
    }
    // IF TRUNCATION IS NOT NEEDED
    else
    {
        $a[0] = $text;
        $z = NULL;
    }

    // CONSTRUCT THE FRAGMENT WITH THE LINK AND ADD ELLIPSIS (LINK) TO THE END
    $teaser
    = '<a target="_blank" href="'
    . $url
    . '">'
    . $a[0]
    . $z
    . '</a>'
    ;
    return $teaser;
}



// USE CASES
echo "<pre>";
echo PHP_EOL;
echo "1...5...10...15...20...25...30...35...40...45..." . PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party');

echo PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party', 300);

echo PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party', 15, 'http://en.wikipedia.org/wiki/Filler_text');


HTH, ~Ray
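
One way the multi-byte changes mentioned above might look, as a sketch only. It assumes UTF-8 input and the mbstring extension, and uses the hypothetical name `mb_word_teaser`. It returns at most `$length` characters of whole words and leaves out the link markup.

```php
<?php
// Hypothetical multi-byte variant: whole words only, at most $length characters.
function mb_word_teaser($text, $length = 32, $tail = '...')
{
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return $text; // no truncation needed
    }
    // Take one extra character to see whether the cut lands on a word boundary
    $cut = mb_substr($text, 0, $length + 1, 'UTF-8');
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false && $lastSpace > 0) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8'); // back up to the last whole word
    } else {
        $cut = mb_substr($cut, 0, $length, 'UTF-8');    // one long word: hard cut
    }
    return rtrim($cut) . $tail;
}

echo mb_word_teaser("F\u{00F6}\u{00F6}d music \u{00EB} drinks for everyone", 15), PHP_EOL;
```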
 
peps03 (Author) Commented:
Is there no function to count how many bytes a certain special character consists of?
Then you could add this to the length set for the requested output.
(without generating longer output, but just to prevent special characters that consist of multiple bytes from getting broken)
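
A sketch of one way to get the byte count per character, assuming valid UTF-8 input. `bytes_per_character` is a hypothetical helper name. `strlen()` always counts bytes, so applying it to a single character gives that character's size. `mb_strcut()` is the mbstring function that cuts at a byte limit without splitting a character.

```php
<?php
// Hypothetical helper: pair each character of a UTF-8 string with its byte count.
function bytes_per_character($str)
{
    // The /u modifier makes PCRE treat the input as UTF-8
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // invalid UTF-8
    }
    $out = [];
    foreach ($chars as $char) {
        $out[] = [$char, strlen($char)]; // strlen() counts bytes
    }
    return $out;
}

$str = "F\u{00F6}\u{00EB}d";
foreach (bytes_per_character($str) as [$char, $bytes]) {
    echo $char, ' = ', $bytes, ' byte(s)', PHP_EOL;
}

// Extra bytes in the whole string, beyond one per character
echo strlen($str) - mb_strlen($str, 'UTF-8'), PHP_EOL; // 2

// Byte-limited cut that still yields valid UTF-8
$safe = mb_strcut($str, 0, 2, 'UTF-8');
var_dump(mb_check_encoding($safe, 'UTF-8'));
```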
 
Ray Paseur Commented:
Have you tried strlen()?  Also, if you can post a link (please do not post the text) to a sample document we might be able to give you an example or two.
 
rinfo Commented:
Maybe you need to use the multibyte string functions to handle Unicode characters in your routines.
Refer to this:
http://php.net/manual/en/ref.mbstring.php
 
peps03 (Author) Commented:
Thanks.

Can this function spot if a character consists of multiple bytes? And maybe count all the bytes of the special characters? That would actually be all i need.

Could you give an example of that?

Thanks!
 
rinfo Commented:
Well, first you have to enable php_mbstring.dll in php.ini.
After that you may try this code.
The important thing here is to state the encoding you are using:
mb_internal_encoding("UTF-8");  //set encoding here
  $string=  "Remove a counter from the row ö ë & specified by key at";
  $stringLen = mb_strlen($string) ; //total length of the string
  
  $stringPart1 = mb_substr($string,0,10) ; //get part of the string to retain = retain first 10 chars
 
  $stringPart2 = mb_substr($string,10,$stringLen); //part of the string to replace with '.'
  
  $stringPart2 = mb_ereg_replace( "/(^\s+)|(\s+$)/us", ".", $stringPart2);
  $string = $stringPart1.$stringPart2;

 
rinfo Commented:
I have tested the code.
Sorry, but it's not working.
The results are the same as you mentioned in your post.
 
Ray Paseur Commented:
Please post a link to a sample document.  Thanks, ~Ray
 
peps03 (Author) Commented:
I don't have a link / page, it's just a simple function.

This is what i have:

<?
	$string0 = 'Föööd music &amp; DJ&#039;s';
	$string1 = 'Fööd music &amp; DJ&#039;s';


function limit_letters($string, $letter_limit){
	
	
	$string2 = preg_replace('/&(.*?);/si', '-', $string);
	$countstring = mb_strlen(utf8_decode($string2), 'UTF-8');
		
	//if(strlen($countstring) > $letter_limit){
	if($countstring > $letter_limit){
		$dots1 = '..';
		$stringfixed = substr(htmlspecialchars_decode($string),0,$letter_limit);
	}else{
		$dots1 = '';
		$stringfixed = $string;
	};
	
	$stringfixed = utf8_decode($stringfixed);
	return rtrim($stringfixed).$dots1;
}

echo '<br><br>'.limit_letters($string0, 16).'<br><br>';
echo '<br><br>'.limit_letters($string1, 16).'<br><br>';
?>



As you see, the difference between string0 and string1 is 1 character (ö).
But the output differs by 4 letters. I need this to be a difference of 1 character, as ö is only 1 character (but more bytes...)
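
A possible rewrite, sketched under the assumption that the strings are genuine UTF-8 and that the result goes into HTML. It decodes the entities first so that `&amp;` and `&#039;` each count as one character, cuts by characters, then re-encodes. `limit_letters_mb` and the `$dots` parameter are hypothetical names.

```php
<?php
// Hypothetical rewrite of limit_letters(): count characters, not bytes or entity text.
function limit_letters_mb($string, $letter_limit, $dots = '..')
{
    // Turn &amp; and &#039; back into single characters before counting
    $plain = html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    if (mb_strlen($plain, 'UTF-8') > $letter_limit) {
        $plain = rtrim(mb_substr($plain, 0, $letter_limit, 'UTF-8')) . $dots;
    }
    // Re-encode for HTML output so a bare & or ' is never emitted
    return htmlspecialchars($plain, ENT_QUOTES, 'UTF-8');
}

echo limit_letters_mb("F\u{00F6}\u{00F6}\u{00F6}d music &amp; DJ&#039;s", 16), PHP_EOL;
echo limit_letters_mb("F\u{00F6}\u{00F6}d music &amp; DJ&#039;s", 16), PHP_EOL;
```

With a limit of 16, the first call stops after "DJ" and the second after the apostrophe, so the two outputs differ by exactly one visible character.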
 
Ray Paseur Commented:
The reason I am asking for a link to an external file containing the sample document is that the simple act of copying and posting the data may mung the multi-byte characters.  Rather than try to run a test on munged data, I would like to be able to use PHP to open the file that contains the original test data.  So please put the sample data online somewhere and post the URL of the online file here.  Thanks, ~Ray
 
peps03 (Author) Commented:
http://pjpn.eu/shorten-chars/

this is the same code, but online.
 
peps03 (Author) Commented:
Is there a method to count specified characters in a string?

Say i can count all the spaces, small and capital letters and numbers in a string, later i can subtract this amount from the total length of characters in the string. Now i will know the amount of special characters in the string.

This is what I was thinking:

Say you want to only echo the first 10 chars of a string. The string = 'Fööd Jazzz and drinks' (= Fööd Jazzz and drinks)

The first ten chars would echo 'Fööd Jaz' instead of 'Fööd Jazzz', because each special char counts for 2.
These would be 10 chars counted: 'Fööd Jaz'

So only 6 letters and spaces are counted, which leaves 4 bytes belonging to special chars. Knowing that every 2 of those bytes make 1 'normal' char, the output limit of 10 should be increased to 12 to output the desired text.

So is it possible to count pre-specified characters in a given string somehow?
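
Counting characters by class is possible with `preg_match_all()`, which returns the number of matches. A minimal sketch, assuming valid UTF-8 input; the variable names are illustrative only.

```php
<?php
$str = "F\u{00F6}\u{00F6}d Jazzz and drinks"; // assumed valid UTF-8

// Characters outside the 7-bit ASCII range (the "special" characters)
$special = preg_match_all('/[^\x00-\x7F]/u', $str);

// Letters, digits and spaces, if a breakdown is wanted
$letters = preg_match_all('/\p{L}/u', $str);
$digits  = preg_match_all('/\p{Nd}/u', $str);
$spaces  = preg_match_all('/ /u', $str);

echo "special=$special letters=$letters digits=$digits spaces=$spaces", PHP_EOL;
// special=2 letters=18 digits=0 spaces=3
```

When the cut itself is done with a character-based function such as `mb_substr()`, this bookkeeping is not needed, because the limit is already expressed in characters.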
 
Ray Paseur Commented:
I think you should post a new question for this.  It's been two weeks since the original question, which was answered with a tested-and-working code example.

Best regards, ~Ray
 
peps03 (Author) Commented:
I've requested that this question be closed as follows:

Accepted answer: 0 points for peps03's comment #a38904904

for the following reason:

Thanks Ray!
 
Ray Paseur Commented:
I believe the author accidentally posted a close request instead of accepting the answer, which is accompanied by a tested and working code example at this URL.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28020210.html#a38870936

If that's wrong and I misunderstood the question or the close request, I'd like a chance to find out what the issues were.  Thanks, ~Ray
Question has a verified solution.