Solved

PHP Function to break sentence after X characters and add dots...

Posted on 2013-02-05
Last Modified: 2013-02-23
Hi,

I'm struggling with a function to break a sentence after X characters. What I had worked OK until the broken word contained special characters like:
ö ë & "

If character X fell on one of the bytes of a multi-byte UTF-8 special character, strange characters would be shown.

How can I build a function that also avoids breaking special characters in the output?

Thanks!!
Question by:peps03
22 Comments
 
LVL 12

Expert Comment

by:sivagnanam chandrakanth
Try this

<?php
$string= "Remove a counter from the row ö ë & specified by key at the granularity specified by column_path. Note that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too. Note that counters have limited support for deletes: if you remove a counter, you must wait to issue any following update until the delete has reached all the nodes and all of them have been fully compacted.";
$parts = str_split($string, $split_length = 10);
echo "<pre>";
print_r($parts);

?>

 

Author Comment

by:peps03
Thanks, but what I actually meant is a function that takes a string and a number of characters, say 20, like:

$input = 'hello there, i am trying to get this working';
echo shorten($input, 20);
Output:
hello there, i am tr..

But it should also work if the 20th character is a special char like ë or &amp;
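
A minimal sketch of the shorten() described above, assuming the mbstring extension is available and the input is valid UTF-8. The function name comes from the example; the $suffix parameter is an assumption added for illustration. It counts characters rather than bytes, so a multi-byte letter such as ë is never split. It does not treat an HTML entity such as &amp; as one character.

```php
<?php
// Sketch only: assumes ext-mbstring and UTF-8 input.
function shorten(string $text, int $limit, string $suffix = '..'): string
{
    // Nothing to do when the text already fits.
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    // mb_substr() counts characters, not bytes.
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

echo shorten('hello there, i am trying to get this working', 20), PHP_EOL;
echo shorten("Zo\u{00EB} and Chlo\u{00EB} went out", 3), PHP_EOL;
```

The second call cuts right after the ë and still returns a valid string.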
 
LVL 12

Expert Comment

by:sivagnanam chandrakanth
OK, try this:

<?php
$string= "Removö ë & counter from the row ö ë & specified by key at the granularity specified by column_path. Note that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too. Note that counters have limited support for deletes: if you remove a counter, you must wait to issue any following update until the delete has reached all the nodes and all of them have been fully compacted.";

echo shorten($string,0,10);

function shorten($str,$start,$len){

return substr($str,$start,$len);

}
?>

 

Author Comment

by:peps03
Thanks.

I mean when this happens:

$string= "FrlkrrtÃ¯"; // = ï

echo shorten(utf8_decode($string),0,8);

function shorten($str,$start,$len){

return substr($str,$start,$len);

}



Change 8 to 9 and it works, but with 8 it doesn't.

And ï is saved in the db like Ã¯
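
To see why a byte-based cut can fail at one length and work at the next, it can help to look at bytes rather than characters. The sketch below assumes UTF-8 and the mbstring extension; the variable names are made up for illustration. In UTF-8, ï takes two bytes, so a substr() that stops after the first of them leaves an invalid sequence.

```php
<?php
// Sketch only: assumes ext-mbstring; "\u{00EF}" is ï written as an escape.
$word = "Frlkrrt\u{00EF}";

echo strlen($word), PHP_EOL;             // length in bytes
echo mb_strlen($word, 'UTF-8'), PHP_EOL; // length in characters

// Cutting by bytes can stop in the middle of ï.
$byteCut = substr($word, 0, 8);
var_dump(mb_check_encoding($byteCut, 'UTF-8'));

// Cutting by characters keeps ï whole.
$charCut = mb_substr($word, 0, 8, 'UTF-8');
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```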
 
LVL 27

Expert Comment

by:Lukasz Chmielewski
 
LVL 108

Expert Comment

by:Ray Paseur
What you're referring to is called a "teaser fragment" in publishing.  You may need to make multi-byte function changes here, but this works for me most of the time.
<?php // RAY_teaser_fragment.php
error_reporting(E_ALL);


// CREATE A TEASER FRAGMENT HEADLINE
// RETURN FIRST FEW WHOLE WORDS FOLLOWED BY ELLIPSES
// WITH A LINK TO THE FULL ARTICLE
// $length IS MINIMUM TRUNCATION CHARACTER COUNT


function teaser_fragment($text, $length=32, $url='#', $delim='|||')
{
    // IF TRUNCATION IS NEEDED
    if (strlen($text) > $length)
    {
        // IF TRUNCATION IS NEEDED, BREAK STRING APART
        $t = wordwrap($text, $length, $delim);
        $a = explode($delim, $t);
        $z = '...';
    }
    // IF TRUNCATION IS NOT NEEDED
    else
    {
        $a[0] = $text;
        $z = NULL;
    }

    // CONSTRUCT THE FRAGMENT WITH THE LINK AND ADD ELLIPSIS (LINK) TO THE END
    $teaser
    = '<a target="_blank" href="'
    . $url
    . '">'
    . $a[0]
    . $z
    . '</a>'
    ;
    return $teaser;
}



// USE CASES
echo "<pre>";
echo PHP_EOL;
echo "1...5...10...15...20...25...30...35...40...45..." . PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party');

echo PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party', 300);

echo PHP_EOL;
echo teaser_fragment('Now is the time for all good men to come to the aid of their party', 15, 'http://en.wikipedia.org/wiki/Filler_text');


HTH, ~Ray
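
A rough sketch of the multi-byte change mentioned above, assuming the mbstring extension and UTF-8 text. mb_teaser_words() is a hypothetical name, not part of the script above. Unlike the wordwrap() version, it treats $length as a maximum character count and cuts back to the last space, and it leaves out the link markup.

```php
<?php
// Sketch only: hypothetical helper, assumes ext-mbstring and UTF-8 input.
function mb_teaser_words(string $text, int $length = 32, string $tail = '...'): string
{
    // No truncation needed.
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return $text;
    }
    // Take one extra character so a cut landing exactly on a word end is kept.
    $cut = mb_substr($text, 0, $length + 1, 'UTF-8');
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false && $lastSpace > 0) {
        // Drop the partial word after the last space.
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    } else {
        // No space found: fall back to a hard cut at $length characters.
        $cut = mb_substr($cut, 0, $length, 'UTF-8');
    }
    return $cut . $tail;
}

echo mb_teaser_words("Sm\u{00F6}rg\u{00E5}sbord f\u{00F6}r alla g\u{00E4}ster", 15), PHP_EOL;
```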
 

Author Comment

by:peps03
Is there no function to count how many bytes a certain special character consists of?
Then you could add this to the set length of the requested output
(not to generate longer output, just to prevent special characters that consist of multiple bytes from getting broken).
 
LVL 108

Expert Comment

by:Ray Paseur
Have you tried strlen()?  Also, if you can post a link (please do not post the text) to a sample document we might be able to give you an example or two.
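
As a rough illustration of the strlen() idea: on a string holding a single character, strlen() returns the number of bytes that character occupies. The sketch assumes UTF-8 input and PCRE with Unicode support; bytes_per_char() is a hypothetical helper name.

```php
<?php
// Sketch only: bytes_per_char() is a hypothetical helper, assumes UTF-8 input.
function bytes_per_char(string $text): array
{
    // Split into characters; the /u modifier makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    $sizes = [];
    foreach ($chars as $char) {
        // strlen() counts bytes, so this is the byte width of one character.
        $sizes[] = [$char, strlen($char)];
    }
    return $sizes;
}

foreach (bytes_per_char("F\u{00F6}\u{00EB}&") as [$char, $bytes]) {
    echo $char, ' => ', $bytes, PHP_EOL;
}
```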
 
LVL 9

Expert Comment

by:rinfo
Maybe you need to use the multibyte string functions to handle Unicode characters in your routines.
Refer to this:
http://php.net/manual/en/ref.mbstring.php
 

Author Comment

by:peps03
Thanks.

Can this function spot if a character consists of multiple bytes? And maybe count all the bytes of the special characters? That would actually be all I need.

Could you give an example of that?

Thanks!
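
One way to approach this, sketched under the assumption of UTF-8 input and the mbstring extension: compare the byte length with the character length. If they differ, the string contains multi-byte characters, and the difference is the number of extra bytes. The function names below are illustrative only.

```php
<?php
// Sketch only: hypothetical helpers, assume ext-mbstring and UTF-8 input.
function extra_bytes(string $text): int
{
    // Bytes minus characters = bytes beyond the first in multi-byte characters.
    return strlen($text) - mb_strlen($text, 'UTF-8');
}

function multibyte_chars(string $text): int
{
    // Count characters outside the single-byte ASCII range.
    return (int) preg_match_all('/[^\x00-\x7F]/u', $text);
}

$sample = "F\u{00F6}\u{00F6}d music";
echo extra_bytes($sample), PHP_EOL;
echo multibyte_chars($sample), PHP_EOL;
```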

 
LVL 9

Expert Comment

by:rinfo
Well, first you have to enable php_mbstring.dll in php.ini.
After that you may try this code.
The important thing here is to specify the encoding you are using:
mb_internal_encoding("UTF-8");  //set encoding here
  $string=  "Remove a counter from the row ö ë & specified by key at";
  $stringLen = mb_strlen($string) ; //total length of the string
  
  $stringPart1 = mb_substr($string,0,10) ; //get part of the string to retain = retain first 10 chars
 
  $stringPart2 = mb_substr($string,10,$stringLen); //part of the string to replace with '.'
  
  $stringPart2 = mb_ereg_replace( "/(^\s+)|(\s+$)/us", ".", $stringPart2);
  $string = $stringPart1.$stringPart2;

 
LVL 9

Expert Comment

by:rinfo
I have tested the code.
Sorry, but it's not working.
The results are the same as you mentioned in your post.
 
LVL 108

Expert Comment

by:Ray Paseur
Please post a link to a sample document.  Thanks, ~Ray
 

Author Comment

by:peps03
I don't have a link / page, it's just a simple function.

This is what I have:

<?
	$string0 = 'Föööd music &amp; DJ&#039;s';
	$string1 = 'Fööd music &amp; DJ&#039;s';


function limit_letters($string, $letter_limit){
	
	
	$string2 = preg_replace('/&(.*?);/si', '-', $string);
	$countstring = mb_strlen(utf8_decode($string2), 'UTF-8');
		
	//if(strlen($countstring) > $letter_limit){
	if($countstring > $letter_limit){
		$dots1 = '..';
		$stringfixed = substr(htmlspecialchars_decode($string),0,$letter_limit);
	}else{
		$dots1 = '';
		$stringfixed = $string;
	};
	
	$stringfixed = utf8_decode($stringfixed);
	return rtrim($stringfixed).$dots1;
}

echo '<br><br>'.limit_letters($string0, 16).'<br><br>';
echo '<br><br>'.limit_letters($string1, 16).'<br><br>';
?>



As you see, the difference between string0 and string1 is 1 character (ö).
But the outputs differ by 4 letters. I need this to be a difference of 1 character, as ö is only 1 character (but more bytes...).
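
A possible way to get a one-character difference, sketched under the assumption that the strings are valid UTF-8, that their special characters were escaped with htmlspecialchars(), and that the mbstring extension is loaded. limit_letters_sketch() is a hypothetical name. It decodes entities such as &amp; and &#039; to single characters, cuts by characters, then escapes the result again.

```php
<?php
// Sketch only: hypothetical helper, assumes ext-mbstring and UTF-8 input
// whose special characters were escaped with htmlspecialchars().
function limit_letters_sketch(string $html, int $limit, string $dots = '..'): string
{
    // Turn &amp; and &#039; back into single characters before counting.
    $plain = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    if (mb_strlen($plain, 'UTF-8') <= $limit) {
        return $html;
    }
    $cut = rtrim(mb_substr($plain, 0, $limit, 'UTF-8'));
    // Re-escape so the result can be echoed into HTML again.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . $dots;
}

echo limit_letters_sketch("F\u{00F6}\u{00F6}\u{00F6}d music &amp; DJ&#039;s", 16), PHP_EOL;
echo limit_letters_sketch("F\u{00F6}\u{00F6}d music &amp; DJ&#039;s", 16), PHP_EOL;
```

With a limit of 16, the first string is cut after "DJ" and the second after the apostrophe, so the visible text differs by exactly one character.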
 
LVL 108

Expert Comment

by:Ray Paseur
The reason I am asking for a link to an external file containing the sample document is that the simple act of copying and posting the data may mung the multi-byte characters.  Rather than try to run a test on munged data, I would like to be able to use PHP to open the file that contains the original test data.  So please put the sample data online somewhere and post the URL of the online file here.  Thanks, ~Ray
 

Author Comment

by:peps03
http://pjpn.eu/shorten-chars/

this is the same code, but online.
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
This seems to test out OK.  The function is at line 15; the test cases are at the end.

<?php // RAY_temp_peps03.php
error_reporting(E_ALL);
echo '<pre>';

// MAN PAGE: http://www.joelonsoftware.com/articles/Unicode.html
// MAN PAGE: http://www.columbia.edu/kermit/utf8-t1.html
// MAN PAGE: http://www.utf-8.com/
// MAN PAGE: http://www.unicode.org/ucd/
// MAN PAGE: http://www.unicode.org/faq/line_breaking.html
// MAN PAGE: http://www.unicode.org/reports/tr14/
// MAN PAGE: http://www.php.net/manual/en/mbstring.supported-encodings.php
// MAN PAGE: http://www.php.net/manual/en/function.mb-split.php#99851

// MAKE A STRING FRAGMENT OF THE CORRECT LENGTH
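function mb_teaser_fragment($str, $len, $tail='')
{
    // SPLIT BETWEEN CHARACTERS (THE /u MODIFIER MAKES PCRE TREAT THE INPUT AS UTF-8)
    $arr = preg_split('/(?<!^)(?!$)/u', $str);
    // KEEP ONLY THE FIRST $len CHARACTERS
    $arr = array_slice($arr,0,$len);
    // USE '' RATHER THAN NULL: NEWER PHP VERSIONS DEPRECATE PASSING NULL WHERE A STRING IS EXPECTED
    return implode('', $arr) . $tail;
}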

// GET THE TEST DATA SET
$url = 'http://pjpn.eu/shorten-chars/';
$raw = file_get_contents($url);

// REMOVE THE EXTRANEOUS STUFF
$raw = str_replace('<br><br><br><br>', '|', $raw);
$raw = strip_tags($raw);
$raw = str_replace('Untitled Document', '', $raw);
$raw = trim($raw);

// ENABLE BROWSER DISPLAY
echo '<meta charset="utf-8" />' . PHP_EOL;

// EXPLODE THE MULTI-BYTE STRING INTO AN ARRAY OF TEST STRINGS
$arr = explode('|', $raw);

// ADD A NON-MULTI-BYTE STRING
$arr[] = 'Food is delicious';

// SHOW THE COMPARISONS OF THE LENGTH WITH DIFFERENT CHARACTER SETS
mb_internal_encoding("UTF-8");
foreach ($arr as $utf)
{
    $cnt =    strlen($utf);
    $mbt = mb_strlen($utf);
    echo PHP_EOL . $utf . " StrLen=$cnt AND Mb_Strlen=$mbt";
}

// MAKE SOME TESTS
$utf = $arr[0];
echo PHP_EOL . $utf;
for ($len = 1; $len <= 13; $len++)
{
    echo PHP_EOL . mb_teaser_fragment($utf, $len);
}
echo PHP_EOL;

$utf = $arr[2];
echo PHP_EOL . $utf;
for ($len = 1; $len <= 20; $len++)
{
    echo PHP_EOL . mb_teaser_fragment($utf, $len);
}


Best regards, ~Ray
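
The test above depends on fetching a remote page. For a quick offline check, the same split-and-slice idea can be tried on a literal string. The sketch below restates it under the assumption of UTF-8 input; teaser_by_chars() is a hypothetical name used only here.

```php
<?php
// Sketch only: hypothetical restatement of the split-and-slice idea.
function teaser_by_chars(string $str, int $len, string $tail = ''): string
{
    // Split between every pair of characters; /u makes PCRE read UTF-8.
    $chars = preg_split('/(?<!^)(?!$)/u', $str);
    // Keep the first $len characters and glue them back together.
    return implode('', array_slice($chars, 0, $len)) . $tail;
}

$utf = "F\u{00F6}\u{00F6}d music";
for ($len = 1; $len <= 5; $len++) {
    echo teaser_by_chars($utf, $len, '..'), PHP_EOL;
}
```

Where the mbstring extension is available, mb_substr($utf, 0, $len, 'UTF-8') should give the same fragment without building an array.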
 

Author Comment

by:peps03
Is there a method to count specified characters in a string?

Say I can count all the spaces, lowercase and capital letters, and numbers in a string; I can then subtract this amount from the total length of the string. That tells me the number of special characters in the string.

This is what I was thinking:

Say you want to echo only the first 10 chars of a string. The string = 'FÃ¶Ã¶d Jazzz and drinks' (= Fööd Jazzz and drinks)

The first ten chars would echo 'Fööd Jaz' instead of 'Fööd Jazzz' because the special chars count for 2.
These would be the 10 chars counted: 'FÃ¶Ã¶d Jaz'

So only 6 letters and spaces are counted, which leaves 4 special chars. Knowing that each 2 special chars make 1 'normal' char, the output limit of 10 should be increased to 12 to output the desired text.

So is it possible to count pre-specified characters in a given string somehow?
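
A minimal sketch of counting pre-specified characters, assuming UTF-8 input and PCRE with Unicode support. preg_match_all() returns the number of matches, so a character class can stand in for "spaces, letters and numbers". The function name is made up for illustration.

```php
<?php
// Sketch only: hypothetical helper, assumes UTF-8 input.
function count_plain_and_special(string $text): array
{
    // Characters named in the question: spaces, ASCII letters and digits.
    $plain = (int) preg_match_all('/[A-Za-z0-9 ]/u', $text);
    // Total count of whole characters (not bytes).
    $total = (int) preg_match_all('/./us', $text);
    return ['plain' => $plain, 'special' => $total - $plain];
}

print_r(count_plain_and_special("F\u{00F6}\u{00F6}d Jazzz and drinks"));
```

If the cut itself is made with a character-aware function such as mb_substr(), this adjustment of the limit should not be needed.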
 
LVL 108

Expert Comment

by:Ray Paseur
I think you should post a new question for this.  It's been two weeks since the original question, which was answered with a tested-and-working code example.

Best regards, ~Ray
 

Author Comment

by:peps03
I've requested that this question be closed as follows:

Accepted answer: 0 points for peps03's comment #a38904904

for the following reason:

Thanks Ray!
 
LVL 108

Expert Comment

by:Ray Paseur
I believe the author accidentally posted a close request instead of accepting the answer, which is accompanied with a tested and working code example at this URL.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28020210.html#a38870936

If that's wrong and I misunderstood the question or the close request, I'd like a chance to find out what the issues were.  Thanks, ~Ray