Chris Andrews
php - get number of characters outside of <>
I've got some code that gets the number of characters in a post:
$num_chars = strlen(utf8_decode($content));
However, I'm having some trouble in that this count includes characters in html tags <img src="...">, <a href="...">, etc.
I use the character count to display a particular layout based on how long the post is, so I need the character count to be the number of characters displayed to a reader, and to not include the html code.
Any suggestions on the most efficient way to do that?
Thanks,
Chris
Is the data UTF-8? If so, PHP has mb_strlen(), which might give a more accurate count. strlen() assumes that one byte equals one character, and that isn't true with UTF-8. strip_tags() is probably OK, but it has its quirks and can be unreliable with malformed (or even some well-formed) tags. Your exact results may depend on the PHP release. You might want to check the user notes on the online manual page. If you set up a test case with some representative data, I can show you how to test it.
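A minimal sketch of the suggestion above, assuming the post content is valid UTF-8 and the mbstring extension is loaded. The helper name visible_char_count() is hypothetical (it does not come from this thread), and strip_tags() keeps the quirks already mentioned, so treat the result as an approximation.

```php
<?php
// Hypothetical helper: counts the characters a reader would see.
// Assumes valid UTF-8 input and that the mbstring extension is available.
function visible_char_count($html)
{
    // Remove the tags themselves; the text between them is kept.
    $text = strip_tags($html);
    // Turn entities such as &amp; into the single character they display as.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Count characters rather than bytes.
    return mb_strlen($text, 'UTF-8');
}

// "\xC3\xA9" is the UTF-8 byte sequence for "é".
$sample = "<p>Caf\xC3\xA9 <a href='http://example.com/'>menu</a> &amp; more</p>";

echo strlen($sample), "\n";             // raw byte count, markup included
echo visible_char_count($sample), "\n"; // 16, the length of "Café menu & more"
```

Whether entities should be decoded before counting is a judgment call; it is done here because a reader sees "&", not "&amp;".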
It's not in UTF-8
utf8_decode
Gary: I didn't overlook that. PHP is just getting into the 21st century with respect to multi-byte character sets, and utf8_decode() does not always work the way we wish it would. Please see the note here:
http://php.net/manual/en/function.utf8-decode.php#104907
Some suggest using iconv instead. I don't have much experience with it.
Even if the string is not decoded properly it would (should) still be the same length.
(though I may need to double check that)
edit
But granted for use elsewhere it may not be the best method.
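The length claim above can be checked with a small experiment. This is only a sketch, assuming valid UTF-8 input and the mbstring extension. utf8_decode() replaces characters that have no ISO-8859-1 equivalent with "?", so each character still ends up as one byte. The function is deprecated as of PHP 8.2, which is why the call is guarded here.

```php
<?php
// "naïve €5" written as explicit UTF-8 bytes: ï is 2 bytes, € is 3 bytes.
$s = "na\xC3\xAFve \xE2\x82\xAC5";

echo strlen($s), "\n";             // 11 (bytes)
echo mb_strlen($s, 'UTF-8'), "\n"; // 8  (characters)

// utf8_decode() is deprecated as of PHP 8.2, so only call it where it exists.
if (function_exists('utf8_decode')) {
    // € has no ISO-8859-1 equivalent and becomes "?", still one byte.
    echo strlen(utf8_decode($s)), "\n"; // 8
}
```

For valid UTF-8 the two approaches agree on the count. With malformed input the results may differ, so the "double check" caveat still applies.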
ASKER
Is it UTF-8? Is the content from a post using WordPress 3.9.1 UTF-8? I'm not sure.
I have PHP 5.3.3 on the server,
Testing...
Chris
"is the content from a post using wordpress 3.9.1 utf-8?"
I'm not sure either. Part of the answer may lie in whether, and what, the client copied and pasted from Word for Windows (thanks again, Obama).
Yes (or should be)
http://codex.wordpress.org/Converting_Database_Character_Sets
Why are you decoding the string to start with?
ASKER
Ok, well I feel stupid, but I found I am asking the wrong question.
It's actually this function that is adjusting the layout, based on word count, not character count:
// for getting word count in single.php
function wcount() {
    ob_start();
    the_content();
    $content = ob_get_clean();
    return sizeof(explode(" ", $content));
}
Now... I tried changing the $content to this:
$content = strip_tags(ob_get_clean());
and that caused a slight drop in the word count, but it is still counting a lot of what is inside the HTML tags as words.
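One possible approach is sketched below. It is not the accepted answer from this thread, and visible_word_count() is a hypothetical name. It assumes the content is valid UTF-8. Two things can inflate a count based on explode(" ", ...): strip_tags() keeps the text inside <script> and <style> blocks, and splitting on a single space produces empty entries for repeated spaces while ignoring newlines.

```php
<?php
// Hypothetical helper: counts the words a reader would see in an HTML fragment.
// Assumes valid UTF-8 input.
function visible_word_count($html)
{
    // Drop <script> and <style> blocks, including their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Add a space before every tag so "</p><p>" does not glue two words together.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Decode entities so "&amp;" is treated as the character it displays as.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Split on any run of whitespace and discard empty pieces.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8; treat that as zero words.
    return $words === false ? 0 : count($words);
}

$sample = '<p>Hello   <strong>big</strong>' . "\n"
        . 'world</p><p>Second paragraph <img src="a b.jpg" alt="two words"></p>'
        . '<script>var x = 1;</script>';

echo sizeof(explode(" ", $sample)), "\n"; // the original approach, for comparison
echo visible_word_count($sample), "\n";   // 5: Hello, big, world, Second, paragraph
```

In wcount(), the captured output of the_content() would be passed to this helper instead of being exploded directly. WordPress also ships wp_strip_all_tags(), which removes script and style contents and could stand in for the first two steps.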
Post an example of the string.
ASKER
Thank you both very much for your help on this!