• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 261
  • Last Modified:

PHP: preg_match pattern (or other strategy) needed

i want to count the number of "words" that are longer than MaxChars in length,
where a "word" is anything between punctuation/whitespace/specialchars (i.e., non alphanumeric) and \n, CRLF (etc) counts as whitespace
I'm not fussy about > vs. >=  for the MaxChars comparison (i.e., off by 1), however the algorithm lays out most easily/efficiently.

$NumberOfLargeWordsResult =    some_pattern_call( $InputTextString, $MaxChars);

EXAMPLE VALUES
if MaxChars is = 5, then the following would be the results for test strings (assuming > is used for the count comparison for these examples, but a solution >= is fine if easier).
Icing on the cake would be if foreign/extended char sets could count as alphanumerics, but happy with a basic algorithm.



Result      InputTextString
------------------------------------------------
    2          a ab abc abcd abcde abcdef abcdefg
    1          JNwhCSChhrNrGXH
    3          HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA
    0          123

I'm hoping it can be an efficient preg pattern match, but am open to any implementation suggestions.

MANY THANKS!
0
willsherwood
Asked:
willsherwood
2 Solutions
 
Ray PaseurCommented:
See http://www.laprbass.com/RAY_temp_willsherwood.php
<?php // RAY_email_validation.php
error_reporting(E_ALL);
echo "<pre>";


// TEST DATA FROM THE POST AT EE
$arr = array
(   '2' =>         'a ab abc abcd abcde abcdef abcdefg'
,   '1' =>         'JNwhCSChhrNrGXH'
,   '3' =>         'HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA'
,   '0' =>         '123'
)
;

// A FUNCTION TO EXTRACT / COUNT LONG WORDS
function findLongWords($str, $len=5, $ret='COUNT')
{
    // STORE VOCABULARY HERE
    $lon = array();

    // BREAK THE STRING AND TEST THE WORDS
    $wds = preg_split("/[\b[:punct:]\s]+/", $str);
    foreach ($wds as $sub)
    {
        // IS LENGTH GREATER THAN THE LIMIT?
        if (strlen($sub) > $len)
        {
            $lon[] = $sub;
        }
    }

    // RETURN COUNT OR VOCABULARY
    if (strtoupper(substr($ret,0,1)) == 'V') return $lon;
    return count($lon);
}

// TEST THE FUNCTION
foreach ($arr as $num => $txt)
{
    $cnt = findLongWords($txt);
    echo PHP_EOL . "FINDING $cnt EXPECTING $num WITH $txt";
}

Open in new window

0
 
Terry WoodsIT GuruCommented:
You should be able to use a preg_match_all like this:

$maxChars = 5;
$test_values = array ('a ab abc abcd abcde abcdef abcdefg',
                      'JNwhCSChhrNrGXH',
                      'HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA',
                      '123');

foreach ($test_values as $value) {
  print "value: $value<br>\n";
  print preg_match_all("/\b[a-z\d]{".($maxChars+1).",}/i", $value, $matches)."\n";
}

Open in new window

Result:
value: a ab abc abcd abcde abcdef abcdefg<br>
2
value: JNwhCSChhrNrGXH<br>
1
value: HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA<br>
3
value: 123<br>
0

Open in new window

0
 
willsherwoodAuthor Commented:
excellent, thanks all!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now