• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 256
  • Last Modified:

PHP: preg_match pattern (or other strategy) needed

i want to count the number of "words" that are longer than MaxChars in length,
where a "word" is anything between punctuation/whitespace/specialchars (i.e., non alphanumeric) and \n, CRLF (etc) counts as whitespace
I'm not fussy about > vs. >=  for the MaxChars comparison (i.e., off by 1), however the algorithm lays out most easily/efficiently.

$NumberOfLargeWordsResult =    some_pattern_call( $InputTextString, $MaxChars);

EXAMPLE VALUES
if MaxChars is = 5, then the following would be the results for test strings (assuming > is used for the count comparison for these examples, but a solution >= is fine if easier).
Icing on the cake would be if foreign/extended char sets could count as alphanumerics, but happy with a basic algorithm.



Result      InputTextString
------------------------------------------------
    2          a ab abc abcd abcde abcdef abcdefg
    1          JNwhCSChhrNrGXH
    3          HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA
    0          123

I'm hoping it can be an efficient preg pattern match, but am open to any implementation suggestions.

MANY THANKS!
0
willsherwood
Asked:
willsherwood
2 Solutions
 
Ray PaseurCommented:
See http://www.laprbass.com/RAY_temp_willsherwood.php
<?php // RAY_email_validation.php
error_reporting(E_ALL);
echo "<pre>";


// TEST DATA FROM THE POST AT EE
$arr = array
(   '2' =>         'a ab abc abcd abcde abcdef abcdefg'
,   '1' =>         'JNwhCSChhrNrGXH'
,   '3' =>         'HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA'
,   '0' =>         '123'
)
;

// A FUNCTION TO EXTRACT / COUNT LONG WORDS
function findLongWords($str, $len=5, $ret='COUNT')
{
    // STORE VOCABULARY HERE
    $lon = array();

    // BREAK THE STRING AND TEST THE WORDS
    $wds = preg_split("/[\b[:punct:]\s]+/", $str);
    foreach ($wds as $sub)
    {
        // IS LENGTH GREATER THAN THE LIMIT?
        if (strlen($sub) > $len)
        {
            $lon[] = $sub;
        }
    }

    // RETURN COUNT OR VOCABULARY
    if (strtoupper(substr($ret,0,1)) == 'V') return $lon;
    return count($lon);
}

// TEST THE FUNCTION
foreach ($arr as $num => $txt)
{
    $cnt = findLongWords($txt);
    echo PHP_EOL . "FINDING $cnt EXPECTING $num WITH $txt";
}

Open in new window

0
 
Terry WoodsIT GuruCommented:
You should be able to use a preg_match_all like this:

$maxChars = 5;
$test_values = array ('a ab abc abcd abcde abcdef abcdefg',
                      'JNwhCSChhrNrGXH',
                      'HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA',
                      '123');

foreach ($test_values as $value) {
  print "value: $value<br>\n";
  print preg_match_all("/\b[a-z\d]{".($maxChars+1).",}/i", $value, $matches)."\n";
}

Open in new window

Result:
value: a ab abc abcd abcde abcdef abcdefg<br>
2
value: JNwhCSChhrNrGXH<br>
1
value: HicEEBrWq4CGEizFnxr;the+5GsjSqBtEojSOB\n PknsRLmydhYKHVKA<br>
3
value: 123<br>
0

Open in new window

0
 
willsherwoodAuthor Commented:
excellent, thanks all!
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now