Slimshaneey asked:

Perl Regex to extract word pairs and tuples?

Hi all, I need some help fixing my regex, which clearly doesn't work.

Take the following string:
"This is a string >>>>> And another string"

I want to be able to take word pairs (I have another method that needs triple word groups) so that I end up with matches like:

This is
is a
a string
And another
another string

For the triples, then:
This is a
is a string
And another string

Any ideas how I can do that? I want to avoid using loops to process this, though!
ozo

$_="This is a string >>>>> And another string";
print "$1 $2\n" while /(\w+)\W+(?=(\w+))/g;
print "$1 $2 $3\n" while /(\w+)(?=\W+(\w+)\W+(\w+))/g;
#avoiding explicit loops
print map{$_||"\n"} /(\w+\s)\W*(?=(\w+)())/g;
Slimshaneey (ASKER)

Ozo, that doesn't seem to work for me. I keep ending up with 3 separate arrays containing only single-word results when I use preg_match_all in PHP with that pattern.
Those were Perl statements, and no arrays were involved.
What were you doing in PHP to create arrays?

#!/usr/bin/perl
#avoiding loops
$_="This is a string >>>>> And another string";
s/(\w+)\W+(?=(\w+))/$1 $2\n/g;
print;

$_="This is a string >>>>> And another string";
s/(\w+)\W+(?=(\w+)\W+(\w+))/$1 $2 $3\n/g;
print;
Sorry, I just noticed that you don't want

string And
or
a string And
or
string And another

In that case, I might create arrays in Perl with

$_="This is a string >>>>> And another string";
@pairs = /(?=(\w+\s+\w+))\w+/g;
@triples = /(?=(\w+\s+\w+\s+\w+))\w+/g;

but I don't know how you want to create arrays in PHP.
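On the PHP side, here is a minimal sketch, assuming preg_match_all in its default PREG_PATTERN_ORDER mode, where $m[0] holds the full matches and $m[N] holds the captures of group N. That layout would explain getting three arrays of single words from the earlier two-group pattern (full match, group 1, group 2). The variable names ($m, $pairs, $triples) are illustrative and not from the thread.

```php
<?php
// Sketch only: ports the two Perl array patterns above to preg_match_all.
$str = "This is a string >>>>> And another string";

// The lookahead captures the whole group without consuming it; the trailing
// \w+ then consumes one word, so the next match starts at the following word.
preg_match_all('/(?=(\w+\s+\w+))\w+/', $str, $m);
$pairs = $m[1];   // $m[0] holds the consumed single words, $m[1] the pairs

preg_match_all('/(?=(\w+\s+\w+\s+\w+))\w+/', $str, $m);
$triples = $m[1];

print_r($pairs);
print_r($triples);
```

Because the groups require whitespace (\s+) between words, nothing spans the ">>>>>" separator, so "string And" and "a string And" are not produced.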
ASKER CERTIFIED SOLUTION
Terry Woods

This solution is only available to members.

"I want to avoid using loops"
Why? A loop is probably easier to write, and it may be faster than the regex engine. If you're doing this only a few thousand times, it's not worth studying, but if it's a frequently run algorithm, it may be worth measuring instead of just assuming that the loop would take longer.
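One rough way to check this rather than guess is to time both approaches. The sketch below is only an illustration: pairs_regex and pairs_loop are hypothetical helper names, the repetition count is arbitrary, and the timings will vary by machine and PHP version.

```php
<?php
// Timing sketch; both helpers are illustrative, not code from the thread.
function pairs_regex($str)
{
    // Lookahead captures each pair; the trailing \w+ advances one word.
    preg_match_all('/(?=(\w+\s+\w+))\w+/', $str, $m);
    return $m[1];
}

function pairs_loop($str)
{
    $words = explode(' ', $str);
    $out = array();
    // Join each word with its right-hand neighbour.
    for ($i = 0; $i + 1 < count($words); $i++) {
        $out[] = $words[$i] . ' ' . $words[$i + 1];
    }
    return $out;
}

$str  = "This is a string And another string Ding";
$runs = 10000; // arbitrary repetition count

foreach (array('pairs_regex', 'pairs_loop') as $fn) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($str);
    }
    printf("%-12s %.4f s\n", $fn, microtime(true) - $start);
}
```

For a plain space-separated string like this one, both helpers return the same pairs, so the comparison is like for like.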

See http://www.laprbass.com/RAY_temp_slimshaney.php
<?php // RAY_temp_slimshaney.php
error_reporting(E_ALL);
echo "<pre>";

// COPIED / MODIFIED FROM THE POST AT EE
$str = "This is a string And another string Ding";

// PROCESS THE STRING AS AN ARRAY
$arr = explode(' ', $str);
$out = array();

// WHILE THERE IS STILL DATA IN THE ARRAY
while ($arr)
{
    // TAKE THE FIRST ELEMENT OFF THE ARRAY
    $sub   = array_shift($arr);

    // CONCATENATE THE NEXT ELEMENT
    $sub  .= ' ' . current($arr);

    // SAVE THE WORD PAIR
    $out[] = $sub;
}

// DISCARD THE LAST ELEMENT (THE FINAL WORD HAS NO PARTNER)
array_pop($out);

// SHOW THE INPUT AND THE WORK PRODUCT
var_dump($str);
var_dump($out);


I think a similar pattern could find triple-word groups, too.  It's not clear from the three-word example in the original post what the expected output should really be.
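As a sketch of how the loop might extend to groups of any size, a sliding window over the word list is one option. The word_groups helper below is hypothetical and not part of the original script. It assumes, like the example above, that the input is one run of whitespace-separated words, so it does not break groups at a separator such as ">>>>>".

```php
<?php
// Hypothetical helper (not from the thread): sliding window of $n words.
function word_groups($str, $n)
{
    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    // Each window starts at $i and covers words $i .. $i + $n - 1.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $out[] = implode(' ', array_slice($words, $i, $n));
    }
    return $out;
}

$str = "This is a string And another string Ding";
var_dump(word_groups($str, 2));
var_dump(word_groups($str, 3));
```

Stopping the window at the last full group avoids the trailing partial element, so no array_pop is needed afterwards.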

HTH, ~Ray
This worked exactly as required. I'd give bonus marks for the neatness of the single regex for 2- and 3-word combos if I could! Many thanks,
Shane