paries


regular expression optional whitespace

Hi experts,

From the following block of text, I am trying to retrieve 9ismyanswer using a regular expression:

rooms </strong>&nbsp;<i>9ismyanswer</i></td>



The part I am having trouble with: I want to still get a match if there is any number of spaces (including 0) between rooms and </strong>

Here is what I've got so far, and it works as long as there is exactly one space after rooms:

pattern:
(?<=rooms </strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)



I've tried the following, but neither has worked. Basically, I'm looking for a wildcard for any number of whitespace characters.

(?<=rooms[\s]*</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)
(?<=rooms\s*</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)



Thanks for your help
ozo

\s* should have worked. What was the text that failed?
Will this not work?

/.*<i>([^<]*).*/$1/
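The line above reads like a Perl-style substitution that replaces the whole string with capture group 1. A rough PHP equivalent, as a sketch assuming the input is the single line from the question, uses preg_replace() with the same pattern and `~` as the delimiter:

```php
<?php
// Sample input from the question.
$str = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

// Replace the whole string with capture group 1 (the text inside <i>...</i>).
// The "s" modifier lets "." also match newlines, in case the HTML spans lines.
$answer = preg_replace('~.*<i>([^<]*).*~s', '$1', $str);

echo $answer, "\n"; // 9ismyanswer
```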
Your regexp didn't work because variable-length look-behind is not implemented in PHP's regex engine (PCRE). Try this one:

(?<=<i>)[\w]+(?=</i></td>)

preg_match("/(?<=<i>)[\w]+(?=<\/i><\/td>)/", "rooms </strong>&nbsp;<i>9ismyanswer</i></td>", $matches);
echo $matches[0]; // 9ismyanswer

Bye
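A minimal PHP sketch of the point above, assuming PHP's bundled PCRE: a look-behind containing `\s*` fails to compile, while PCRE's `\K` (swapped in here; neither poster used it) keeps the prefix in the pattern but drops it from the reported match, so any amount of whitespace after `rooms` is accepted. The `~` delimiter avoids having to escape the `/` characters. The zero- and four-space inputs are made-up variants of the question's text.

```php
<?php
// Variants of the question's input: one, zero and four spaces after "rooms".
$inputs = [
    'rooms </strong>&nbsp;<i>9ismyanswer</i></td>',
    'rooms</strong>&nbsp;<i>9ismyanswer</i></td>',
    'rooms    </strong>&nbsp;<i>9ismyanswer</i></td>',
];

// Variable-length look-behind: PCRE refuses to compile this pattern, so
// preg_match() emits a warning (silenced here with @) and returns false.
$lookbehind = '~(?<=rooms\s*</strong>&nbsp;<i>)[^<]*(?=</i></td>)~';
var_dump(@preg_match($lookbehind, $inputs[0])); // bool(false)

// \K drops everything matched so far from the reported match, which gives
// the effect of a variable-length look-behind.
$withK = '~rooms\s*</strong>&nbsp;<i>\K[^<]*(?=</i></td>)~';

$results = [];
foreach ($inputs as $s) {
    if (preg_match($withK, $s, $m)) {
        $results[] = $m[0];
    }
}
echo implode(', ', $results), "\n";
```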
It doesn't need to be a look-behind
(?: should suffice rather than (?<=
But he used look-behind in his regex and in that regex (?: doesn't work.
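Following ozo's suggestion, here is a sketch in which a non-capturing group consumes the prefix and the answer is read from a capture group instead of the whole match. The input with three spaces is a made-up variant of the question's text.

```php
<?php
// Made-up variant of the question's text with three spaces after "rooms".
$str = 'rooms   </strong>&nbsp;<i>9ismyanswer</i></td>';

// The prefix is consumed by a non-capturing group rather than checked by a
// look-behind, so \s* is allowed. The answer is then read from group 1.
$pattern = '~(?:rooms\s*</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)~';

if (preg_match($pattern, $str, $m)) {
    echo $m[1], "\n"; // 9ismyanswer
}
```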
ASKER CERTIFIED SOLUTION
vks_vicky
Try this one too.


(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)

This would be the result

Group(0) = rooms </strong>&nbsp;<i>9ismyanswer
Group(1) = rooms
Group(2) = </strong>
Group(3) = 9ismyanswer

You can enter any number of spaces between rooms and </strong>
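To check the claim about spaces, this sketch runs vks_vicky's pattern (with `#` as the delimiter so the `/` characters need no escaping) over a few made-up variants of the input. Note that `[\s]` also matches tabs and newlines, not only spaces:

```php
<?php
$pattern = '#(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)#';

// Zero, one and several spaces, plus a tab and a newline ([\s] matches all of these).
$inputs = [
    "rooms</strong>&nbsp;<i>9ismyanswer</i></td>",
    "rooms </strong>&nbsp;<i>9ismyanswer</i></td>",
    "rooms     </strong>&nbsp;<i>9ismyanswer</i></td>",
    "rooms\t\n</strong>&nbsp;<i>9ismyanswer</i></td>",
];

$answers = [];
foreach ($inputs as $s) {
    preg_match($pattern, $s, $m);
    $answers[] = $m[3]; // group 3 holds the text between <i> and </i></td>
}
echo implode(', ', $answers), "\n";
```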
@vks_vicky: neither of your regexps gives the desired result, in either Regex Coach or Espresso.

@paries: you have two valid solutions here:

1: mine, posted in ID 33669193: (?<=<i>)[\w]+(?=</i></td>)

2: as suggested by ozo, (?:[\w]+)(?=</i></td>), without using look-behind
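Both patterns above return 9ismyanswer for the question's text, but neither mentions `rooms`, so each takes the first word that ends an `<i>...</i></td>` cell. A sketch with a hypothetical row containing an earlier `<i>` cell shows how they differ from a pattern anchored on `rooms`:

```php
<?php
$sample = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';
// Hypothetical row: an earlier cell also ends in </i></td>.
$row = 'beds </strong>&nbsp;<i>two</i></td><td><strong>rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

$lookBehind   = '~(?<=<i>)\w+(?=</i></td>)~';                      // solution 1
$noLookBehind = '~(?:\w+)(?=</i></td>)~';                          // solution 2
$anchored     = '~rooms\s*</strong>&nbsp;<i>(\w+)(?=</i></td>)~';  // anchored on "rooms"

$results = [];
foreach ([$sample, $row] as $text) {
    preg_match($lookBehind, $text, $a);
    preg_match($noLookBehind, $text, $b);
    preg_match($anchored, $text, $c);
    $results[] = [$a[0], $b[0], $c[1]];
    echo $a[0], ' | ', $b[0], ' | ', $c[1], "\n";
}
// 9ismyanswer | 9ismyanswer | 9ismyanswer
// two | two | 9ismyanswer
```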

Hope this helps
@marqusG, I'm using RegEx Tester, a plugin for Eclipse.

Also I've tested the expression with an online tester

http://www.regexplanet.com/simple/index.html.

Please check with them; I get results. I'll try Regex Coach or Espresso and let you know.
As the thread of comments illustrates, REGEX is complicated and often hard to get right.  Tangentially related: There are literally thousands of REGEX patterns published on the WWW that purport to validate an email address, and almost all of them are wrong in one way or another.  The take-away message is that REGEX can be a powerful tool for good, or for wasting your debugging time!

Try running this little script.
<?php // RAY_temp_paries.php
error_reporting(E_ALL);

// TEST DATA FROM THE POST AT EE
$str = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

// PROCESS THE TEST DATA
echo pluck($str, 'i');

// A FUNCTION TO PLUCK OUT THE INFORMATION BETWEEN TAGS
function pluck($string, $tag, $case_sensitive=FALSE)
{
    // FORMAT THE SEARCH ARGUMENTS
    $open_tag = '<'  . $tag . '>';
    $clos_tag = '</' . $tag . '>';

    // COPY THE ORIGINAL STRING
    $str = $string;

    // IF CASE-INSENSITIVE SEARCH
    if (!$case_sensitive)
    {
        $str      = strtoupper($string);
        $open_tag = strtoupper($open_tag);
        $clos_tag = strtoupper($clos_tag);
    }

    // FIND THE LOCATIONS OR RETURN FALSE IF NOT PARSABLE
    $a = strpos($str, $open_tag);
    if ($a === FALSE) return FALSE;
    $a = $a + strlen($open_tag);        // START OF THE DATA, JUST AFTER THE OPEN TAG
    $z = strpos($str, $clos_tag, $a);   // LOOK FOR THE CLOSE TAG ONLY AFTER THE OPEN TAG
    if ($z === FALSE) return FALSE;

    // RETURN THE DATA FROM THE ORIGINAL STRING (THE 3RD ARGUMENT OF substr() IS A LENGTH)
    return substr($string, $a, $z - $a);
}

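Since `substr()` takes a length as its third argument rather than an end position, the slice length has to be computed as end minus start. A hedged sketch of the same tag-plucking idea that uses `stripos()` for the case-insensitive search; `pluck_ci` is a hypothetical name, not part of the original script:

```php
<?php
// Hypothetical helper: return the text between <tag> and </tag>, ignoring case,
// or false if either tag is missing.
function pluck_ci(string $string, string $tag)
{
    $open  = '<' . $tag . '>';
    $close = '</' . $tag . '>';

    $start = stripos($string, $open);
    if ($start === false) return false;
    $start += strlen($open);                  // first character after the opening tag

    $end = stripos($string, $close, $start);  // search only after the opening tag
    if ($end === false) return false;

    // substr() takes a LENGTH as its third argument, not an end position.
    return substr($string, $start, $end - $start);
}

echo pluck_ci('rooms </strong>&nbsp;<I>9ismyanswer</I></td>', 'i'), "\n"; // 9ismyanswer
```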

I'm very sorry, vks, I don't want to pick on your solutions, but I tested your two regexps at regexplanet and the result is Matches = No. With Regex Tester (a plugin for Firefox) it says it is not a valid regexp. Maybe I'm missing something, but I really don't know where I'm going wrong with these tests...
@marqusG, I'm not sure what you are doing and how you are trying it out.

I'm attaching a screenshot of regexplanet.

And I'm just trying to help. It's your choice whether you use my solution or not!
[Attachment: Screen-shot-2010-09-14-at-4.37.2.png]
Please excuse me. Through my own inattention I hadn't seen group 0, group 1 and so on, to the right! I had only seen "matches() No" (I don't really understand what they mean by that).
I didn't want to drive you mad: your solution works as well as the others.

Best
You can read more about it @

http://www.regular-expressions.info/brackets.html

The sections "Backtracking Into Capturing Groups" and "Backreferences to Failed Groups" explain why the match was "No" while the groups are still available.
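Another possible reading of the "No": regexplanet's Java tester appears to report Java's `Matcher.matches()`, `lookingAt()` and `find()` separately, and `matches()` succeeds only when the pattern covers the entire input. Assuming that is what the "No" refers to, this PHP sketch reproduces the three checks with `\A` and `\z` anchors:

```php
<?php
$str  = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';
$body = '(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)';

// Like find(): the pattern may match anywhere in the string.
$find = preg_match('#' . $body . '#', $str, $groups);

// Like lookingAt(): the match must start at the beginning of the string.
$lookingAt = preg_match('#\A' . $body . '#', $str);

// Like matches(): the match must cover the whole string. It fails here because
// the trailing look-ahead checks "</i></td>" without consuming it.
$whole = preg_match('#\A' . $body . '\z#', $str);

printf("find=%d lookingAt=%d matches=%d group3=%s\n", $find, $lookingAt, $whole, $groups[3]);
```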
@vks: thanks for the links. But I still have a question for you: how do you use grouping in PHP? I used this:
<?php
$str = "rooms </strong>&nbsp;<i>9ismyanswer</i></td>";
$regex = "(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)";
preg_match_all("/(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)/", $str, $matches);
echo "<pre>";
var_dump($matches);
echo "</pre>";
?>

But the result is NULL.
@marqusG

You are using a forward slash as your delimiter, but not escaping the forward slashes inside your pattern. Try changing your delimiter or escaping the internal forward slashes:
<?php
	$str = "rooms </strong>&nbsp;<i>9ismyanswer</i></td>";
	$regex = "(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)";
	preg_match_all("#(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)#", $str, $matches);
	echo "<pre>";
	var_dump($matches);
	echo "</pre>";
?>

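An alternative to picking a different delimiter, as a sketch: build the pattern with `preg_quote()`, which escapes regex metacharacters and, when given the delimiter as its second argument, the delimiter as well. The first call shows the original failure mode:

```php
<?php
$str = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

// An unescaped "/" inside a "/"-delimited pattern ends the pattern early; PHP then
// rejects the leftover text as unknown modifiers and preg_match() returns false.
$broken = @preg_match('/rooms\s*</strong>&nbsp;<i>([^<]*)/', $str, $m);
var_dump($broken); // bool(false)

// preg_quote() escapes the literal HTML, including the "/" delimiter.
$pattern = '/rooms\s*' . preg_quote('</strong>&nbsp;<i>', '/')
         . '([^<]*)' . preg_quote('</i></td>', '/') . '/';

preg_match($pattern, $str, $m);
echo $m[1], "\n"; // 9ismyanswer
```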

[Attachment: untitled.JPG]
Sometimes I feel stupid...:-(
I like to think we're all here to learn  :D
paries

ASKER

Thanks for the help, everybody. vks_vicky's solution did exactly what I was looking for. I learned quite a bit from the discussion too.