Solved

regular expression optional whitespace

Posted on 2010-09-13
Last Modified: 2012-05-10
Hi experts,

From the following block of text, I am trying to retrieve 9ismyanswer using a regular expression:

rooms </strong>&nbsp;<i>9ismyanswer</i></td>


The part I am having trouble with is that I still want a match if there is any number of spaces (including 0) between rooms and </strong>.

Here is what I've got so far, and it works as long as there is exactly one space after rooms:

pattern:
(?<=rooms </strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)


I've tried the following, but neither has worked. Basically, I'm looking for a wildcard for any amount of whitespace.

(?<=rooms[\s]*</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)
(?<=rooms\s*</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)


Thanks for your help
0
Question by:paries
19 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 33669040
\s* should have worked. What was the text that failed?
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 33669155
Will this not work?

/.*<i>([^<]*).*/$1/
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33669193
Your regexp didn't work because variable-length look-behind is not yet implemented in the PHP regex engine. Try this one:

(?<=<i>)[\w]+(?=</i></td>)

preg_match("/(?<=<i>)[\w]+(?=<\/i><\/td>)/", "rooms </strong>&nbsp;<i>9ismyanswer</i></td>");
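
A minimal sketch of the look-behind limitation described above, assuming PHP's preg_* functions (PCRE). The three-space sample string and the variable names are illustrative and not from the original post. \K is a PCRE feature that drops everything matched so far from the reported match, so it can stand in for a variable-length look-behind.

```php
<?php
// Illustrative input: the question's text with three spaces after "rooms".
$str = 'rooms   </strong>&nbsp;<i>9ismyanswer</i></td>';

// Unbounded repetition inside a look-behind: PCRE is expected to refuse to
// compile this, so preg_match() returns false (the warning is silenced with @).
$lookbehind = '#(?<=rooms\s*</strong>&nbsp;<i>)[\s\S]*?(?=</i></td>)#';
$result = @preg_match($lookbehind, $str, $m);
echo $result === false ? "look-behind pattern rejected\n" : "look-behind pattern accepted\n";

// Alternative: match the prefix normally, then use \K to drop it from the match.
$withK = '#rooms\s*</strong>&nbsp;<i>\K[\s\S]*?(?=</i></td>)#';
if (preg_match($withK, $str, $m) === 1) {
    echo $m[0], "\n"; // the text between <i> and </i></td>
}
```

With \K the whole match ($m[0]) is already the wanted text, so no capture group is needed.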

Bye
0
 
LVL 84

Expert Comment

by:ozo
ID: 33669203
It doesn't need to be a look-behind
(?: should suffice rather than (?<=
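
A short sketch of this suggestion, assuming PHP's preg_match; the sample string with extra spaces and the variable names are illustrative. The prefix is matched inside a non-capturing group, and the wanted text is read from capture group 1 instead of from the whole match.

```php
<?php
// Illustrative input: the question's text with three spaces after "rooms".
$str = 'rooms   </strong>&nbsp;<i>9ismyanswer</i></td>';

// (?: ... ) groups the prefix without capturing it; ( ... ) captures the answer.
$pattern = '#(?:rooms\s*</strong>&nbsp;<i>)([\s\S]*?)</i></td>#';

if (preg_match($pattern, $str, $m) === 1) {
    echo $m[1], "\n"; // group 1 holds only the wanted text
}
```

The whole match ($m[0]) still includes the prefix here, which is why the result has to be taken from $m[1].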
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33669231
But he used look-behind in his regex and in that regex (?: doesn't work.
0
 
LVL 5

Accepted Solution

by:
vks_vicky earned 500 total points
ID: 33669821
You could try this

(?<=rooms\s{2}</strong>&nbsp;<i>)([\s\S]*?)(?=</i></td>)

Where {2} is the number of whitespace characters after rooms. You cannot use {*}, because it's an illegal repetition. Hope this helps.
0
 
LVL 5

Expert Comment

by:vks_vicky
ID: 33669879
Try this one too.


(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)

This would be the result

Group(0) = rooms </strong>&nbsp;<i>9ismyanswer
Group(1) = rooms
Group(2) = </strong>
Group(3) = 9ismyanswer

You can enter any number of spaces between rooms and </strong>
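
To check the "any number of spaces" claim, here is a small sketch assuming PHP's preg_match with # as the delimiter. The three sample strings (no space, one space, mixed spaces and a tab) are made up for illustration.

```php
<?php
// Same pattern as above, wrapped in # delimiters so "/" needs no escaping.
$pattern = '#(rooms)\s*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)#';

// Illustrative inputs with zero, one, and several whitespace characters.
$samples = [
    'rooms</strong>&nbsp;<i>9ismyanswer</i></td>',
    'rooms </strong>&nbsp;<i>9ismyanswer</i></td>',
    "rooms \t  </strong>&nbsp;<i>9ismyanswer</i></td>",
];

$found = [];
foreach ($samples as $s) {
    // Group 3 is the text between <i> and </i></td>; null marks a failed match.
    $found[] = preg_match($pattern, $s, $m) === 1 ? $m[3] : null;
}
print_r($found);
```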
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33670605
@vks_vicky: neither of your regexps gives the desired result, in either Regex Coach or Espresso.

@paries: you have two valid solutions here:

1: mine, posted in ID 33669193: (?<=<i>)[\w]+(?=</i></td>)

2: as suggested by ozo, (?:[\w]+)(?=</i></td>), without using look-behind

Hope this helps
0
 
LVL 5

Expert Comment

by:vks_vicky
ID: 33670829
@marqusG, I'm using RegEx Tester, a plugin for Eclipse.

Also I've tested the expression with an online tester

http://www.regexplanet.com/simple/index.html.

Please check with them; I get results there. I'll try Regex Coach and Espresso and let you know.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 33670946
As the thread of comments illustrates, REGEX is complicated and often hard to get right.  Tangentially related: There are literally thousands of REGEX patterns published on the WWW that purport to validate an email address, and almost all of them are wrong in one way or another.  The take-away message is that REGEX can be a powerful tool for good, or for wasting your debugging time!

Try running this little script.
<?php // RAY_temp_paries.php
error_reporting(E_ALL);

// TEST DATA FROM THE POST AT EE
$str = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

// PROCESS THE TEST DATA
echo pluck($str, 'i');

// A FUNCTION TO PLUCK OUT THE INFORMATION BETWEEN TAGS
function pluck($string, $tag, $case_sensitive=FALSE)
{
    // FORMAT THE SEARCH ARGUMENTS
    $open_tag = '<'  . $tag . '>';
    $clos_tag = '</' . $tag . '>';

    // COPY THE ORIGINAL STRING
    $str = $string;

    // IF CASE-INSENSITIVE SEARCH
    if (!$case_sensitive)
    {
        $str      = strtoupper($string);
        $open_tag = strtoupper($open_tag);
        $clos_tag = strtoupper($clos_tag);
    }

    // FIND THE LOCATIONS OR RETURN FALSE IF NOT PARSABLE
    $a = strpos($str, $open_tag);
    if ($a === FALSE) return FALSE;
    $a += strlen($open_tag); // START OF THE DATA, JUST PAST THE OPENING TAG
    $z = strpos($str, $clos_tag, $a);
    if ($z === FALSE) return FALSE;

    // RETURN THE DATA FROM THE ORIGINAL STRING (THIRD ARGUMENT OF substr() IS A LENGTH)
    return substr($string, $a, $z - $a);
}

0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33670958
I'm very sorry, vks, I don't mean to pick on your solutions, but I tested your two regexps at regexplanet and the result is Matches = No. Regex Tester (a plugin for Firefox) says that it is not a valid regexp. Maybe I'm missing something, but I really don't know where I'm going wrong with these tests...
0
 
LVL 5

Expert Comment

by:vks_vicky
ID: 33670980
@marqusG, I'm not sure what you are doing and how you are trying it out.

I'm attaching a screenshot of regexplanet.

And I'm just trying to help. It's your choice whether you use my solution or not!!
Screen-shot-2010-09-14-at-4.37.2.png
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33671057
Please excuse me. Through my own inattention I really had not seen group 0, group 1 and so on to the right!!! I had only seen matches() No (I don't understand very well what they mean by this).
I didn't want to drive you mad: your solution works fine, as do the others.

Best
0
 
LVL 5

Expert Comment

by:vks_vicky
ID: 33671207
You can read more about it @

http://www.regular-expressions.info/brackets.html

The sections "Backtracking Into Capturing Groups" and "Backreferences to Failed Groups" explain why the match was "No" while the groups are still available.
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33671979
@vks: thanks for the links. But I still have a question for you: how do you use grouping in PHP? I have used this
[code]<?php
$str = "rooms </strong>&nbsp;<i>9ismyanswer</i></td>";
$regex = "(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)";
preg_match_all("/(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)/", $str, $matches);
echo "<pre>";
var_dump($matches);
echo "</pre>";
?>[/code]

But result is NULL
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 33672178
@marqusG

You are using forward slash as your delimiter, but not escaping the ones you are using in your pattern. Try changing your delimiter or escaping your internal forward slashes:
<?php
	$str = "rooms </strong>&nbsp;<i>9ismyanswer</i></td>";
	$regex = "(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)";
	preg_match_all("#(rooms)[\s]*(</strong>)&nbsp;<i>([\s\S]*?)(?=</i></td>)#", $str, $matches);
	echo "<pre>";
	var_dump($matches);
	echo "</pre>";
?>

untitled.JPG
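
The other option mentioned in this comment, escaping the internal forward slashes instead of changing the delimiter, could look like the sketch below. It assumes the same sample string; the single-quoted pattern string and the $count variable are choices made for this illustration.

```php
<?php
$str = 'rooms </strong>&nbsp;<i>9ismyanswer</i></td>';

// Keep "/" as the delimiter and escape every literal "/" in the pattern as "\/".
$pattern = '/(rooms)\s*(<\/strong>)&nbsp;<i>([\s\S]*?)(?=<\/i><\/td>)/';

$count = preg_match_all($pattern, $str, $matches);
echo $count, "\n";         // number of full matches found
echo $matches[3][0], "\n"; // group 3 of the first match
```

Either approach works; a delimiter such as # just keeps HTML-heavy patterns easier to read.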
0
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33672199
Sometimes I feel stupid...:-(
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 33672205
I like to think we're all here to learn  :D
0
 

Author Closing Comment

by:paries
ID: 33678554
Thanks for the help, everybody. vks_vicky's solution did exactly what I was looking for. I learned quite a bit from the discussion too.
0
