RaineyM

Regular Expression Help

I am retrieving the following string from a MySQL database and need a regular expression to detect the [code]...[/code] parts. No matter what expression I use, it won't detect it.

This is a test.\n[code]\n<p>Hello</p>\n<p>World</p>\n[/code]\nThis is a test.\n
manav_mathur

Raineym,
use strict;
use warnings;
my $var="This is a test.\n[code]\n<p>Hello</p>\n<p>World</p>\n[/code]\nThis is a test.\n" ;
$var =~ s/^.*(\[code\].*\[\/code\]).*$/$1/si;
print $var

Manav
ASKER CERTIFIED SOLUTION
manav_mathur

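my $Test = "\n[code]\n<p>Hello</p>\n<p>World</p>\n[/code]\nThis is a test.[code]\n<p>Part 2</p>\n[/code]\n";
my @Parts;      # start empty; assigning undef would add one undef element
while ($Test =~ m!\[code\](.*?)\[/code\]!gis)
{
      push @Parts, $1;      # text between [code] and [/code]
}
# scalar(@Parts) is the element count; $#Parts would be the last index
print "There are ", scalar(@Parts), " parts detected...\n";
foreach (@Parts)
{
      print "\n$_";
}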
Raineym,

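the important thing in all these solutions is the use of /s

/s - makes . match newline characters as well, so the pattern can span several lines (not to be confused with /m, which only changes where ^ and $ match)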
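
To see the effect of /s on its own, here is a small sketch using a shortened version of the string from the question (assuming the database value contains real newline characters). The variable names $without_s and $with_s are just for illustration.

```perl
use strict;
use warnings;

# Shortened sample string from the question (assumed to contain real newlines).
my $text = "This is a test.\n[code]\n<p>Hello</p>\n[/code]\nThis is a test.\n";

# Without /s, "." does not match "\n", so .* cannot cross a line break.
my $without_s = ($text =~ /\[code\].*\[\/code\]/)  ? 1 : 0;

# With /s, "." also matches "\n", so the match can span lines.
my $with_s    = ($text =~ /\[code\].*\[\/code\]/s) ? 1 : 0;

print "without /s: $without_s\n";   # prints "without /s: 0"
print "with /s:    $with_s\n";      # prints "with /s:    1"
```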

Manav

ASKER

Thank you Manav.

I can't believe I forgot /s. Doh!
Happens ;)

Manav

ASKER

Ok, now when there is more than one [code]...[/code] in the string, it detects everything between the first occurrence of [code] and the last occurrence of [/code].
No no

then use

use strict;
use warnings;
my $var="This is a test.\n[code]\n<p>Hello</p>\n<p>World</p>\n[/code]\nThis is a test.\n[code]\nHi second\n[/code]\nEnd\n" ;
while ($var =~ m/\[code\](.*?)\[\/code\]/gsi) {
print "detected" ;
print "$1" ;
}

Notice the .*? instead of the .* in the regex.

By default a quantifier such as * is greedy: it matches the longest string it can. A ? placed right after the quantifier overrides this default behaviour and makes it match the shortest string it can.
In your problem you want to match the smallest string between [code] and [/code], hence this will work.

Also remember to add g (for global match), so that the while loop finds every occurrence.
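
A minimal side-by-side sketch of the difference, on a made-up one-line string (the sample text and variable names are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample with two [code] blocks.
my $text = "a[code]ONE[/code]b[code]TWO[/code]c";

# Greedy: .* runs to the LAST [/code] in the string.
my ($greedy) = $text =~ /\[code\](.*)\[\/code\]/s;

# Non-greedy: .*? stops at the FIRST [/code] it can reach.
my ($lazy) = $text =~ /\[code\](.*?)\[\/code\]/s;

# With /g in list context, every non-greedy capture is returned.
my @all = $text =~ /\[code\](.*?)\[\/code\]/gs;

print "greedy: $greedy\n";   # greedy: ONE[/code]b[code]TWO
print "lazy:   $lazy\n";     # lazy:   ONE
print "all:    @all\n";      # all:    ONE TWO
```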

Manav
And if you want to have [code] and [/code] in $1 too, use

use strict;
use warnings;
my $var="This is a test.\n[code]\n<p>Hello</p>\n<p>World</p>\n[/code]\nThis is a test.\n[code]\nHi second\n[/code]\nEnd\n" ;
while ($var =~ m/(\[code\].*?\[\/code\])/gsi) {
print "detected\n" ;
print "$1" ;
}


Manav

ASKER

That I did not know. ** Gray Matter Expanding **

Any way to give bonus points? You deserve it.
Raineym,

You could put a request to community support. Or post another dummy question, then I can post a stupid answer to the dummy question and you award me the points. ;)

Actually, most people don't know about this maximum matching property of regex. Hence, most of them don't know the second use of a ? (placed after a quantifier, as opposed to directly after a character or metacharacter).

.? -> matches 0 or 1 character
.*? -> matches the minimum number of any characters. By itself this doesn't make much sense (it will happily match nothing at all) unless it is surrounded by other fragments of regex on one or both sides.
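
A quick sketch of those two cases on a made-up string, assuming nothing beyond core Perl (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = "abc";

# .? is an optional single character; it is still greedy, so it takes one if it can.
my ($optional) = $text =~ /(.?)/;

# .*? on its own is satisfied by zero characters, so it captures the empty string.
my ($lazy_alone) = $text =~ /(.*?)/;

# With fragments on both sides, .*? must stretch just far enough to let them match.
my ($lazy_between) = $text =~ /a(.*?)c/;

print "optional:     '$optional'\n";       # optional:     'a'
print "lazy alone:   '$lazy_alone'\n";     # lazy alone:   ''
print "lazy between: '$lazy_between'\n";   # lazy between: 'b'
```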

Manav

ASKER

OK, look for the question "What is the maximum matching property of regex?".