Perl negative lookbehind assertion problem

Hi Experts,

Why does the code below print all 3 lines of input data?  I'm trying to print all lines which contain 'type="song"' not preceded by '---'.
foreach (<DATA>)
{ print if /(?<!---).+type="song"/ }

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>


I note that if I change the RE to be a positive lookbehind assertion:
    { print if /(?<=---).+type="song"/ }
it matches just this one line, as I would have expected.
    <slide_group name="--- This should NOT print" type="song"/>

What's going on, guys?  Is this an RE conspiracy or what?
Asked by: tel2
 
Terry Woods (IT Guru) commented:
Try this maybe?:
{ print if /^(?!.*---).+type="song"/ }
 
Terry Woods (IT Guru) commented:
The pattern:
/(?<!---).+type="song"/
matches:
  <slide_group name="--- This should NOT print" type="song"/>
as soon as the regex engine looks at the T in This, because at that point the previous 3 characters are "-- ", not "---"

Make sense?
 
Terry Woods (IT Guru) commented:
Try printing if this *doesn't* match:
/---.+type="song"/
 
Terry Woods (IT Guru) commented:
I think it would be coded like this, though there might be a more efficient way?

foreach (<DATA>)
{ print if ($_ !~ /---.+type="song"/)  }
 
tel2 (author) commented:
Thanks Terry.

Re your 1st post, no, I don't get it, sorry.  I thought my negative lookbehind assertion is saying not to match if there are any '---'s before 'type="song"'.  No?  How can I say that with a negative lookbehind assertion?

Re your 2nd post, although that will work for the test data I've supplied, it doesn't meet the criteria I've specified (i.e. "print all lines which contain 'type="song"' not preceded by '---'"), so it will print a line like this:
    <slide_group name="And this also should NOT print" type="OTHER"/>
Sorry - I should have provided that in the test data.

Re your 3rd post, yes, your suggestion can be coded like that, or like this:
    { print if !/---.+type="song"/ }
or this:
    { print unless /---.+type="song"/ }

I know of ways to get what I want, but right now I'm trying to understand how to get my negative lookbehind assertion to work, and whether REs have conspired against me to slow me down.
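
For anyone following along, the point about type="OTHER" can be checked with a small sketch. The @lines array and the "two plain tests" variant are only illustrations added here (they are not code from the posts above); the second variant is one of several possible ways to get the intended result without any lookaround.

```perl
use strict;
use warnings;

# Illustrative test data: the third line is the extra case mentioned above.
my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

# Negated match only: keeps every line that lacks '---' followed by type="song",
# so a line with no type="song" at all slips through.
my @negated_only = grep { !/---.+type="song"/ } @lines;

# Two plain tests: the line must contain type="song" and must not have '---' before it.
my @two_tests = grep { /type="song"/ && !/---.*type="song"/ } @lines;

print "negated only keeps ", scalar(@negated_only), " line(s)\n";
print "two tests keep     ", scalar(@two_tests), " line(s)\n";
```

With this data the negated match keeps two lines (including the type="OTHER" one), while the two-test version keeps only the first line.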
 
Terry Woods (IT Guru) commented:
The reason is that the regex engine doesn't give up after looking at the 3 dashes and failing to match. It keeps looking for another way to match.

In fact, for the line:
  <slide_group name="--- This should NOT print" type="song"/>
The regex engine would first look at the first position in the string (before the first space character), and see that it wasn't preceded by "---". It would then try to match the remaining part of the expression (/.+type="song"/), which would match, and it would confirm the match immediately. This is even earlier than what I tried to describe in my first post.
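
A quick way to see this is to ask Perl where each match starts. The sketch below uses the built-in @- array ($-[0] is the start offset of the last successful match); match_start is just a hypothetical helper name made up for this illustration.

```perl
use strict;
use warnings;

my $line = '  <slide_group name="--- This should NOT print" type="song"/>';

# Hypothetical helper: offset where the pattern starts matching, or -1 if it fails.
sub match_start {
    my ($text, $re) = @_;
    return $text =~ $re ? $-[0] : -1;
}

# Negative lookbehind: satisfied at the very first position tried.
my $neg_start = match_start($line, qr/(?<!---).+type="song"/);

# Positive lookbehind: only satisfied directly after the three dashes.
my $pos_start = match_start($line, qr/(?<=---).+type="song"/);

print "negative lookbehind match starts at offset $neg_start\n";
print "positive lookbehind match starts at offset $pos_start\n";
```

The negative version reports offset 0, because nothing precedes the start of the string, so the assertion holds there and .+ is free to swallow the dashes. The positive version can only start right after the "---".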
 
Terry Woods (IT Guru) commented:
Actually, that would still exclude cases where the --- is *after* the text: type="song"

You can remedy that like this:
{ print if /^(?!((?!type="song").)*---).+type="song"/ }
 
Terry Woods (IT Guru) commented:
To explain what the pattern does: it looks ahead (but no further than type="song"), and if it finds ---, it fails to match. The start-of-line anchor ^ means the lookahead scans from the beginning of the line, so it must find the --- if one is present before the type="song".
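
Here is the same pattern spelled out with the /x flag so each part can carry a comment, as a sketch only. Two things are changed for illustration: the inner group is made non-capturing with (?: ), and the test lines (including the '---' after type="song" and the type="OTHER" cases) are assumed sample data.

```perl
use strict;
use warnings;

my $re = qr/
    ^                             # anchor at the start of the line
    (?!                           # fail if the following can match from here:
        (?: (?!type="song") . )*  #   step over characters that do not begin type="song"
        ---                       #   and then find three dashes
    )
    .+ type="song"                # otherwise require type="song" later on the line
/x;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And should this print" type="song" --- />',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

# Keep only the lines the pattern accepts.
my @printed = grep { $_ =~ $re } @lines;
print "$_\n" for @printed;
```

On this data the first and third lines are kept: the '---' on the third line comes after type="song", so the lookahead never reaches it.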
 
tel2 (author) commented:
Thanks for that, Terry.

Let's see if someone comes up with a simpler RE.
 
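ozo commented:
foreach (<DATA>){
    # Three alternative tests; keep only one, otherwise a matching line prints more than once.
    print if /^(?!.*---).+type="song"/;               # no --- anywhere on the line
    print if /^((?!---).)+type="song"/;               # no --- starting before type="song"
    print if /^(?!.*---.+type="song").+type="song"/   # no --- followed later by type="song"
}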

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>
  <slide_group name="And should this print" type="song" --- />


I thought /(?<!---).+type="song"/ is saying not to match if there are any '---'s before 'type="song"'.
No, it is saying not to match if there is a --- immediately before the point where .+type="song" starts matching.
.+type="song" can match
  <slide_group name="--- This should NOT print" type="song"
starting at the very beginning of the line, and there is no --- before the start of
  <slide_group name="--- This should NOT print" type="song"
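
To compare the three lookahead patterns side by side, a small sketch like the following can help. The labels and the Y/n table layout are made up for this illustration; the patterns and the four data lines are the ones posted above.

```perl
use strict;
use warnings;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this should print" type="song"/>',
    '  <slide_group name="And should this print" type="song" --- />',
);

# Label => pattern pairs, in the order they were posted.
my @patterns = (
    [ 'no --- anywhere'         => qr/^(?!.*---).+type="song"/ ],
    [ 'no --- before type=song' => qr/^((?!---).)+type="song"/ ],
    [ 'no --- then type=song'   => qr/^(?!.*---.+type="song").+type="song"/ ],
);

# One result string per pattern: Y if the line matches, n if not.
my @results;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    my @flags = map { $_ =~ $re ? 'Y' : 'n' } @lines;
    push @results, join('', @flags);
    printf "%-26s %s\n", $label, join(' ', @flags);
}
```

All three agree on the first three lines. They differ only on the fourth: the first pattern rejects it because a --- appears somewhere on the line, while the other two accept it because the --- comes after type="song".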
 
tel2 (author) commented:
Much appreciated, Terry and ozo.
I'm learning.
Question has a verified solution.
