Solved

Perl negative lookbehind assertion problem

Posted on 2011-09-18
Last Modified: 2012-05-12
Hi Experts,

Why does the code below print all 3 lines of input data?  I'm trying to print all lines which contain 'type="song"' not preceded by '---'.
foreach (<DATA>)
{ print if /(?<!---).+type="song"/ }

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>


I note that if I change the RE to be a positive lookbehind assertion:
    { print if /(?<=---).+type="song"/ }
it matches just this one line, as I would have expected.
    <slide_group name="--- This should NOT print" type="song"/>

What's going on, guys?  Is this an RE conspiracy or what?
Question by:tel2
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557954
The pattern:
/(?<!---).+type="song"/
matches:
  <slide_group name="--- This should NOT print" type="song"/>
as soon as the regex engine looks at the T in This, because at that point the previous 3 characters are "-- ", not "---"

Make sense?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557960
Try printing if this *doesn't* match:
/---.+type="song"/
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557976
I think it would be coded like this, though there might be a more efficient way?

foreach (<DATA>)
{ print if ($_ !~ /---.+type="song"/)  }
 
LVL 11

Author Comment

by:tel2
ID: 36557996
Thanks Terry.

Re your 1st post, no, I don't get it, sorry.  I thought my negative lookbehind assertion is saying not to match if there are any '---'s before 'type="song"'.  No?  How can I say that with a negative lookbehind assertion?

Re your 2nd post, although that will work for the test data I've supplied, it doesn't meet the criteria I've specified (i.e. "print all lines which contain 'type="song"' not preceded by '---'"), so it will print a line like this:
    <slide_group name="And this also should NOT print" type="OTHER"/>
Sorry - I should have provided that in the test data.

Re your 3rd post, yes, your suggestion can be coded like that, or like this:
    { print if !/---.+type="song"/ }
or this:
    { print unless /---.+type="song"/ }

I know of ways to get what I want, but right now I'm trying to understand how to get my negative lookbehind assertion to work, and whether REs have conspired against me to slow me down.
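
The difference between the negated match alone and the stated criterion can be sketched as below. This is only an illustration under assumptions: the array names are made up, and the second filter is one plain way to combine two tests, not a lookbehind solution.

```perl
use strict;
use warnings;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

# Negated match alone: keeps every line that lacks '---' before type="song",
# including lines that are not songs at all.
my @negated_only = grep { !/---.+type="song"/ } @lines;

# Two plain tests combined: must be a song, and must not have '---' earlier.
my @both_tests = grep { /type="song"/ && !/---.*type="song"/ } @lines;

print scalar(@negated_only), " vs ", scalar(@both_tests), "\n";
```

With this data the first filter keeps two lines (including the type="OTHER" one) and the second keeps only the first line.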
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558009
The reason is that the regex engine doesn't give up after looking at the 3 dashes and failing to match. It keeps looking for another way to match.

In fact, for the line:
  <slide_group name="--- This should NOT print" type="song"/>
The regex engine would first look at the first position in the string (before the first space character) and see that it isn't preceded by "---". It would then try to match the remaining part of the expression (/.+type="song"/), which succeeds, so the match is confirmed immediately. This is even earlier than what I tried to describe in my first post.
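
One way to check this, as a minimal sketch: after a successful match Perl stores the offset where the match began in $-[0]. The helper name match_start is made up for this illustration.

```perl
use strict;
use warnings;

# The line that was expected NOT to match.
my $line = '  <slide_group name="--- This should NOT print" type="song"/>';

# Hypothetical helper: returns the offset where the match starts, or -1 if there is no match.
sub match_start {
    my ($text) = @_;
    return $text =~ /(?<!---).+type="song"/ ? $-[0] : -1;
}

my $start = match_start($line);
print "match starts at offset $start\n";
```

The reported offset is 0, so the lookbehind was tested at the very start of the line, where nothing precedes it.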
 
LVL 35

Accepted Solution

by:
Terry Woods earned 250 total points
ID: 36558011
Try this maybe?:
{ print if /^(?!.*---).+type="song"/ }
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558013
Actually, that would still exclude cases where the --- is *after* the text: type="song"

You can remedy that like this:
{ print if /^(?!((?!type="song").)*---).+type="song"/ }
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558019
To explain what the pattern does: it looks ahead from the start of the line (but no further than type="song"), and if it finds ---, the match fails. The start-of-line anchor ^ means the lookahead is only tried from the beginning of the line, so it must find the --- if one is present before type="song".
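
A small sketch of that pattern in action. The last two test lines are assumptions added for illustration (a non-song line and a line with --- after the type attribute); they are not from the original data.

```perl
use strict;
use warnings;

# Pattern from the post above: reject a line only if '---' occurs before type="song".
my $re = qr/^(?!((?!type="song").)*---).+type="song"/;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="Not a song" type="OTHER"/>',
    '  <slide_group name="Dashes after the type" type="song" --- />',
);

# Keep only the lines the pattern accepts.
my @kept = grep { $_ =~ $re } @lines;
print "$_\n" for @kept;
```

This keeps the first and last lines: the second has --- before type="song", and the third is not a song at all.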
 
LVL 11

Author Comment

by:tel2
ID: 36558075
Thanks for that, Terry.

Let's see if someone comes up with a simpler RE.
 
LVL 84

Assisted Solution

by:ozo
ozo earned 250 total points
ID: 36558411
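foreach (<DATA>){
    # Three alternative patterns; a line prints once for each one it matches.
    print if /^(?!.*---).+type="song"/;                # no --- anywhere in the line
    print if /^((?!---).)+type="song"/;                # no --- before type="song"
    print if /^(?!.*---.+type="song").+type="song"/    # no --- followed later by type="song"
}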

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>
  <slide_group name="And should this print" type="song" --- />


"I thought /(?<!---).+type="song"/ is saying not to match if there are any '---'s before 'type="song"'."
No, it is saying not to match if there is a --- immediately before the point where .+type="song" starts to match.
.+type="song" can match
  <slide_group name="--- This should NOT print" type="song"
starting at the very beginning of the line, and there is no --- before
  <slide_group name="--- This should NOT print" type="song"
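
To illustrate the point that a lookbehind only inspects the characters directly before the position where it is tested, here is a minimal sketch. The two test strings are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical test strings, not from the original data.
my $adjacent = 'name="x" ---type="song"';    # '---' sits directly before type="song"
my $distant  = 'name="--- x" type="song"';   # '---' is earlier in the line

# With no .+ in between, the lookbehind is tested exactly where type="song" begins.
my $adjacent_matches = $adjacent =~ /(?<!---)type="song"/ ? 1 : 0;
my $distant_matches  = $distant  =~ /(?<!---)type="song"/ ? 1 : 0;

print "adjacent: $adjacent_matches, distant: $distant_matches\n";
```

Only the string with --- directly in front of type="song" is rejected. A --- earlier in the line is invisible to a three-character lookbehind, which is why the lookahead forms above are used instead.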
 
LVL 11

Author Closing Comment

by:tel2
ID: 36565835
Much appreciated, Terry and ozo.
I'm learning.