Solved

Perl negative lookbehind assertion problem

Posted on 2011-09-18
Last Modified: 2012-05-12
Hi Experts,

Why does the code below print all 3 lines of input data?  I'm trying to print all lines which contain 'type="song"' not preceded by '---'.
foreach (<DATA>)
{ print if /(?<!---).+type="song"/ }

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>


I note that if I change the RE to be a positive lookbehind assertion:
    { print if /(?<=---).+type="song"/ }
it matches just this one line, as I would have expected.
    <slide_group name="--- This should NOT print" type="song"/>

What's going on, guys?  Is this an RE conspiracy or what?
Question by:tel2
11 Comments
 

Expert Comment

by:Terry Woods
ID: 36557954
The pattern:
/(?<!---).+type="song"/
matches:
  <slide_group name="--- This should NOT print" type="song"/>
as soon as the regex engine looks at the T in This, because at that point the previous 3 characters are "-- ", not "---"

Make sense?

Expert Comment

by:Terry Woods
ID: 36557960
Try printing if this *doesn't* match:
/---.+type="song"/

Expert Comment

by:Terry Woods
ID: 36557976
I think it would be coded like this, though there might be a more efficient way?

foreach (<DATA>)
{ print if ($_ !~ /---.+type="song"/)  }

Author Comment

by:tel2
ID: 36557996
Thanks Terry.

Re your 1st post, no, I don't get it, sorry.  I thought my negative lookbehind assertion is saying not to match if there are any '---'s before 'type="song"'.  No?  How can I say that with a negative lookbehind assertion?

Re your 2nd post, although that will work for the test data I've supplied, it doesn't meet the criteria I've specified (i.e. "print all lines which contain 'type="song"' not preceded by '---'"), so it will print a line like this:
    <slide_group name="And this also should NOT print" type="OTHER"/>
Sorry - I should have provided that in the test data.

Re your 3rd post, yes, your suggestion can be coded like that, or like this:
    { print if !/---.+type="song"/ }
or this:
    { print unless /---.+type="song"/ }

I know of ways to get what I want, but right now I'm trying to understand how to get my negative lookbehind assertion to work, and whether REs have conspired against me to slow me down.
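
To illustrate the objection to the 2nd post, here is a minimal sketch. The sub names and the two-test variant are assumptions added for illustration, not something posted in this thread. It shows that the "print unless" form accepts a type="OTHER" line, while also requiring type="song" rejects it.

```perl
use strict;
use warnings;

# The extra test line mentioned above.
my $other = '  <slide_group name="And this also should NOT print" type="OTHER"/>';

# The "print unless" suggestion: true when the line would be printed.
sub prints_with_unless {
    my ($line) = @_;
    return $line !~ /---.+type="song"/ ? 1 : 0;
}

# Hypothetical two-test variant: the line must contain type="song"
# and must not have --- somewhere before it.
sub prints_with_two_tests {
    my ($line) = @_;
    return ($line =~ /type="song"/ && $line !~ /---.+type="song"/) ? 1 : 0;
}

print "unless form prints the OTHER line: ", prints_with_unless($other), "\n";
print "two-test form prints the OTHER line: ", prints_with_two_tests($other), "\n";
```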

Expert Comment

by:Terry Woods
ID: 36558009
The reason is that the regex engine doesn't give up after looking at the 3 dashes and failing to match. It keeps looking for another way to match.

In fact, for the line:
  <slide_group name="--- This should NOT print" type="song"/>
The regex engine would first look at the first position in the string (before the first space character), and see that it isn't preceded by "---". It would then try to match the remaining part of the expression (being /.+type="song"/), which would match, and it would instantly confirm the match. This is even earlier than what I tried to describe in my first post.
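
One way to see this for yourself is to ask Perl where the match started. The sketch below is an illustration only, and match_start is a hypothetical helper name. It relies on $-[0], which holds the start offset of the last successful match. The negative lookbehind matches from offset 0, while the positive lookbehind can only start right after the dashes.

```perl
use strict;
use warnings;

# The line that was expected NOT to print.
my $line = '  <slide_group name="--- This should NOT print" type="song"/>';

# Return the offset at which the pattern's match begins, or -1 if it fails.
sub match_start {
    my ($text, $re) = @_;
    return $text =~ $re ? $-[0] : -1;
}

my $neg_start = match_start($line, qr/(?<!---).+type="song"/);
my $pos_start = match_start($line, qr/(?<=---).+type="song"/);

print "negative lookbehind: match starts at offset $neg_start\n";
print "positive lookbehind: match starts at offset $pos_start\n";
```

The lookbehind is only checked at the position where the match starts, and .+ is free to swallow the --- later in the line.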

Accepted Solution

by:
Terry Woods earned 250 total points
ID: 36558011
Try this maybe?:
{ print if /^(?!.*---).+type="song"/ }

Expert Comment

by:Terry Woods
ID: 36558013
Actually, that would also exclude cases where the --- comes *after* the text type="song".

You can remedy that like this:
{ print if /^(?!((?!type="song").)*---).+type="song"/ }

Expert Comment

by:Terry Woods
ID: 36558019
To explain what the pattern does: it looks ahead (but no further than type="song"), and if it finds ---, the match fails. The start-of-line anchor ^ forces that check to begin at the start of the line, so a --- that appears before type="song" is always found.
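
As a rough sketch, the same pattern can be spread out with the /x flag so that each part carries a comment. The variable name, the wanted sub and the extra sample lines are assumptions for illustration, and the inner group is written as non-capturing (?:...), which does not change what matches.

```perl
use strict;
use warnings;

# The pattern from the previous post, laid out with /x.
my $song_without_earlier_dashes = qr{
    ^                              # start of the line
    (?!                            # fail if this lookahead can succeed:
        (?: (?!type="song") . )*   #   step over characters that do not begin type="song"
        ---                        #   and then find three dashes
    )
    .+ type="song"                 # the line must really contain type="song"
}x;

# True when the line would be printed.
sub wanted {
    my ($line) = @_;
    return $line =~ $song_without_earlier_dashes ? 1 : 0;
}

my @samples = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And should this print" type="song" --- />',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

for my $line (@samples) {
    printf "%-5s %s\n", (wanted($line) ? 'print' : 'skip'), $line;
}
```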

Author Comment

by:tel2
ID: 36558075
Thanks for that, Terry.

Let's see if someone comes up with a simpler RE.

Assisted Solution

by:ozo
ozo earned 250 total points
ID: 36558411
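foreach (<DATA>){
    # Three alternative patterns; a line is printed once for each pattern it matches.
    print if /^(?!.*---).+type="song"/;                 # no --- anywhere in the line
    print if /^((?!---).)+type="song"/;                 # no --- before type="song"
    print if /^(?!.*---.+type="song").+type="song"/     # no --- with type="song" somewhere after it
}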

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>
  <slide_group name="And should this print" type="song" --- />


I thought /(?<!---).+type="song"/ is saying not to match if there are any '---'s before 'type="song"'.
No, it is saying not to match if there is a --- immediately before the text matched by .+type="song".
.+type="song" can match
  <slide_group name="--- This should NOT print" type="song"
and there is no --- immediately before the start of
  <slide_group name="--- This should NOT print" type="song"
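
To compare the three patterns above side by side, here is a small sketch that reports which of the four data lines each one accepts. The hash keys and the accepted sub are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this should print" type="song"/>',
    '  <slide_group name="And should this print" type="song" --- />',
);

my %pattern = (
    a_no_dashes_anywhere    => qr/^(?!.*---).+type="song"/,
    b_no_dashes_before_song => qr/^((?!---).)+type="song"/,
    c_no_dashes_then_song   => qr/^(?!.*---.+type="song").+type="song"/,
);

# Return the 1-based numbers of the lines a pattern accepts, comma-separated.
sub accepted {
    my ($re) = @_;
    return join ',', grep { $lines[$_ - 1] =~ $re } 1 .. scalar @lines;
}

for my $name (sort keys %pattern) {
    printf "%-24s accepts lines %s\n", $name, accepted($pattern{$name});
}
```

The first pattern rejects the fourth line because it forbids --- anywhere in the line; the other two only object to a --- that comes before type="song".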

Author Closing Comment

by:tel2
ID: 36565835
Much appreciated, Terry and ozo.
I'm learning.