Solved

Perl negative lookbehind assertion problem

Posted on 2011-09-18
Last Modified: 2012-05-12
Hi Experts,

Why does the code below print all 3 lines of input data?  I'm trying to print all lines which contain 'type="song"' not preceded by '---'.
foreach (<DATA>)
{ print if /(?<!---).+type="song"/ }

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>


I note that if I change the RE to be a positive lookbehind assertion:
    { print if /(?<=---).+type="song"/ }
it matches just this one line, as I would have expected.
    <slide_group name="--- This should NOT print" type="song"/>

What's going on, guys?  Is this an RE conspiracy or what?
Question by:tel2
11 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557954
The pattern:
/(?<!---).+type="song"/
matches:
  <slide_group name="--- This should NOT print" type="song"/>
as soon as the regex engine looks at the T in This, because at that point the previous 3 characters are "-- ", not "---"

Make sense?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557960
Try printing if this *doesn't* match:
/---.+type="song"/
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557976
I think it would be coded like this, though there might be a more efficient way?

foreach (<DATA>)
{ print if ($_ !~ /---.+type="song"/)  }
 
LVL 12

Author Comment

by:tel2
ID: 36557996
Thanks Terry.

Re your 1st post, no, I don't get it, sorry.  I thought my negative lookbehind assertion is saying not to match if there are any '---'s before 'type="song"'.  No?  How can I say that with a negative lookbehind assertion?

Re your 2nd post, although that will work for the test data I've supplied, it doesn't meet the criteria I've specified (i.e. "print all lines which contain 'type="song"' not preceded by '---'"), so it will print a line like this:
    <slide_group name="And this also should NOT print" type="OTHER"/>
Sorry - I should have provided that in the test data.

Re your 3rd post, yes, your suggestion can be coded like that, or like this:
    { print if !/---.+type="song"/ }
or this:
    { print unless /---.+type="song"/ }

I know of ways to get what I want, but right now I'm trying to understand how to get my negative lookbehind assertion to work, and whether REs have conspired against me to slow me down.
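For reference, one such way can be sketched without any lookaround at all: find type="song" first, then inspect the text in front of it. This is only a minimal sketch; the sub name wanted and the sample lines are assumptions made for illustration, and "before" is taken to mean "before the first type="song" on the line".

```perl
use strict;
use warnings;

# True when the line contains type="song" and no --- appears before it.
# "wanted" is a hypothetical helper name, not something from the thread.
sub wanted {
    my ($line) = @_;
    return 0 unless $line =~ /type="song"/;
    my $before = substr($line, 0, $-[0]);   # text preceding the first type="song"
    return index($before, '---') < 0 ? 1 : 0;
}

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

# Prints only the first sample line.
print "$_\n" for grep { wanted($_) } @lines;
```

Splitting the test into two steps keeps each condition readable, at the cost of not being a single regex.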
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558009
The reason is that the regex engine doesn't give up after looking at the 3 dashes and failing to match. It keeps looking for another way to match.

In fact, for the line:
  <slide_group name="--- This should NOT print" type="song"/>
The regex engine would first look at the first position in the string (before the first space character) and see that it isn't preceded by "---". It would then try to match the remaining part of the expression (/.+type="song"/), which would match, and it would immediately confirm the match. This is even earlier than what I tried to describe in my first post.
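The point about where the match begins can be checked with Perl's @- array, whose first element holds the start offset of the last successful match. The following is a small sketch; the helper name match_start and the two short test strings are assumptions added for illustration.

```perl
use strict;
use warnings;

# Return the offset where the overall match begins, or -1 if there is no match.
# "match_start" is a hypothetical helper name.
sub match_start {
    my ($re, $line) = @_;
    return $line =~ $re ? $-[0] : -1;
}

my $line = '  <slide_group name="--- This should NOT print" type="song"/>';

# The lookbehind is tested at offset 0, where nothing precedes the position,
# so it succeeds there and .+ is free to run across the ---.
print match_start(qr/(?<!---).+type="song"/, $line), "\n";             # prints 0

# Without .+ the lookbehind sits directly in front of type="song",
# so it only blocks a --- that is immediately adjacent.
print match_start(qr/(?<!---)type="song"/, '---type="song"'), "\n";    # prints -1
print match_start(qr/(?<!---)type="song"/, '--- type="song"'), "\n";   # prints 4
```

A lookbehind only inspects the characters immediately before the position at which it is attempted; it does not scan the rest of the line.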
 
LVL 35

Accepted Solution

by:
Terry Woods earned 1000 total points
ID: 36558011
Try this maybe?:
{ print if /^(?!.*---).+type="song"/ }
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558013
Actually, that would still exclude cases where the --- is after the text type="song".

You can remedy that like this:
{ print if /^(?!((?!type="song").)*---).+type="song"/ }
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36558019
To explain what the pattern does: starting from the beginning of the line, it looks ahead (but no further than type="song") and fails to match if it finds ---. The start-of-line anchor ^ makes the lookahead scan from the start of the line, so it will find any --- that appears before type="song".
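As a quick check of that behaviour, here is a sketch that runs the pattern over the three original data lines plus two extra lines: the type="OTHER" line from the author's comment and a line with --- after type="song". The variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# The pattern from the comment above, compiled once.
my $re = qr/^(?!((?!type="song").)*---).+type="song"/;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this should print" type="song"/>',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
    '  <slide_group name="And should this print" type="song" --- />',
);

# Keep only the lines the pattern accepts.
my @printed = grep { $_ =~ $re } @lines;

# Prints the 1st, 3rd and 5th lines: the --- in the 5th line comes
# after type="song", so the inner lookahead never reaches it.
print "$_\n" for @printed;
```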
 
LVL 12

Author Comment

by:tel2
ID: 36558075
Thanks for that, Terry.

Let's see if someone comes up with a simpler RE.
 
LVL 85

Assisted Solution

by:ozo
ozo earned 1000 total points
ID: 36558411
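foreach (<DATA>){
    # Three alternative tests; a line is printed once for each test it passes.
    print if /^(?!.*---).+type="song"/;              # no --- anywhere on the line
    print if /^((?!---).)+type="song"/;              # no --- before type="song"
    print if /^(?!.*---.+type="song").+type="song"/  # no --- followed later by type="song"
}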

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>
  <slide_group name="And should this print" type="song" --- />


Quoting tel2: "I thought /(?<!---).+type="song"/ is saying not to match if there are any '---'s before 'type="song"'."
No, it is saying not to match if there is a --- immediately before the text matched by .+type="song".
.+type="song" can match
  <slide_group name="--- This should NOT print" type="song"
and there is no --- before
  <slide_group name="--- This should NOT print" type="song"
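To see how the three alternatives differ, the sketch below runs each one over the four data lines and records Y or N per line. The helper name verdicts and the Y/N summary format are assumptions made for illustration.

```perl
use strict;
use warnings;

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this should print" type="song"/>',
    '  <slide_group name="And should this print" type="song" --- />',
);

my @patterns = (
    qr/^(?!.*---).+type="song"/,
    qr/^((?!---).)+type="song"/,
    qr/^(?!.*---.+type="song").+type="song"/,
);

# One character per data line: Y if the pattern matches it, N otherwise.
# "verdicts" is a hypothetical helper name.
sub verdicts {
    my ($re) = @_;
    return join '', map { $_ =~ $re ? 'Y' : 'N' } @lines;
}

# Prints YNYN, then YNYY, then YNYY.
print verdicts($_), "\n" for @patterns;
```

Only the first pattern rejects the fourth line, because it forbids --- anywhere on the line rather than only before type="song".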
 
LVL 12

Author Closing Comment

by:tel2
ID: 36565835
Much appreciated, Terry and ozo.
I'm learning.
