Solved

Perl negative lookbehind assertion problem

Posted on 2011-09-18
Last Modified: 2012-05-12
Hi Experts,

Why does the code below print all 3 lines of input data?  I'm trying to print only the lines which contain 'type="song"' not preceded by '---'.
foreach (<DATA>)
{ print if /(?<!---).+type="song"/ }

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>


I note that if I change the RE to be a positive lookbehind assertion:
    { print if /(?<=---).+type="song"/ }
it matches just this one line, as I would have expected.
    <slide_group name="--- This should NOT print" type="song"/>

What's going on, guys?  Is this an RE conspiracy or what?
Question by:tel2
11 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36557954
The pattern:
/(?<!---).+type="song"/
matches:
  <slide_group name="--- This should NOT print" type="song"/>
as soon as the regex engine looks at the T in This, because at that point the previous 3 characters are "-- ", not "---"

Make sense?

Expert Comment

by:Terry Woods
ID: 36557960
Try printing if this *doesn't* match:
/---.+type="song"/

Expert Comment

by:Terry Woods
ID: 36557976
I think it would be coded like this, though there might be a more efficient way?

foreach (<DATA>)
{ print if ($_ !~ /---.+type="song"/)  }

Author Comment

by:tel2
ID: 36557996
Thanks Terry.

Re your 1st post, no, I don't get it, sorry.  I thought my negative lookbehind assertion is saying not to match if there are any '---'s before 'type="song"'.  No?  How can I say that with a negative lookbehind assertion?

Re your 2nd post, although that will work for the test data I've supplied, it doesn't meet the criteria I've specified (i.e. "print all lines which contain 'type="song"' not preceded by '---'"), so it will print a line like this:
    <slide_group name="And this also should NOT print" type="OTHER"/>
Sorry - I should have provided that in the test data.

Re your 3rd post, yes, your suggestion can be coded like that, or like this:
    { print if !/---.+type="song"/ }
or this:
    { print unless /---.+type="song"/ }

I know of ways to get what I want, but right now I'm trying to understand how to get my negative lookbehind assertion to work, and whether REs have conspired against me to slow me down.
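As a rough illustration of the point about the 2nd post: negating the match on its own also lets through lines that have no type="song" at all, so the negated test needs to be paired with a positive one. The sketch below assumes a hypothetical helper named wanted() and a sample type="OTHER" line like the one mentioned above; it is one possible workaround, not the lookbehind-based answer being asked for.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true only when the line
# contains type="song" AND no '---' appears earlier, followed by type="song".
sub wanted {
    my ($line) = @_;
    return ($line =~ /type="song"/ && $line !~ /---.+type="song"/) ? 1 : 0;
}

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
    '  <slide_group name="And this also should NOT print" type="OTHER"/>',
);

for my $line (@lines) {
    print "$line\n" if wanted($line);
}
```

With only the negated test, the type="OTHER" line would be printed as well; the extra positive test is what filters it out.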

Expert Comment

by:Terry Woods
ID: 36558009
The reason is that the regex engine doesn't give up after looking at the 3 dashes and failing to match. It keeps looking for another way to match.

In fact, for the line:
  <slide_group name="--- This should NOT print" type="song"/>
The regex engine would first look at the first position in the string (before the first space character), and see that it isn't preceded by "---". It would then try to match the remaining part of the expression (/.+type="song"/), which would match, and it would immediately confirm the match. This is even earlier than what I tried to describe in my first post.
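One way to see this for yourself is to ask Perl where the match began. The sketch below uses the built-in @- array ($-[0] is the start offset of the whole match); the helper name match_start() is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offset at which the original pattern
# starts matching, or undef if it does not match at all.
sub match_start {
    my ($line) = @_;
    return $line =~ /(?<!---).+type="song"/ ? $-[0] : undef;
}

my @lines = (
    '  <slide_group name="This line should print" type="song"/>',
    '  <slide_group name="--- This should NOT print" type="song"/>',
);

for my $line (@lines) {
    my $start = match_start($line);
    printf "match starts at offset %s\n", defined $start ? $start : 'none';
}
```

Both lines report offset 0: nothing precedes the start of the string, so the negative lookbehind succeeds there, and .+ is then free to run across the "---".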

Accepted Solution

by:
Terry Woods earned 250 total points
ID: 36558011
Try this maybe?:
{ print if /^(?!.*---).+type="song"/ }

Expert Comment

by:Terry Woods
ID: 36558013
Actually, that would still exclude cases where the --- is *after* the text: type="song"

You can remedy that like this:
{ print if /^(?!((?!type="song").)*---).+type="song"/ }

Expert Comment

by:Terry Woods
ID: 36558019
To explain what the pattern does: it looks ahead (but no further than type="song"), and if it finds the ---, it fails to match. The start-of-line anchor ^ means the lookahead is only tried from the start of the line, so it must find the --- if one is present before the type="song".
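To compare the two patterns side by side, here is a small sketch. The sub names are invented for the example, and the last test line (with --- after type="song") is an assumed extra case rather than part of the original test data.

```perl
use strict;
use warnings;

# Simple version: rejects a line if '---' appears anywhere on it.
sub no_dashes_anywhere {
    my ($line) = @_;
    return $line =~ /^(?!.*---).+type="song"/ ? 1 : 0;
}

# Refined version: the lookahead only scans up to type="song",
# so a '---' after it does not cause a rejection.
sub no_dashes_before_song {
    my ($line) = @_;
    return $line =~ /^(?!((?!type="song").)*---).+type="song"/ ? 1 : 0;
}

my $plain    = '  <slide_group name="This line should print" type="song"/>';
my $leading  = '  <slide_group name="--- This should NOT print" type="song"/>';
my $trailing = '  <slide_group name="Dashes come later" type="song" --- />';

for my $line ($plain, $leading, $trailing) {
    printf "%d %d %s\n",
        no_dashes_anywhere($line), no_dashes_before_song($line), $line;
}
```

The two subs agree on the first two lines and differ only on the third, where the --- comes after type="song".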

Author Comment

by:tel2
ID: 36558075
Thanks for that, Terry.

Let's see if someone comes up with a simpler RE.

Assisted Solution

by:ozo
ozo earned 250 total points
ID: 36558411
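foreach (<DATA>){
    # Three alternative patterns; a line is printed once for each pattern it matches.
    print if /^(?!.*---).+type="song"/;              # no '---' anywhere on the line
    print if /^((?!---).)+type="song"/;              # no '---' before type="song"
    print if /^(?!.*---.+type="song").+type="song"/  # no '---' followed later by type="song"
}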

__DATA__
  <slide_group name="This line should print" type="song"/>
  <slide_group name="--- This should NOT print" type="song"/>
  <slide_group name="And this should print" type="song"/>
  <slide_group name="And should this print" type="song" --- />


I thought /(?<!---).+type="song"/ is saying not to match if there are any '---'s before 'type="song"'.
No, it is saying not to match if there is a --- before .+type="song"
.+type="song" can match
  <slide_group name="--- This should NOT print" type="song"
and there is no --- before
  <slide_group name="--- This should NOT print" type="song"

Author Closing Comment

by:tel2
ID: 36565835
Much appreciated, Terry and ozo.
I'm learning.