Posted on 2013-11-19
Last Modified: 2013-11-20
I have this regex in my Java code. What does it mean? Examples?


Question by: jazzIIIlove
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/"\[(.*?)\]/)->explain;'
The regular expression:

(?-imsx:"\[(.*?)\])

matches as follows:
NODE                     EXPLANATION
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally)
  "                        '"'
  \[                       '['
  (                        group and capture to \1:
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
  )                        end of \1
  \]                       ']'
)                        end of grouping
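ozo's explanation works on the regex itself. In Java source code the same pattern is written as a string literal, so every backslash is doubled. The " at the start of ozo's pattern is probably the opening quote of that Java string literal pasted along with it (an assumption, since the original Java line isn't shown), so it is left out here. A minimal sketch; the class and field names are illustrative only:

```java
import java.util.regex.Pattern;

class LiteralDemo {
    // In a Java string literal "\\[" is the two characters \[ , which the
    // regex engine then reads as an escaped, literal '['.
    static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    public static void main(String[] args) {
        // pattern() returns the regex text the engine actually sees
        System.out.println(BRACKETED.pattern()); // prints \[(.*?)\]
    }
}
```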
Basically, anything that starts with [(, ends with )], with any number of (or no) characters other than a newline character in between.
While ozo's post is correct, maybe you are looking for more of an explanation than that. And awking00 is close, but with a slight error and missing one important part, so this is my go at it...

The \[ and \] match literal square brackets [ and ]. The \ in front of each one escapes it; without it, [ and ] have the special meaning of starting and ending a character class.
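To see why the escaping matters, here is a small Java sketch (class and method names are hypothetical) comparing the escaped pattern with the same text minus the backslashes. Unescaped, [(.*?)] is a character class that matches a single character out of ( . * ? ), so it finds nothing in "Hello [World]":

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Escaped: \[ and \] are literal square brackets.
    static final Pattern ESCAPED = Pattern.compile("\\[(.*?)\\]");
    // Unescaped: [(.*?)] is a character class matching ONE of ( . * ? )
    static final Pattern UNESCAPED = Pattern.compile("[(.*?)]");

    // True if the pattern matches anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Hello [World]";
        System.out.println(found(ESCAPED, text));   // true: matches "[World]"
        System.out.println(found(UNESCAPED, text)); // false: none of ( . * ? ) in the text
    }
}
```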

The ( and ) tell the regex engine to capture whatever matches inside these parentheses. You can use the captured text in your replacement string (if you are using this in a "replace" method call, where Java refers to it as $1), or you can retrieve it afterwards if desired.
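Both uses of the capture group can be sketched in Java with the standard java.util.regex classes: Matcher.group(1) retrieves the captured text, and $1 inserts it in a replacement. The helper names below are illustrative, not from the original thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureDemo {
    static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // Returns group(index) of the first match, or null if nothing matches.
    // group(0) is the whole match, group(1) is the text inside ( ).
    static String group(String text, int index) {
        Matcher m = BRACKETED.matcher(text);
        return m.find() ? m.group(index) : null;
    }

    // Rewrites every [text] as <text>; $1 in the replacement is capture group 1.
    static String toAngleBrackets(String text) {
        return BRACKETED.matcher(text).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        System.out.println(group("Hello [World]", 0));      // [World]
        System.out.println(group("Hello [World]", 1));      // World
        System.out.println(toAngleBrackets("Hello [World]")); // Hello <World>
    }
}
```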

The .*? in the middle tells the regex engine to match any number of characters (including none), BUT only the minimum possible. How do we know this? The . matches any character (except a newline, by default), the * means 0 or more times, and the ? makes the expression non-greedy, i.e. it only takes what it needs.
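One detail from ozo's output is easy to miss: by default . does not match a newline, so a bracketed section that spans two lines is not matched at all. In Java the Pattern.DOTALL flag changes that. A minimal sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotNewlineDemo {
    static final Pattern DEFAULT = Pattern.compile("\\[(.*?)\\]");
    // DOTALL lets '.' match line terminators as well
    static final Pattern DOTALL = Pattern.compile("\\[(.*?)\\]", Pattern.DOTALL);

    // Capture group 1 of the first match, or null if there is no match.
    static String firstCapture(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "[line one\nline two]";
        System.out.println(firstCapture(DEFAULT, text)); // null: '.' stops at the newline
        System.out.println(firstCapture(DOTALL, text));  // both lines, newline included
    }
}
```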

I'm guessing that this last part is possibly what your real question is about, so some examples may help.

Using the above expression, i.e. \[(.*?)\]

on  Hello [World]    gives you the captured output of  World
on  Hello []         gives you an empty string as the captured output
on  [Hello] [World]  gives you the captured output of  Hello
    (note that you could run the match again to get the "next" match of World)
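In Java, "running the match again" means calling Matcher.find() repeatedly; each call picks up where the previous match ended. This sketch (the class and method names are made up for illustration) reproduces the three examples above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    static final Pattern LAZY = Pattern.compile("\\[(.*?)\\]");

    // Calls find() repeatedly, collecting capture group 1 from each match.
    static List<String> allCaptures(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LAZY.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allCaptures("Hello [World]"));   // [World]
        System.out.println(allCaptures("Hello []"));        // [] (one empty string)
        System.out.println(allCaptures("[Hello] [World]")); // [Hello, World]
    }
}
```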

If you used the (default) greedy expression, i.e. \[(.*)\]  (note there is no ?)

on  Hello [World]    gives you the captured output of  World
on  Hello []         gives you an empty string as the captured output
on  [Hello] [World]  gives you the captured output of  Hello] [World

If you look at the above, the first two examples for each expression give you the same output, but the third one differs. The non-greedy expression only gave you Hello because that was the MINIMUM needed to make the match succeed. The greedy expression, however, took as much as it could whilst still allowing the match to succeed.
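A side-by-side Java sketch of the two patterns on the same three inputs shows the difference; only the third input, which contains two bracketed sections, comes out differently. Names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    static final Pattern LAZY = Pattern.compile("\\[(.*?)\\]");  // minimum needed
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]"); // as much as possible

    // Capture group 1 of the first match, or null if there is no match.
    static String firstCapture(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"Hello [World]", "Hello []", "[Hello] [World]"};
        for (String s : inputs) {
            System.out.println(s + "  lazy: '" + firstCapture(LAZY, s)
                    + "'  greedy: '" + firstCapture(GREEDY, s) + "'");
        }
    }
}
```

For "[Hello] [World]" the greedy .* first runs to the end of the string and then backs off only as far as the last ], which is why it captures Hello] [World.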

Hope that helps...
Ah, my mistake: ozo's comment looked really messed up on my iPhone and I got confused.
Thanks, and this is an equal split.

