regex: $x =~ /\G/gc and pos ($x)

==== example: =====
use strict;
use Data::Dumper;

my $x = 'asdf';

print Dumper(pos $x);
print "OK\n" if $x =~ /\G/gc;
print Dumper(pos $x);
print "OK\n" if $x =~ /\G/gc;
print Dumper(pos $x);
pos $x = pos $x;
print "OK\n" if $x =~ /\G/gc;


==== actual output:=====
$VAR1 = undef;
OK
$VAR1 = '0';
$VAR1 = '0';
OK

===== output I expect: =====
$VAR1 = undef;
OK
$VAR1 = '0';
OK                   <---- NOTE
$VAR1 = '0';
OK
===== end =====

What is going on?  And why does "pos $x = pos $x;" seem to have a side effect?
ext2 Asked:
holli Commented:
Because pos $x = pos $x is an assignment, not a comparison. What you need is pos $x == pos $x.
roee_f Commented:
What are you trying to match?
\G is used AFTER you've matched something (in a loop, usually) and you want to get what comes after it. In your snippet none of your regexps match anything in $x.
ext2 Author Commented:
The code is correct.  It is not intended to be useful, but it is a test case for regex behavior.

After rereading the sections on pos, \G, and /../gc in Mastering Regular Expressions, 2nd ed., Jeffrey Friedl (pp. 313, 129), what I think is occurring is a somewhat obscure "forced bump-ahead" behavior of /../g that Perl invokes to prevent an infinite loop (p. 129).  For example,

  my $x = 'abcde';
   $x =~ s/x?/!/g;
   print $x;

does in fact complete.  It prints "!a!b!c!d!e!".

On the other hand,

   my $x = 'abcde';
   $x =~ s/\Gx?/!/g;
   print $x;

prints "!abcde".  This is because \G matches only at the end of the previous match rather than the beginning of the next match.  When the forced bump-ahead occurs (as it does here), these two locations are not equivalent, so the match on \G fails for all but the first iteration.
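
As a hedged illustration of the same difference (a minimal sketch; the variable names are made up for this example and are not from the book), the two loops below record pos() after every successful /g match, once without \G and once with it:

```perl
use strict;
use warnings;

# Hypothetical names; two separate strings so the loops share no pos() state.
my $plain_str    = 'abcde';
my $anchored_str = 'abcde';

my (@plain, @anchored);

# Without \G: after an empty match, the next attempt may succeed one
# character further along, so an empty match is found at every index.
while ($plain_str =~ /x?/g) {
    push @plain, pos($plain_str);
}

# With \G: the match is pinned to the end of the previous match, so the
# second attempt can only produce another empty match there and fails.
while ($anchored_str =~ /\Gx?/g) {
    push @anchored, pos($anchored_str);
}

print "plain:    @plain\n";     # plain:    0 1 2 3 4 5
print "anchored: @anchored\n";  # anchored: 0
```

The six positions in the first list correspond to the six "!" characters in "!a!b!c!d!e!", and the single position in the second list corresponds to the lone "!" in "!abcde".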

The question I then have is what exactly does pos($x) mean under such circumstances?  Consider:

  my $x = 'abcde';
  pos($x) = 0; # just to be sure (not needed)
  $x =~ s/(?{ print "A" . pos($x) })\G(?{ print "B" . pos($x) })x?/!/g;
  print $x;

prints "A0B0A0B0A1A2A3A4A5!abcde".

It's interesting that the first and second iterations of the loop give exactly the same results "A0B0".  Therefore, the behavior of the regex must be depending on *some internal state* other than pos($x).

If you don't think it's valid to put something before the "\G", then try this:

  my $x = 'abcde';
  $x =~ s/\G(?{ print "B" . pos($x) })x?/!/g;
  print $x;

which prints "B0B0!abcde".

How about this:

  my $x = 'abcde';
  while($x =~ /\G(?{ print "B" . pos($x) })x?/g) {
     # pos($x) = pos($x);
     print '*';
  }
  print $x;

This prints "B0*B0abcde".  However, remove that comment character, and the loop becomes infinite.  Therefore, pos($x) = pos($x) actually does have some effect: it resets some internal variable.
ext2 Author Commented:
If anyone really wants to know why this issue came up, well, I'm writing a simple lexer, and it happened that a test case failed if I added the code

  $x =~ /\G/gc;

to the very beginning of the program.  I did this because I was concerned that pos($x) was undefined rather than zero.  This code caused the expected side effect of setting pos($x) equal to zero.  However, it also suddenly caused the test cases to fall into an infinite loop.  As a simplified example,

  my $x = '';

  # $x =~ /\G/gc;
  while(not $x =~ /\G\z/gc) {
    print "pos=", pos($x), "\n";
    if($x =~ /\G([a-z])/gc) { print "$1"; }
    else {
      $x =~ /\G[^a-z]+/gc;
    }
  }

becomes an infinite loop printing "pos=0" if the comment character is removed.

The obvious solution is to "not do that" and instead use the much more obvious "pos(x) = 0", which does not have the unintended side-effect.  However, I'm interested in what's really going on here.
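
One way to sidestep the problem, sketched here under the assumption that every token pattern consumes at least one character: set pos explicitly and detect end of input by comparing pos with the string length, so that no zero-length match is ever attempted. The function name tokenize is invented for this illustration and is not part of the lexer in question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lowercase letters of $text as tokens,
# skipping runs of anything else.
sub tokenize {
    my ($text) = @_;
    my @tokens;

    pos($text) = 0;    # explicit start position, no zero-length match needed

    # End-of-input test by comparison rather than by /\G\z/gc,
    # which would itself be a zero-length match.
    while (pos($text) < length $text) {
        if ($text =~ /\G([a-z])/gc) {
            push @tokens, $1;
        }
        elsif ($text =~ /\G[^a-z]+/gc) {
            next;      # skipped a run of non-letters
        }
        else {
            die "lexer stuck at position " . pos($text);
        }
    }
    return @tokens;
}

print join(",", tokenize("ab1 c")), "\n";   # a,b,c
```

Because each pattern in the loop matches at least one character, every successful match moves pos forward, and the loop terminates on any input, including the empty string.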
ext2 Author Commented:
correction: pos($x) not pos(x)
roee_f Commented:
The truth is that on every change to the string (creation, assignment to it, and so on), pos resets to undef, which means the beginning of the string. So even the assignment of pos to 0 at the beginning is redundant.

About your question:
pos itself remains unchanged; only \G is getting bumped up one character. You have to remember that pos is assigned only on a successful match, so there is no reason to update it on failure.
Since \G is part of the regexp engine, it has to be bumped in order to avoid infinite loops, but pos is not part of the engine, so updating it is useless.
ext2 Author Commented:
roee_f,

I reread the sections in Mastering Regular Expressions again ;)

One thing is that the text doesn't seem to ever suggest that \G bumps along.  Rather, it mentions that \G is the location of the end of the previous match (regardless of bumping), and pos is the thing that can bump along.  So, I wrote this test:

  my $x = "abcde";
  $x =~ /a/gc;
  print pos($x);
  print "A" if $x =~ /\G(?{print 'X'})/gc; print pos($x);
  print "B" if $x =~ /\G(?{print 'Y'})/gc; print pos($x);
  print "C" if $x =~ /\G(?{print 'Z'})/gc; print pos($x);
  $x =~ /\G(.)/gc and print $1;

This prints "1XA1Y1Z1b".  This implies that in lines 4, 5, and 6, both \G and pos remain at string index 1 and do not bump along.  So, something else must be causing the behavior of lines 5 and 6.

Reaching into the forgotten regex debugger:

  perl -Mre=debug testre.pl

Produces

===
...
Guessing start of match, REx `a' against `abcde'...
Found anchored substr `a' at offset 0...
Guessed: match at offset 0
Matching REx `\G(?{print 'X'})' against `bcde'
  Setting an EVAL scope, savestack=17
   1 <a> <bcde>           |  1:  GPOS
   1 <a> <bcde>           |  2:  EVAL
  re_eval 0x10146480
   1 <a> <bcde>           |  4:  END
Match successful!
Matching REx `\G(?{print 'Y'})' against `bcde'
  Setting an EVAL scope, savestack=17
   1 <a> <bcde>           |  1:  GPOS
   1 <a> <bcde>           |  2:  EVAL
  re_eval 0x10146540
   1 <a> <bcde>           |  4:  END
Match possible, but length=0 is smaller than requested=1, failing!
  Clearing an EVAL scope, savestack=17..20
Match failed
Matching REx `\G(?{print 'Z'})' against `bcde'
  Setting an EVAL scope, savestack=17
   1 <a> <bcde>           |  1:  GPOS
   1 <a> <bcde>           |  2:  EVAL
  re_eval 0x10146600
   1 <a> <bcde>           |  4:  END
Match possible, but length=0 is smaller than requested=1, failing!
  Clearing an EVAL scope, savestack=17..20
Match failed
Matching REx `\G(.)' against `bcde'
  Setting an EVAL scope, savestack=5
   1 <a> <bcde>           |  1:  GPOS
   1 <a> <bcde>           |  2:  OPEN1
   1 <a> <bcde>           |  4:  REG_ANY
   2 <ab> <cde>           |  5:  CLOSE1
   2 <ab> <cde>           |  7:  END
Match successful!
...
===

So, what is happening is that once a regex matches a string of length zero, the next match is required to be of length at least one.  *Right after the next match completes*, the length of the match is checked.  No bump-along occurs here.  If the length is zero again, the return value of this match is overridden to be false.  Secondly, doing "pos($x) = pos($x)" seems to be an obscure way to reset this condition.
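
A compact way to check this conclusion (a sketch only; the variable names are made up, and the result is the behavior observed in this thread, not something verified across Perl versions) is to record the success of each zero-length match instead of printing along the way:

```perl
use strict;
use warnings;

my $str = 'asdf';
my @ok;    # 1 for each successful match, 0 for each failed one

push @ok, ($str =~ /\G/gc ? 1 : 0);   # first empty match at pos 0 succeeds
push @ok, ($str =~ /\G/gc ? 1 : 0);   # second empty match at the same pos is rejected
pos($str) = pos($str);                # assigning pos seems to clear the condition
push @ok, ($str =~ /\G/gc ? 1 : 0);   # the empty match succeeds again

print "@ok\n";   # 1 0 1
```

The "1 0 1" pattern matches the OK / nothing / OK sequence in the actual output of the original test case.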

ext2 Author Commented:
probably a refund--answered my own question.
modulo Commented:
PAQed with points refunded (70)

modulo
Community Support Moderator