• Status: Solved

How do I pass a complex regexp to Grep for Windows?

Hi -

Regarding the version of grep distributed here, which I am running from the Windows XP command line:
http://pages.interlog.com/~tcharron/grep.html

If I run this command:
grep -S "size" *.flam3

I get results okay (it prints every file name and line where there is
a match - which in my case is every *.flam3 file in the directory).

But if I run this command -
grep -S "(?<=size=")[0-9]+" *.flam3

I get this error:
E:\Program Files\Grep\grep.exe: : Not enough memory

(My grep install is on an E: drive and included in the windows global
PATH environment variable.)

According to this "PowerGREP" tool and "regexBuddy" software I'm
using, that regular expression -
(?<=size=")[0-9]+

- should match the first number after size=" (so it will match 800) in
the below:
time="0" size="800 592" center="-0.408976 -0.305538"

- and alternately, this expression
(?<=size="[0-9]+ )[0-9]+

- should match the second number (or 592) in the same area:
time="0" size="800 592" center="-0.408976 -0.305538"

What am I doing wrong, or what may be wrong?

Puzzling: I can get a match from this command (it outputs 1, meaning one matching line was found) -

grep -c "size=""" electricsheep.243.06908.flam3

or from this:

grep -c [0-9] electricsheep.243.06908.flam3

But again, not from that first problem regex mentioned.  I'm suspecting the problem has to do with either how to tell it to look for a quote mark, or how to set off the regex from the quote marks required around the expression.

Also, once I have it finding a match, I'd like to know how to display only the matched text (not the file name and the whole matching line), as I'll be passing the output into a FOR /F loop in a Windows batch file, like so:

FOR /F "usebackq tokens=* delims=*" %%X IN (`grep "matchtext" file.flam3`) DO (
    SET var=%%X
    ECHO var is !var!
)

Thank you!
 
HonorGod Commented:
Is the double quote part of your required string?

If so, you have to tell Windows that it is special.  Do so using the Windows escape character, which is the caret '^' (i.e., Shift+6 on most US keyboards).
 
omarfarid Commented:
Try putting the string between single quotes (').
 
openhatch (Author) Commented:
Thanks for the suggestions, and yes, the double-quote within the expression is a literal character I want it to match.

I tried these suggestions apart and together (as follows).  I simplified the command to use just the -c switch to output 1 if there is any match (-S told it to search over all possible files), and also to search one known filename:
grep "(?<=size=")[0-9]+" -c electricsheep.243.06908.flam3

Tried it with a caret -
grep "(?<=size=^")[0-9]+" -c electricsheep.243.06908.flam3

- and got this error: Not enough memory.

Enclosing the whole regexp in single quotes ' ' instead of double " -
grep '(?<=size=")[0-9]+' -c electricsheep.243.06908.flam3

- gives this error: The system cannot find the file specified.  Same with single quotes surrounding it and the caret before the double-quote within the expression:
grep '(?<=size=^")[0-9]+' -c electricsheep.243.06908.flam3

Baffled yet :|
 
openhatch (Author) Commented:
Also - it seems I may not need the quotes around the expr. at all (although grep help itself gives examples using them), as this:

grep -c size=.[1-9] electricsheep.243.06908.flam3

- (using the "any character" match or dot . in place of the double-quote ") returns 1.  But if I put in a literal double-quote " instead of that dot . -

grep -c size="[1-9] electricsheep.243.06908.flam3

it throws that "Not enough memory" error.

??
 
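HonorGod Commented:
As long as the expression doesn't contain blanks or characters that are special to the command prompt (such as <, >, | or &), it doesn't need to be enclosed in double quotes.  The Windows command prompt does not treat single quotes as quoting characters, so when you do need quotes, use double quotes.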

If you simplify the expression, does the version of grep being used complete successfully?  For example, if you use:

grep -c size= electricsheep.243.06908.flam3

Does it give the expected output?

From where did you obtain that grep?
What version information does it display when you try something like this?

grep --help
 
openhatch (Author) Commented:
Yes, it completes successfully if I simplify the expression with that suggested line, and a few other simple literal and regexp match patterns I've tried.  What it returns is:

1

- which, with the -c switch, is the number of matching lines.  If I run the command without the -c switch, it prints (onscreen) the whole line of text where the match was found.

The link at the top of my question pulls up the page where I found this distribution of grep - and to be more exact, the link from *that* page is:

http://www.interlog.com/~tcharron/grep20d_win.zip

When I type grep --help it gives a lot of usage instructions, and also the grep version, which is "GNU grep version 2.0d".  Whoops.  I'd thought this was a Unix tool, but "GNU's Not Unix" :) so I guess I posted this question under a wrong heading.  Will correct that if I can.

And duly noted that I need to use double quotes.  When I do that here though:

grep (?<=size="")[0-9]+ electricsheep.243.06908.flam3

It still gives that "Not enough Memory" error.

Using the double-quote as an escape character for a literal match, I just found I get a positive return, so that -

grep -c size="" electricsheep.243.06908.flam3

returns: 1.

So that may be leading in the right direction.

However, reading up on this regexp in the program that I built it in (RegexBuddy), I'm trying a positive lookbehind literal match on:

size="

to match any word that is a group of digits after it.

The following may be an unrelated tangent, so forgive me if it is - I'm noticing that the most elementary positive lookbehind syntax is not working in any online regexp tester I use.  For example, this:

(?<=size)800

checked against this text:

size800

In either of these tools:
http://www.regextester.com/
http://www.regular-expressions.info/javascriptexample.html

- DOES NOT WORK.  I've read that lookbehind is hard to implement efficiently and is simply not implemented in some regexp tools.  It apparently *should* be implemented in this grep port for Windows, but I just found a bug report that makes me wonder: it shows a regexp with positive lookbehind that does not work on Windows in a _newer_ port of grep:

http://www.mail-archive.com/bug-grep@gnu.org/msg00622.html

So I'd like to at least try a very simple positive lookbehind test in some other tool that someone knows it works in on windows.  In other words I'm thinking of switching tools.  I'd just prefer calling a standalone exe tool instead of something that would require users to install, say, Cygwin.
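
One way to run that very simple lookbehind test outside of grep is a few lines of Python. This is only a sketch: Python and its `re` module are an assumption here (nothing in this thread uses them), and the second input string `time800` is made up for contrast. It checks the `(?<=size)800` expression against `size800` from the post above.

```python
import re

# Fixed-width positive lookbehind: match "800" only when "size" comes right before it.
pattern = re.compile(r"(?<=size)800")

hit = pattern.search("size800")    # "size" precedes "800", so this should match
miss = pattern.search("time800")   # hypothetical contrast input: no "size" before "800"

# The lookbehind text itself is not part of the match, only "800" is.
print(hit.group(0) if hit else None)
print(miss)
```

If the first line printed is the digits alone and the second is `None`, the expression is fine and the online testers simply do not support lookbehind.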
 
openhatch (Author) Commented:
Found it!

All these suggestions helped me look in the right direction.

Using this version of grep:
http://sourceforge.net/project/downloading.php?group_id=23617&filename=grep-2.5.4-setup.exe&a=2602682

And this command:

grep -P -o "(?<=size=""")[0-9]+" electricsheep.243.06908.flam3

Against this (portion of the full) text:
time="0" size="800 592" center="-0.408976 -0.305538"

- returns:
800

I haven't narrowed down the regexp to get the second number (592), but I have found one that returns both:

grep -P -o "(?<=size=""")[0-9]* [0-9]+" electricsheep.243.06908.flam3

- which I can work with.

For this solution, the regexp has to be Perl-compatible (the -P switch), I used the -o switch (output only the matching text, which was not available in the older Windows port of grep I'd been using), the expression had to be enclosed in double quotes ", and I had to use triple double quotes """ to escape the double quote that the regexp must match.
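
To illustrate what "output only matching text" means, here is a rough Python sketch of that behavior. It is an approximation under stated assumptions, not how grep is implemented: `only_matching` is a hypothetical helper name, Python's `re` stands in for the Perl-compatible engine, and the sample line is the one quoted above. Note that inside Python no command-prompt quote doubling is needed, so the pattern contains a single `"`.

```python
import re

def only_matching(pattern, lines):
    """Return every matched substring, in order, roughly like `grep -P -o`."""
    regex = re.compile(pattern)
    return [m.group(0) for line in lines for m in regex.finditer(line)]

sample = ['time="0" size="800 592" center="-0.408976 -0.305538"']

# The two expressions from this thread, without the cmd.exe quote escaping.
first = only_matching(r'(?<=size=")[0-9]+', sample)
both = only_matching(r'(?<=size=")[0-9]+ [0-9]+', sample)

print(first)
print(both)
```

The helper returns the matched text only, never the file name or the rest of the line, which is the part that matters when the output is fed into a batch-file loop.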

That was easy as pie!  (NOT.)

Thank you all!

Not really into awarding points to myself, and these suggestions together led to the solution, but apparently I can only award points to one person? - so I'll award the point to whoever finds me the regexp that returns the second number, 592, if that's even possible :) because when I put [0-9] in a positive lookbehind, it throws an error that I'm searching for something of an undetermined length.
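
On the second number: the sketch below shows, in Python, why the `[0-9]+` lookbehind is rejected and two ways around it. Everything here is an assumption for illustration: Python's `re` is not the engine grep uses, the variable names are made up, and the lookaround variant relies on the second number being directly preceded by a digit and a space and directly followed by a closing quote, as in the sample line.

```python
import re

line = 'time="0" size="800 592" center="-0.408976 -0.305538"'

# A lookbehind containing [0-9]+ has no fixed length, so Python's re refuses to compile it.
try:
    re.compile(r'(?<=size="[0-9]+ )[0-9]+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Option 1: match the whole attribute and pull the second number out with a capture group.
second_by_group = re.search(r'size="[0-9]+ ([0-9]+)"', line).group(1)

# Option 2: fixed-width lookarounds only: one digit plus a space behind, a closing quote ahead.
second_by_lookaround = re.search(r'(?<=[0-9] )[0-9]+(?=")', line).group(0)

print(lookbehind_ok, second_by_group, second_by_lookaround)
```

Only the second option is a single expression whose whole match is the number, so it is the form that could in principle be tried with `grep -P -o`. That has not been tested against the Windows grep port, and the `"` in it would need the same command-prompt escaping as in the commands above.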
 
openhatch (Author) Commented:
Sorry, that second regexp that returns both numbers was supposed to be this, replacing the asterisk with a plus:

grep -P -o "(?<=size=""")[0-9]+ [0-9]+" electricsheep.243.06908.flam3
 
HonorGod Commented:
No, you can split points as you see fit.  Pick 1 as "the solution", and others as "assist", and give the points out accordingly.

Sorry I missed the source of grep that you were using.

I'm happy to hear that you found the solution.

Good luck & have a great day.
 
HonorGod Commented:
Thank you for the grade & points.

Good luck & have a great day