techbro (United States of America) asked:

Understanding the greedy quantifiers * and +

I ran my test program with the arguments given below:

[wow]* "wow its cool"
Output: 0 "wow" 3 "" 4 "" 5 "" 6 "" 7 "" 8 "" 9 "oo" 11 "" 12 ""

[wow]+ "wow its cool"
Output: 0 "wow" 9 "oo"

I know that * is a greedy quantifier that matches as many characters as it can and allows zero or more, which is why there are empty strings after "wow". But can you tell me why I got "oo" in the output for both + and *?

I find greedy quantifiers hard to understand. I know + means one or more and * means zero or more, but I am not sure in which situations I am supposed to use each of them.
Can you simplify the difference between them? (Any analogy or examples would be helpful.)


I used the code below to try those greedy quantifiers:

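import java.util.regex.*;

public class TestClass
{
	// Usage: java TestClass <regex> <input>
	public static void main(String[] args)
	{
		Pattern p = Pattern.compile(args[0]);
		Matcher m = p.matcher(args[1]);
		// Each find() call returns the next match, scanning left to right; a match may be empty
		while (m.find())
		{
			System.out.print(m.start() + " \"" + m.group() + "\" ");
		}
		System.out.println();
	}
}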



I would appreciate your response.
for_yan (United States of America):

Because "o" is inside [], a character class, and the quantifier allows any number of them, the two characters "oo" match. That makes sense.
* means zero or more, so it matches even when there is nothing in between, while + requires at least one character in between.
So the pattern "C[wow]*L" will match "CL",
but the pattern "C[wow]+L" should not match the string "CL".
So [wow]* means that you are looking for a string made up of the characters "w" or "o", any number of them.
Therefore "oo" should match, and so should "ww" or "wo". It is not necessary to repeat "w" twice
inside the brackets.
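To check the two points above (a repeated letter inside [] changes nothing, and "oo" comes out as one match), here is a minimal sketch. The class name CharClassDemo and the helper findAll are made up for illustration; the helper just collects matches the same way the program in the question prints them. The last line, with no quantifier at all, shows that without + each 'o' would be a separate match; the greedy + instead swallows the whole run "oo" in one go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: every match as "start \"text\"", like the question's output.
    public static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + " \"" + m.group() + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "wow its cool";
        // [wow] and [wo] are the same class: one character that is 'w' or 'o'.
        System.out.println(findAll("[wow]+", input)); // [0 "wow", 9 "oo"]
        System.out.println(findAll("[wo]+", input));  // [0 "wow", 9 "oo"]
        // No quantifier: each matching character is its own match.
        System.out.println(findAll("[wo]", input));   // [0 "w", 1 "o", 2 "w", 9 "o", 10 "o"]
    }
}
```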
ASKER CERTIFIED SOLUTION
for_yan (United States of America):

OK, I have now tested some of my statements above:

"C[wow]*L" should match the string "CL", because even nothing between C and L counts as a match. Correct, see below:

    Pattern p6 = Pattern.compile("C[wow]*L");
    Matcher m6 = p6.matcher("CL");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }
    System.out.println("");



output: 0 "CL" 



"C[wow]+L" should not match the string "CL", as it needs at least one of the characters inside the [] between C and L. Correct, see below:


    Pattern p6 = Pattern.compile("C[wow]+L");
    Matcher m6 = p6.matcher("CL");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }
    System.out.println("");



No output is generated, as expected.
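To compare the two patterns side by side on a few more inputs, here is a small sketch. The class name StarVsPlusDemo, the check method and the extra inputs are made up for illustration. Unlike the fragments above, it uses Pattern.matches, which requires the whole string to match rather than searching for a match anywhere with find(); for these short inputs the conclusion is the same.

```java
import java.util.regex.Pattern;

public class StarVsPlusDemo {
    // Hypothetical helper: reports whether each pattern matches the entire string.
    public static String check(String s) {
        boolean star = Pattern.matches("C[wow]*L", s); // zero or more 'w'/'o' between C and L
        boolean plus = Pattern.matches("C[wow]+L", s); // at least one 'w'/'o' between C and L
        return s + " -> *: " + star + ", +: " + plus;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"CL", "CwL", "CooL", "CxL"}) {
            System.out.println(check(s));
        }
    }
}
```

Only "CL" separates the two patterns: * accepts the empty middle and + rejects it, while "CxL" fails both because 'x' is not in the class.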
More tests consistent with the statements above:

I guess it all seems understandable.
Please let me know if you still have any doubts.

code:

    Pattern p6 = Pattern.compile("[wowwowow]+");
    Matcher m6 = p6.matcher("oo");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }
    System.out.println("");



output:
output: 0 "oo" 



code:

    Pattern p6 = Pattern.compile("[o]+");
    Matcher m6 = p6.matcher("oo");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }
    System.out.println("");



output:
output: 0 "oo" 



Mick Barry:
> But can you tell me the reason I got "oo" in the output for both + and *?

[wow]*

matches 0 or more 'w' or 'o' characters.
The second w is actually redundant.

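Get rid of the square brackets if you want it to match just 'wow'. Note that without brackets a quantifier applies only to the character right before it, so wow* means "wo" followed by zero or more 'w's; to repeat the whole word, group it as (wow)*.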
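A quick way to see what the brackets change is to test whole-string matches for [wow]*, wow* and (wow)* on the same inputs. This is a hypothetical sketch: the class name BracketsDemo, the row method and the sample inputs are made up, and it uses Pattern.matches, which only succeeds if the entire input matches.

```java
import java.util.regex.Pattern;

public class BracketsDemo {
    // Character class, quantifier on the last 'w' only, quantifier on the whole group.
    private static final String[] PATTERNS = {"[wow]*", "wow*", "(wow)*"};

    // Hypothetical helper: the input followed by one true/false per pattern, in PATTERNS order.
    public static String row(String input) {
        StringBuilder sb = new StringBuilder(input);
        for (String p : PATTERNS) {
            sb.append(' ').append(Pattern.matches(p, input));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"wow", "wo", "owwo", "wowwow"}) {
            System.out.println(row(input));
        }
    }
}
```

The expected rows are "wow true true true", "wo true true false", "owwo true false false" and "wowwow true false true": only the bracketed form accepts any mix of 'w' and 'o', and only the grouped form repeats the whole word.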
Because "[wow]*" does not need even a single character to match, it also matches all these empty strings: one at each position where the next character is neither 'w' nor 'o', plus one at the very end of the input (positions 3 to 8, 11 and 12 in your output).
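To confirm where the empty matches come from, this sketch collects only the zero-length matches that find() reports. The class name EmptyMatchDemo and the method emptyMatchPositions are made up for illustration; the input is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: start positions of the zero-length matches found by find().
    public static List<Integer> emptyMatchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.group().isEmpty()) {
                positions.add(m.start());
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "wow its cool";
        // Positions whose character is not 'w'/'o', plus input.length() (the end of input).
        System.out.println(emptyMatchPositions("[wow]*", input)); // [3, 4, 5, 6, 7, 8, 11, 12]
        // + needs at least one character, so it never produces an empty match.
        System.out.println(emptyMatchPositions("[wow]+", input)); // []
    }
}
```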
> but I am not sure under what situation I am supposed to use each of them?

Typically, when at least one instance is required, use +.
If it is optional, use *.
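As a small illustration of that rule of thumb, suppose (hypothetically) a line of the form key=value, where the key must have at least one letter but the value may be empty. The required part gets + and the optional part gets *. The class name, the method isValid and the format itself are made up for this sketch.

```java
import java.util.regex.Pattern;

public class WhenToUseDemo {
    // Hypothetical format: key is required (+), value is optional (*).
    private static final Pattern SETTING = Pattern.compile("[a-z]+=[a-z]*");

    public static boolean isValid(String line) {
        return SETTING.matcher(line).matches(); // whole line must match
    }

    public static void main(String[] args) {
        System.out.println(isValid("color=red")); // true
        System.out.println(isValid("color="));    // true: * allows an empty value
        System.out.println(isValid("=red"));      // false: + requires at least one key letter
    }
}
```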
SOLUTION
The example in your question also shows the difference: when you use +, at least one character is required, so the empty strings no longer match.
techbro (ASKER):

Thanks, it makes sense!