Understanding Greedy Quantifiers * and +

techbro
I ran my program with the arguments given below:

[wow]* "wow its cool"
Output: 0 "wow" 3 "" 4 "" 5 "" 6 "" 7 "" 8 "" 9 "oo" 11 "" 12 ""

[wow]+ "wow its cool"
Output: 0 "wow" 9 "oo"

I know that * is a greedy quantifier which matches as many as it can and must be zero or more, so there are empty strings after "wow". But can you tell me the reason I got "oo" in the output for both + and *?

I find greedy quantifiers hard to understand. I know + means one or more and * means zero or more, but I am not sure in which situations I am supposed to use each of them.
Can you simplify the difference between them? (Any analogy or examples would be helpful.)


I used the code below to try those greedy quantifiers:

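import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestClass
{
	// Usage: java TestClass <regex> <input>
	public static void main(String[] args)
	{
		Pattern p = Pattern.compile(args[0]);
		Matcher m = p.matcher(args[1]);
		// Print the start index and text of every match, including empty ones
		while (m.find())
		{
			System.out.print(m.start() + " \"" + m.group() + "\" ");
		}
	}
}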



I will appreciate your response.
Awarded 2011

Commented:
Because there is an "o" inside the [] character class, and the quantifier allows any number of them, the two o's in "cool" match together as "oo". Makes sense.
Awarded 2011

Commented:
* means zero or more, so it will match even when nothing is in between, while + requires at least one character in between.
Awarded 2011

Commented:
So the pattern "C.*L" will match "CL",
but the pattern "C.+L" should not match the string "CL", because + needs at least one character between the C and the L.
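One caveat worth checking here: a quantifier applies to whatever comes immediately before it. In "C*L" and "C+L" that is the C itself, so both of those find a match in "CL". To require something between the C and the L, the quantifier has to sit on its own element, for example a dot. The sketch below is only a quick check of that; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class QuantifierBinding {
    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("C*L", "CL"));    // true: zero or more C, then L
        System.out.println(found("C+L", "CL"));    // true: one or more C, then L
        System.out.println(found("C.*L", "CL"));   // true: nothing is needed between C and L
        System.out.println(found("C.+L", "CL"));   // false: at least one character is needed in between
        System.out.println(found("C.+L", "CooL")); // true: "oo" sits between C and L
    }
}
```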
Awarded 2011

Commented:
So [wow]* means that you are looking for a string made up of the characters "w" or "o", any number of them.
Therefore "oo", "ww", and "wo" should all match. It is not necessary to repeat "w" twice
inside the brackets; [wo] means the same thing.
Awarded 2011
Commented:
So if you are looking for

"C[wow]+L", then it should not match the string "CL", as it needs at least one of the characters inside the [] between C and L.
"C[wow]*L" should match the string "CL", as even nothing between C and L will match.
Awarded 2011

Commented:
OK, now I have tested some of my statements above:

"C[wow]*L" should match the string "CL", as even nothing between C and L will match. Correct, see below:

    Pattern p6 = Pattern.compile("C[wow]*L");
    Matcher m6 = p6.matcher("CL");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }

    System.out.println("");



output: 0 "CL" 


Awarded 2011

Commented:

"C[wow]+L" should not match the string "CL", as it needs at least one of the characters inside the [] between C and L. Correct, see below:


    Pattern p6 = Pattern.compile("C[wow]+L");
    Matcher m6 = p6.matcher("CL");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }

    System.out.println("");



No matches are printed, as expected.
Awarded 2011

Commented:
Here are more tests, consistent with the statements above.

I think it all makes sense now.
Please let me know if you still have any doubts.

code:

    Pattern p6 = Pattern.compile("[wowwowow]+");
    Matcher m6 = p6.matcher("oo");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }

    System.out.println("");



output:
output: 0 "oo" 



code:

    Pattern p6 = Pattern.compile("[o]+");
    Matcher m6 = p6.matcher("oo");
    while (m6.find())
    {
        System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
    }

    System.out.println("");



output:
output: 0 "oo" 



Mick Barry, Java Developer
Top Expert 2010

Commented:
> But can you tell me the reason I got "oo" in the output for both + and *?

[wow]*

matches zero or more 'w' or 'o' characters.
The second w is actually redundant.

Get rid of the square brackets if you want it to match just 'wow' (and use a group, (wow)*, if the whole word should repeat).
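To see the difference between the character class and the literal text side by side, here is a small sketch. The class and method names are just for illustration, and + is used instead of * so that the empty matches stay out of the way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassVsLiteral {
    // Collects the text of every match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "wow its cool";
        System.out.println(matches("[wow]+", input)); // [wow, oo]: any run of w or o characters
        System.out.println(matches("[wo]+", input));  // [wow, oo]: same class without the duplicate w
        System.out.println(matches("wow", input));    // [wow]: only the exact text wow
        System.out.println(matches("(wow)+", "wowwow its cool")); // [wowwow]: the whole word repeated
    }
}
```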
Awarded 2011

Commented:
Since "[wow]*" does not need even a single character to match,
it also matches all these empty strings. Going by your output, there is one empty match
at every position where the next character is not w or o (indexes 3 to 8 and 11), plus one at the very end of the input (index 12).
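A rough way to see exactly where the empty matches fall is to collect only the start indexes of the zero-length matches. This is just a sketch with made-up names, run against the same input as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatches {
    // Returns the start index of every zero-length match of regex in input.
    static List<Integer> emptyStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.group().isEmpty()) {
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Index:      0123456789..
        String input = "wow its cool";
        // Empty matches sit in front of ' ', 'i', 't', 's', ' ', 'c', 'l' and at the end of the input.
        System.out.println(emptyStarts("[wow]*", input)); // [3, 4, 5, 6, 7, 8, 11, 12]
        // With + an empty match is impossible, so the list stays empty.
        System.out.println(emptyStarts("[wow]+", input)); // []
    }
}
```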
Mick Barry, Java Developer
Top Expert 2010

Commented:
> but I am not sure under what situation I am supposed to use each of them?

Typically, when at least one instance is required, use +.
If it's optional, use *.
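As a made-up illustration of that rule, suppose you want to accept "key=value" text where the key is required but the value may be left empty. The key=value format and the names below are assumptions for the sake of the example, not something from the question.

```java
import java.util.regex.Pattern;

class PlusVsStar {
    // Hypothetical rule: the key is required (+), the value is optional (*).
    private static final Pattern KEY_VALUE = Pattern.compile("\\w+=\\w*");

    static boolean isKeyValue(String s) {
        // matches() requires the whole string to fit the pattern
        return KEY_VALUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isKeyValue("name=bob")); // true
        System.out.println(isKeyValue("name="));    // true: the value may be empty because of *
        System.out.println(isKeyValue("=bob"));     // false: the key needs at least one character because of +
    }
}
```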
Mick Barry, Java Developer
Top Expert 2010
Commented:
> Can you simplify the difference between them? (Any analogy or examples would helpful)

If you define a number as any number of digits where the first cannot be zero, then the pattern would be

[1-9]+[0-9]*

which says it starts with one or more digits 1-9, optionally followed by any number of digits 0-9.
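Here is a quick sketch to try that pattern out. It uses matches() rather than find(), so the whole string has to fit the pattern; the class and method names are only for illustration. As far as I can tell, the shorter [1-9][0-9]* accepts exactly the same strings, so the second check compares the two.

```java
import java.util.regex.Pattern;

class NumberPattern {
    private static final Pattern NUMBER = Pattern.compile("[1-9]+[0-9]*");
    // Shorter form: exactly one leading non-zero digit, then any digits
    private static final Pattern NUMBER_SHORT = Pattern.compile("[1-9][0-9]*");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    static boolean isNumberShort(String s) {
        return NUMBER_SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"7", "120", "102", "012", "0", ""};
        for (String s : samples) {
            // Print the input, the result, and whether both patterns agree
            System.out.println("\"" + s + "\" -> " + isNumber(s)
                    + " (same as short form: " + (isNumber(s) == isNumberShort(s)) + ")");
        }
    }
}
```

With these samples, "7", "120" and "102" are accepted, while "012", "0" and the empty string are rejected.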
Mick Barry, Java Developer
Top Expert 2010

Commented:
The example in your question also shows the difference.
When you use +, at least one character is required, so the empty strings no longer match.

Author

Commented:
Thanks, it makes sense!
