techbro

Understanding Greedy Quantifier * and +

I ran my program with the arguments given below:

[wow]* "wow its cool"
Output: 0 "wow" 3 "" 4 "" 5 "" 6 "" 7 "" 8 "" 9 "oo" 11 "" 12 ""

[wow]+ "wow its cool"
Output: 0 "wow" 9 "oo"

I know that * is a greedy quantifier which matches as many as it can and must be zero or more, so there are empty strings after "wow". But can you tell me the reason I got "oo" in the output for both + and *?

I find greedy quantifiers hard to understand. I know + means one or more and * means zero or more, but I am not sure in which situations I am supposed to use each of them.
Can you simplify the difference between them? (Any analogy or examples would be helpful.)


I used the code below to try those greedy quantifiers:

import java.util.regex.*;
public class TestClass
{
	public static void main(String[] args)
	{
		Pattern p = Pattern.compile(args[0]);
		Matcher m = p.matcher(args[1]);
		boolean b = false;
		while(b = m.find())
		{
		    System.out.print(m.start()+" \""+m.group()+"\" ");
		}
	}
}



I will appreciate your response.
Java

Last comment: techbro, 8/22/2022 - Mon
for_yan

Because "o" is inside the [] character class, and the quantifier allows any number of them, the two characters "oo" will
match. That makes sense.
for_yan

* means zero or more, so it matches even when there is nothing in between, whereas + requires something in between
for_yan

so the pattern "C[wow]*L" will match "CL"
but the pattern "C[wow]+L" should not match the string "CL"
for_yan

So [wow]* means that you are looking for a string made up of the characters "w" or "o", any number of them.
Therefore "oo" should match, "ww" should match, and "wo" should match. It is not necessary to repeat "w" twice
inside the brackets.
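
A minimal sketch of the point about the repeated "w", using the same java.util.regex calls as the question. The class name CharClassDemo and the helper findAll are made up for illustration. It collects every match as "start:text" and shows that [wow]+ and [wo]+ give the same result on the input from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: returns each match as "start:text".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "wow its cool";
        // A character class is a set of characters, so listing "w" twice changes nothing.
        System.out.println(findAll("[wow]+", input)); // [0:wow, 9:oo]
        System.out.println(findAll("[wo]+", input));  // [0:wow, 9:oo]
    }
}
```

The "oo" at index 9 comes from "cool": each "o" belongs to the class, and the quantifier takes as many consecutive class members as it can.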
for_yan

OK, now I have tested some of my statements above:

"C[wow]*L" should match the string "CL", because even nothing between "C" and "L" will match. Correct, see below:

Pattern p6 = Pattern.compile("C[wow]*L");
Matcher m6 = p6.matcher("CL");
while (m6.find())
{
    System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
}
System.out.println("");



output: 0 "CL" 


for_yan

"C[wow]+L" should then not match the string "CL", as it needs at least one of the characters inside the []. Correct, see below:


Pattern p6 = Pattern.compile("C[wow]+L");
Matcher m6 = p6.matcher("CL");
while (m6.find())
{
    System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
}
System.out.println("");



No output is generated, as expected.
for_yan

Here are more tests, consistent with the statements above.

I think it all seems understandable now.
Please let me know if you still have any doubts.

code:

Pattern p6 = Pattern.compile("[wowwowow]+");
Matcher m6 = p6.matcher("oo");
while (m6.find())
{
    System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
}
System.out.println("");



output:
output: 0 "oo" 



code:

Pattern p6 = Pattern.compile("[o]+");
Matcher m6 = p6.matcher("oo");
while (m6.find())
{
    System.out.print("output: " + m6.start() + " \"" + m6.group() + "\" ");
}
System.out.println("");



output:
output: 0 "oo" 



Mick Barry

> But can you tell me the reason I got "oo" in the output for both + and *?

[wow]*

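matches zero or more characters, each of which is a 'w' or an 'o'.
The second w is actually redundant.

Get rid of the square brackets if you want it to match just the literal text 'wow' (and group it, as in (wow)*, if the quantifier should apply to the whole word).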
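
A rough sketch of what removing the brackets changes, assuming a made-up input string "wowwow oo". The class name BracketsVsLiteral and the helper findAll are hypothetical. It runs four patterns over the same input so the character class, the literal, the repeated group, and a quantifier that binds only to the last character can be compared side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketsVsLiteral {
    // Hypothetical helper: returns each match as "start:text".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "wowwow oo"; // illustrative input, not from the question
        // Character class: any run of 'w' and 'o' characters.
        System.out.println(findAll("[wow]+", input)); // [0:wowwow, 7:oo]
        // Literal: exactly the three characters w, o, w.
        System.out.println(findAll("wow", input));    // [0:wow, 3:wow]
        // Group: the whole word repeated one or more times.
        System.out.println(findAll("(wow)+", input)); // [0:wowwow]
        // No group: + applies only to the final 'w'.
        System.out.println(findAll("wow+", input));   // [0:woww]
    }
}
```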
for_yan

Since "[wow]*" does not need even a single character to match,
it also matches all these empty strings. Frankly, I am not sure exactly how many
empty strings it matches; I guess one at each position that is not part of a non-empty match.
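
The guess about the empty strings can be checked by printing where each match starts and ends. This is only a sketch: the class name EmptyMatchDemo and the helper spans are invented here, and the input is the one from the question. A match whose start equals its end is an empty match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: returns each match as "start-end" (end is exclusive).
    static List<String> spans(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "wow its cool"; // 12 characters, so index 12 is the end of input
        // With *, every index where no 'w' or 'o' run starts yields an empty match.
        System.out.println(spans("[wow]*", input));
        // [0-3, 3-3, 4-4, 5-5, 6-6, 7-7, 8-8, 9-11, 11-11, 12-12]

        // With +, only the non-empty runs remain.
        System.out.println(spans("[wow]+", input)); // [0-3, 9-11]
    }
}
```

On this input the empty matches land on the indexes 3 to 8, 11, and 12, which are exactly the positions not covered by "wow" or "oo", plus the end of the input.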
Mick Barry

> but I am not sure under what situation I am supposed to use each of them?

Typically, when at least a single instance is required, use +.
If it is optional, use *.
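
As a small illustration of that rule of thumb, here is a sketch with an invented validation rule: a field must contain at least one digit (so +) and may be followed by any number of trailing spaces (so *). The class name PlusVsStar and the method isNumberField are hypothetical.

```java
import java.util.regex.Pattern;

public class PlusVsStar {
    // Hypothetical rule: one or more digits, then zero or more trailing spaces.
    static boolean isNumberField(String s) {
        return Pattern.matches("\\d+ *", s);
    }

    public static void main(String[] args) {
        System.out.println(isNumberField("42"));   // true: digits present, no spaces needed
        System.out.println(isNumberField("42  ")); // true: trailing spaces are optional
        System.out.println(isNumberField(""));     // false: + demands at least one digit
        System.out.println(isNumberField("   "));  // false: spaces alone are not enough

        // With * on the digits, the empty string would be accepted.
        System.out.println(Pattern.matches("\\d*", "")); // true
    }
}
```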
Mick Barry

The example in your question also shows the difference.
When you use +, at least one character is required, so the empty strings no longer match.
techbro

ASKER
Thanks, it makes sense!