Cannot find out how to write a regular expression

Hello!

I am working on a Java EE project with Java 7 and am looking for a way to write a regular expression. I couldn't find the answer on tutorial sites like RegularExpressions.info, and unfortunately I was unable to construct it on a test site. So I am asking you.

What I need is a regular expression that runs under Java 7 and filters the input so that it accepts any sequence of characters and words that does not contain the words CONTROL and TRACTION, case-insensitively. How do I write it?

Thank you for your help!

Note: Although this is a regular expression question, I added the topic "Java", as there seem to be some differences between what I used under UNIX decades ago and what Java supports. For instance, I had to change my expression from the commented-out line to the uncommented one:

			String regularExpression = "(ERROR|";
[…]
					//regularExpression += "(CONTROL|TRACTION).(CAR|UNIT))";
					regularExpression += "(CONTROL|MOTOR).(CAR|UNIT)|(Control|Motor).(car|unit))";

Asked by Ahmet Ekrem SABAN (Senior IT consultant)

Giovanni Heward commented:
How about:

try {
	if (subjectString.matches("(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b")) {
		// String matched entirely
	} else {
		// Match attempt failed
	} 
} catch (PatternSyntaxException ex) {
	// Syntax error in the regular expression
}

or
try {
	Pattern regex = Pattern.compile("\\b(?:(?!control|TRACTION%)\\w)+\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
	Matcher regexMatcher = regex.matcher(subjectString);
	while (regexMatcher.find()) {
		// matched text: regexMatcher.group()
		// match start: regexMatcher.start()
		// match end: regexMatcher.end()
	} 
} catch (PatternSyntaxException ex) {
	// Syntax error in the regular expression
}

RegexBuddy is a great tool to learn with.

Kent Dyer (IT Security Analyst Senior) commented:
RegexBuddy shows:

\b(?:(?!control!traction)\w)+\b

Now, I think this may be a bit too complex. We should be able to shorten it to:

!control!traction


HTH,

Kent

CEHJ commented:
I personally wouldn't use regex, as the loss in efficiency and maintainability outweighs the gain in code conciseness. It's pretty simple to just look for substrings, but if you're determined to use regex in Java, one way would be:

	final String RE = "(?i).*?CONTROL.*?TRACTION.*|.*?TRACTION.*?CONTROL.*";
	boolean controlAndTractionAbsent = !args[0].matches(RE);
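
For comparison, the plain substring check mentioned above could look roughly like the sketch below. The class name SubstringCheck and the helper targetsAbsent are made up for illustration, and the sketch assumes the text should be rejected as soon as either word occurs anywhere in it. Note that a substring test also rejects longer words such as "attraction"; whole-word semantics would need extra boundary handling.

```java
import java.util.Locale;

public class SubstringCheck {

    // Hypothetical helper: true when neither target occurs anywhere in the text.
    // Lower-casing with Locale.ROOT makes the comparison case-insensitive.
    static boolean targetsAbsent(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return !lower.contains("control") && !lower.contains("traction");
    }

    public static void main(String[] args) {
        System.out.println(targetsAbsent("Freight vehicle")); // true
        System.out.println(targetsAbsent("Out of CONTROL"));  // false
        System.out.println(targetsAbsent("attraction"));      // false: contains "traction"
    }
}
```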


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
I wrote the following code, which produces the output below:

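import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

public class ExcludeWords {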

    /**
     * Test method.
     *
     * @param arguments
     */
    public static void main(final String[] arguments) {
        final BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        String text = null;

        try {
            final String subject = "(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b";

            final Pattern regularExpression = Pattern.compile(subject,
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

            while (true) {
                System.out.print("Enter a test text: ");
                text = input.readLine();

                if (text == null || text.length() == 0) {
                    break;
                }
                System.out.println('"' + text + '"'
                        + (text.matches(subject) ? " matches " : " does not match ")
                        + '"' + regularExpression + '"');
            }
        } catch (final IOException ioe) {
            ioe.printStackTrace();
        }
    }

}

I/O:
Enter a test text: Control
"Control" does not match "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: CONTROL
"CONTROL" does not match "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: TRACTION
"TRACTION" matches "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Traction
"Traction" matches "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: TrACTION
"TrACTION" matches "(?i)\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Ekrem
"Ekrem" matches "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Q
"Q" matches "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Quo vadis?
"Quo vadis?" does not match "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Quo vadis
"Quo vadis" does not match "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text: Freight vehicle
"Freight vehicle" does not match "\b(?:(?!control|TRACTION%)\w)+\b"
Enter a test text:

All texts that do not contain "CONTROL" or "TRACTION" should match. But "Quo vadis?", "Quo vadis", and "Freight vehicle" are reported as not matching.
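
A possible explanation for both symptoms, sketched below under the assumption that String.matches() is used as in the code above: matches() has to consume the whole input, and \w never matches a space or a question mark, so multi-word texts cannot match. Separately, the alternative TRACTION% in the lookahead requires a literal percent sign, so plain "TRACTION" is never excluded. The class name WhyMatchesFails is made up for illustration.

```java
public class WhyMatchesFails {

    // The pattern as tested above, including the stray percent sign
    static final String WITH_PERCENT = "(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b";

    // The same pattern with the percent sign removed
    static final String WITHOUT_PERCENT = "(?i)\\b(?:(?!control|TRACTION)\\w)+\\b";

    public static void main(String[] args) {
        // The lookahead only rejects "TRACTION%", so "TRACTION" slips through
        System.out.println("TRACTION".matches(WITH_PERCENT));     // true
        // Without the percent sign the lookahead rejects the word
        System.out.println("TRACTION".matches(WITHOUT_PERCENT));  // false
        // A single word is consumed completely by (\w)+
        System.out.println("Quo".matches(WITHOUT_PERCENT));       // true
        // The space is not a \w character, so the whole text cannot match
        System.out.println("Quo vadis".matches(WITHOUT_PERCENT)); // false
    }
}
```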

CEHJ commented:
What is that percent sign in your regex?

Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
You may test your proposals with this code, or correct the code if it is wrong. The first five texts should not match; the rest should.

import java.util.Vector;
import java.util.regex.Pattern;

public class ExcludeWords {

	public static void main(final String[] arguments) {
		final Vector<String> texts = new Vector<String>();
		
		texts.add("Control");
		texts.add("CONTROL");
		texts.add("TRACTION");
		texts.add("Traction");
		texts.add("TrACTION");
		texts.add("Ekrem");
		texts.add("Q");
		texts.add("Quo vadis?");
		texts.add("Quo vadis");
		texts.add("Freight vehicle");
		texts.add("Locomotive");
		texts.add("out of control");
		texts.add("uncontrolled man");
		texts.add("atraction");

		//final String subject = "(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b";
		//final String subject = "!control!traction";
		final String subject = "(?i).*?CONTROL.*?TRACTION.*|.*?TRACTION.*?CONTROL.*";

		final Pattern regularExpression = Pattern.compile(subject,
				Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

		for (final String text : texts) {
			System.out.println('"' + text + '"'
					+ (text.matches(subject) ? " matches " : " does not match ")
					+ '"' + regularExpression + '"');
		}
	}

}


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
The percent sign is in the regex of x66_x72_x65_x65.

CEHJ commented:
Also, I don't think you've specified this very precisely: does order matter? Where are the word boundaries?

Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
Correction:

        //final String subject = "(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b";
        //final String subject = "(?i)\\b(?:(?!control|TRACTION)\\w)+\\b";
        //final String subject = "!control!traction";
        final String subject = "(?i).*?CONTROL.*?TRACTION.*|.*?TRACTION.*?CONTROL.*";

        final Pattern regularExpression = Pattern.compile(subject,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        for (final String text : texts) {
            System.out.println('"' + text + '"'
                    + (Pattern.matches(subject, text) ? " matches " : " does not match ")
                    + '"' + regularExpression + '"');
        }


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
Let me specify it again: ANY text that does not contain "CONTROL" or "TRACTION" should pass, regardless of case, order, or other restrictions. For instance, "attraction" should pass the check. Actually, I do not use the regularExpression variable above, so the code I use is:

        for (final String text : texts) {
            System.out.println('"' + text + '"'
                    + (text.matches(subject) ? " matches" : " does not match") + '.');
        }


mccarl (IT Business Systems Analyst / Software Developer) commented:
Let me specify it again: ANY text that does not contain "CONTROL" or "TRACTION" should pass regardless of case, order, or other restrictions. For instance, "attraction" should pass the check.
I still don't think it is clear. So you are saying that even though "attraction" contains the character sequence "traction", it should still pass because the *word* is not "traction". Therefore, while "attraction" passes, "at traction" should NOT pass, and "traction" should NOT pass either.

If that is correct, can you please verify that even though you have put the test "out of control" in the lower group with the other examples that SHOULD pass, this specific example "out of control" should NOT pass?


Once we clarify all that, one thing will make this easier: rather than finding a pattern that matches text NOT containing either of your words, can we use a pattern that matches either word and then invert the result in Java (rather than in the pattern)? That is, is the following suitable? Note the ! in front of the execution of the regex. I've also changed the code (a) to actually use 'regularExpression' and (b) to use find() rather than adding extra .* to the pattern.
package regex;

import java.util.Vector;
import java.util.regex.Pattern;

public class Regex {
    public static void main(final String[] arguments) {
        final Vector<String> texts = new Vector<String>();
        
        texts.add("Control");
        texts.add("CONTROL");
        texts.add("TRACTION");
        texts.add("Traction");
        texts.add("TrACTION");
        texts.add("Ekrem");
        texts.add("Q");
        texts.add("Quo vadis?");
        texts.add("Quo vadis");
        texts.add("Freight vehicle");
        texts.add("Locomotive");
        texts.add("out of control");
        texts.add("uncontrolled man");
        texts.add("atraction");

        final String subject = "\\b(CONTROL|TRACTION)\\b";

        final Pattern regularExpression = Pattern.compile(subject,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        for (final String text : texts) {
            System.out.println('"' + text + '"'
                    + (!regularExpression.matcher(text).find() ? " matches " : " does not match ")
                    + '"' + regularExpression + '"');
        }
    }
}
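
To illustrate the find() versus matches() point, here is a small sketch. The class and method names are made up for illustration. It assumes single-line input; for input containing line breaks the padded variant would additionally need Pattern.DOTALL, because . does not match line terminators by default.

```java
import java.util.regex.Pattern;

public class FindVersusMatches {

    static final Pattern WORDS = Pattern.compile(
            "\\b(CONTROL|TRACTION)\\b", Pattern.CASE_INSENSITIVE);

    // find() searches for the pattern anywhere in the text
    static boolean foundAnywhere(String text) {
        return WORDS.matcher(text).find();
    }

    // matches() must consume the whole text, so the pattern needs .* padding
    static boolean matchesWithPadding(String text) {
        return text.matches("(?i).*\\b(CONTROL|TRACTION)\\b.*");
    }

    // Without padding, matches() succeeds only if the text is exactly one target word
    static boolean matchesWithoutPadding(String text) {
        return WORDS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(foundAnywhere("out of control"));         // true
        System.out.println(matchesWithPadding("out of control"));    // true
        System.out.println(matchesWithoutPadding("out of control")); // false
        System.out.println(matchesWithoutPadding("Control"));        // true
    }
}
```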


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
Oh, "out of control" should not pass! Sorry about that. "Control of" should also not pass; neither order nor case matters.

CEHJ commented:
I think I might be beginning to get more confused as time goes on ;)
Please let me know if this output is correct:
Target(s) absent in the string 'Control'? false
Target(s) absent in the string 'CONTROL'? false
Target(s) absent in the string 'TRACTION'? false
Target(s) absent in the string 'Traction'? false
Target(s) absent in the string 'TrACTION'? false
Target(s) absent in the string 'Ekrem'? true
Target(s) absent in the string 'Q'? true
Target(s) absent in the string 'Quo vadis?'? true
Target(s) absent in the string 'Quo vadis'? true
Target(s) absent in the string 'Freight vehicle'? true
Target(s) absent in the string 'Locomotive'? true
Target(s) absent in the string 'out of control'? false
Target(s) absent in the string 'uncontrolled man'? true
Target(s) absent in the string 'atraction'? true


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
The output CEHJ posted above is what I want. Here is the corrected test again:
import java.util.Vector;

public class ExcludeWords {

	/**
	 * Test method.
	 *
	 * @param arguments
	 */
	public static void main(final String[] arguments) {
		final Vector<String> texts = new Vector<String>();

		texts.add("Control");
		texts.add("CONTROL");
		texts.add("TRACTION");
		texts.add("Traction");
		texts.add("TrACTION");
		texts.add("out of control");
		
		texts.add("Ekrem");
		texts.add("Q");
		texts.add("Quo vadis?");
		texts.add("Quo vadis");
		texts.add("Freight vehicle");
		texts.add("Locomotive");
		texts.add("uncontrolled man");
		texts.add("attraction");

		byte counter = 1;
		final String subject = "(?i)\\b(?:(?!control|TRACTION%)\\w)+\\b";
		//final String subject = "(?i)\\b(?:(?!control|TRACTION)\\w)+\\b";
		//final String subject = "!control!traction";
		//final String subject = "(?i).*?CONTROL.*?TRACTION.*|.*?TRACTION.*?CONTROL.*";

		for (final String text : texts) {
			System.out.println('"' + text + '"'
					+ (text.matches(subject) ? " matches" : " does not match")
					+ " (should "
					+ ((counter++ < 7) ? "not " : "") + "match).");
		}
	}

}


And here is the current, unacceptable output:
"Control" does not match (should not match).
"CONTROL" does not match (should not match).
"TRACTION" matches (should not match).
"Traction" matches (should not match).
"TrACTION" matches (should not match).
"out of control" does not match (should not match).
"Ekrem" matches (should match).
"Q" matches (should match).
"Quo vadis?" does not match (should match).
"Quo vadis" does not match (should match).
"Freight vehicle" does not match (should match).
"Locomotive" matches (should match).
"uncontrolled man" does not match (should match).
"attraction" matches (should match).

CEHJ commented:
This produced my last output:

import java.util.Vector;

public class M {
    public static void main(String[] args) {
        final Vector<String> texts = new Vector<String>();

        texts.add("Control");
        texts.add("CONTROL");
        texts.add("TRACTION");
        texts.add("Traction");
        texts.add("TrACTION");
        texts.add("Ekrem");
        texts.add("Q");
        texts.add("Quo vadis?");
        texts.add("Quo vadis");
        texts.add("Freight vehicle");
        texts.add("Locomotive");
        texts.add("out of control");
        texts.add("uncontrolled man");
        texts.add("atraction");

        final String RE = "(?i).*?\\b(?:CONTROL|TRACTION)\\b.*";

        for (String s : texts) {
            System.out.printf("Target(s) absent in the string '%s'? %b%n", s,
                !s.matches(RE));
        }
    }
}
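
If the negation has to live inside the pattern itself, as the original question asked, a negative lookahead is one way to express it. The following is only a sketch under the whole-word interpretation agreed on in this thread; the class name LookaheadFilter and the method isAllowed are made up for illustration.

```java
import java.util.regex.Pattern;

public class LookaheadFilter {

    // Assumed pattern: the lookahead fails as soon as a whole-word CONTROL or
    // TRACTION occurs anywhere; otherwise .* consumes the entire text.
    // DOTALL lets . cross line breaks, so multi-line input is handled too.
    static final Pattern ALLOWED = Pattern.compile(
            "^(?!.*\\b(?:control|traction)\\b).*$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // True when the text contains neither target as a separate word
    static boolean isAllowed(String text) {
        return ALLOWED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Quo vadis?"));       // true
        System.out.println(isAllowed("uncontrolled man")); // true
        System.out.println(isAllowed("out of control"));   // false
        System.out.println(isAllowed("TrACTION"));         // false
    }
}
```

With this variant the pattern's own result already means "passes", so no ! is needed in the Java code.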


Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
CEHJ, congratulations! Your answer is correct; I mean the false and true values at the end are correct. I propose giving the full points to you. Do the other participants have any objections?

käµfm³d 👽 commented:
What should happen with:

out-of-control car

? Because CEHJ's pattern will accept that as a valid match, which the ! in line 26 will turn into a false.

CEHJ commented:
What should happen with:

out-of-control car
I can't say, personally, since it's not my requirement, but of course 'control' in that case is not a separate word

Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
The hyphen is a valid word separator. It could return TRUE or FALSE (if it sees 'out-of-control' as a word).

CEHJ commented:
The hyphen is a valid word separator.
No - on the contrary - it joins what appear to be separate words into one
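
Whatever the linguistic view, the regex engine's \b does see a boundary between '-' and a letter, because '-' is not a \w character. The sketch below shows that behaviour next to one possible alternative that treats a hyphen as part of the word. The lookaround-based pattern and all names are assumptions for illustration, not something proposed in this thread.

```java
import java.util.regex.Pattern;

public class HyphenBoundary {

    // \b matches between '-' and 'c', so "control" counts as a separate word
    static final Pattern WORD_BOUNDARY = Pattern.compile(
            "\\b(?:control|traction)\\b", Pattern.CASE_INSENSITIVE);

    // Assumed alternative: a target only counts when it is not adjacent to a
    // word character or a hyphen on either side
    static final Pattern HYPHEN_AWARE = Pattern.compile(
            "(?<![\\w-])(?:control|traction)(?![\\w-])", Pattern.CASE_INSENSITIVE);

    // True when the given pattern occurs somewhere in the text
    static boolean containsTarget(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTarget(WORD_BOUNDARY, "out-of-control car")); // true
        System.out.println(containsTarget(HYPHEN_AWARE, "out-of-control car"));  // false
        System.out.println(containsTarget(HYPHEN_AWARE, "out of control"));      // true
    }
}
```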

Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
Again, thanks to CEHJ for the only correct answer.

CEHJ commented:
:)

mccarl (IT Business Systems Analyst / Software Developer) commented:
for the only correct answer
I really don't give a about the points, but I find it interesting that you say that, when my earlier post gives the same output!

CEHJ commented:
I really don't give a about the points, but I find it interesting that you say that, when my earlier post gives the same output!
Yes, mccarl is right about that and shouldn't have been overlooked points-wise. It must be said though that use of the regex package isn't necessary, nor is the Unicode flag necessary (the pattern has no Unicode in it)

Ahmet Ekrem SABAN (Senior IT consultant, author) commented:
I asked whether there were any objections to giving the full points to CEHJ, who gave the right answer. Now it's too late.