richardsimnett

asked on

What would the regex be to see if any of the words in a list exist in a line of text?

Hello,
What would the regex look like to see if any of the following words appeared in a line of text?

the, at, are, is, will

I just need something that will return true if any of those words are present. I'm not familiar with regex at all, so I was hoping someone could show me how to do it.

Thanks,
Rick
ddrudik

Possibly:
Raw Match Pattern:
\b(?:the|at|are|is|will)\b
 
Java Code Example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
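class Module1{
  public static void main(String[] asd){
    String sourcestring = "this is the test";
    // Backslashes are doubled because this is a Java string literal
    Pattern re = Pattern.compile("\\b(?:the|at|are|is|will)\\b");
    Matcher m = re.matcher(sourcestring);
    // Each successful find() is one matched word
    while (m.find()){
      // Group 0 is the whole match; groupCount() excludes it, so use <=
      for( int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++ ){
        System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx) );
      }
    }
  }
}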
 
$matches Array:
(
    [0] => Array
        (
            [0] => is
            [1] => the
        )
 
)


richardsimnett

ASKER

ddrudik,
Is it safe to assume that the raw match pattern will work with String.matches() ?

Thanks,
Rick
Also, is that pattern case-insensitive?
No, the raw match pattern is not formatted properly for use in Java code: in a Java string literal each backslash has to be doubled. See the code example for the formatting as it should be (at least as I understand it).
OK, what I need to be able to do is this:

boolean lineTest(String line)
{
     // Compare strings with equals(), not ==. Note that substring(3) is everything from index 3 onward.
     return "CHK".equals(line.substring(3)) && !line.matches("someregex");
}

I need some regex I can place into the line.matches statement that will return true if any of the words in the regex match. The matching needs to be case-insensitive, because line can be upper case, lower case, or any combination of the two.
To use .matches, I would recommend you try the pattern:
"^.*\\b(?:the|at|are|is|will)\\b.*$"

You could leave the ^ and $ off of the pattern, since .matches has to match the entire string and so behaves as if the pattern were anchored at both ends:
".*\\b(?:the|at|are|is|will)\\b.*"
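For anyone comparing the two approaches in this thread, here is a minimal sketch of the difference between Matcher.find() and String.matches() with the same word list. The class and method names (AnyWordCheck, containsWithFind, containsWithMatches) are made up for illustration, and it assumes line is a single line with no embedded line breaks, since . does not match line terminators by default.

```java
import java.util.regex.Pattern;

class AnyWordCheck {
    // Same alternation as in the thread, compiled once so it can be reused.
    private static final Pattern WORDS =
        Pattern.compile("\\b(?:the|at|are|is|will)\\b");

    // find() succeeds if the pattern matches anywhere in the line.
    static boolean containsWithFind(String line) {
        return WORDS.matcher(line).find();
    }

    // String.matches() must match the entire line, hence the .* on both sides.
    static boolean containsWithMatches(String line) {
        return line.matches(".*\\b(?:the|at|are|is|will)\\b.*");
    }

    public static void main(String[] args) {
        System.out.println(containsWithFind("this is the test"));    // true
        System.out.println(containsWithMatches("this is the test")); // true
        // Without the .* wrappers, matches() fails because the whole line is not one word.
        System.out.println("this is the test".matches("\\b(?:the|at|are|is|will)\\b")); // false
        // \b keeps "at" inside "that" and "the" inside "theme" from counting.
        System.out.println(containsWithFind("that theme"));          // false
    }
}
```

If the check runs on many lines, the precompiled Pattern with find() avoids recompiling the regex on every call, which String.matches() does each time.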
As for case-insensitive matching, .matches takes no flags argument, so you either put the embedded flag (?i) at the start of the pattern or convert "line" to lowercase before sending it to .matches.
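One way to get case-insensitive matching without lowercasing the line is Java's embedded flag (?i), which works inside the pattern string passed to .matches. The sketch below shows that alongside the equivalent Pattern.CASE_INSENSITIVE flag. The class and method names (LineTestSketch, containsAnyWord, containsAnyWordCompiled) are hypothetical, and by default both forms only fold case for ASCII letters, which is enough for this word list.

```java
import java.util.regex.Pattern;

class LineTestSketch {
    // (?i) at the start makes the rest of the pattern case-insensitive.
    private static final String ANY_WORD =
        "(?i).*\\b(?:the|at|are|is|will)\\b.*";

    // Works directly with String.matches(), which must match the whole line.
    static boolean containsAnyWord(String line) {
        return line.matches(ANY_WORD);
    }

    // Same check with a precompiled Pattern and an explicit flag.
    private static final Pattern ANY_WORD_PATTERN =
        Pattern.compile("\\b(?:the|at|are|is|will)\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsAnyWordCompiled(String line) {
        return ANY_WORD_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAnyWord("CHK This IS The Test"));         // true
        System.out.println(containsAnyWordCompiled("CHK This IS The Test")); // true
        System.out.println(containsAnyWord("CHK 12345"));                    // false
    }
}
```

In the lineTest function from the question, the "someregex" placeholder could then be replaced by the ANY_WORD string above.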
ASKER CERTIFIED SOLUTION
ddrudik
Thanks!
Thanks for the question and the points.